Kubespray
Production Kubernetes Cluster: System Configuration Table
| Node Role | Quantity | vCPU | RAM | Storage | Purpose |
|---|---|---|---|---|---|
| Control Plane (master) | 3 (odd number) | 2–4 vCPU | 4–8 GB | 50–100 GB (SSD) | etcd, kube-apiserver, scheduler, controller |
| Worker Nodes | 3–10+ | 4–16 vCPU | 8–64 GB | 100–500 GB (SSD) | Runs actual pods, ingress, apps, services |
| Load Balancer (HA) | 1–2 | 1–2 vCPU | 2 GB | 20 GB | HAProxy/Nginx or cloud LB for API endpoint |
| Storage Node (if local) | Optional | 4 vCPU | 8+ GB | 500 GB–2 TB+ (SSD/NVMe) | For Ceph, Rook, Longhorn, etc. |
| Bastion / Jumpbox | Optional | 1 vCPU | 1–2 GB | 10 GB | Secure SSH access to cluster |
| Logging / Monitoring | Optional | 2–4 vCPU | 4–8 GB | 100–500 GB (SSD) | Prometheus, Grafana, Loki, Fluent Bit, etc. |
Delivering Keys To Remote Hosts
Define username and IP address pairs in the deliver-keys-ips.list file:
username@172.16.1.10
username@172.16.1.11
...
Run this script to deliver a key to multiple remote hosts:
#!/bin/bash
# Config
KEY_PATH="$HOME/.ssh/<KEY-NAME>.pub"  # Change if using a custom key
PORT=22                               # Change if using a custom SSH port
IP_LIST="deliver-keys-ips.list"       # One username@host pair per line in this file

# Check the key exists
if [ ! -f "$KEY_PATH" ]; then
    echo "SSH key not found at $KEY_PATH"
    exit 1
fi

# Loop through the host list
while IFS= read -r TARGET || [[ -n "$TARGET" ]]; do
    [[ -z "$TARGET" || "$TARGET" =~ ^# ]] && continue  # Skip empty lines and comments
    echo "Installing key on $TARGET..."
    if ssh-copy-id -i "$KEY_PATH" -p "$PORT" "$TARGET"; then
        echo "Success: $TARGET"
    else
        echo "Failed: $TARGET"
    fi
done < "$IP_LIST"
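If you don't have a key yet, a minimal sketch of generating one and running the script (the key comment and the deliver-keys.sh filename are placeholders, not part of the original setup):
ssh-keygen -t ed25519 -f "$HOME/.ssh/<KEY-NAME>" -C "kubespray-provisioning"
chmod +x deliver-keys.sh   # assuming you saved the script above as deliver-keys.sh
./deliver-keys.sh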
You need sudo on the remote hosts to run Ansible playbooks with become enabled, and also to copy SSH keys into the root user's home directory.
For remote hosts where root password login is temporarily allowed:
ssh root@<remote-ip> "apt update && apt install -y sudo && usermod -aG sudo username"
ssh root@<remote-ip> "echo 'username ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/username && chmod 440 /etc/sudoers.d/username"
Manual installation from the VM console:
apt install -y sudo
usermod -aG sudo username
echo 'username ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/username && chmod 440 /etc/sudoers.d/username
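To confirm passwordless sudo works before running the playbook, a quick check might look like this (the host is a placeholder):
ssh username@<remote-ip> 'sudo -n true && echo "passwordless sudo OK"'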
Now you can copy the SSH keys to the root user's .ssh directory:
#!/bin/bash
HOSTS_FILE="deliver-keys-ips.list"  # Contains one username@host pair per line

while IFS= read -r TARGET || [[ -n "$TARGET" ]]; do
    [[ -z "$TARGET" || "$TARGET" =~ ^# ]] && continue  # Skip empty lines and comments
    echo "Copying SSH key to root on $TARGET..."
    ssh "$TARGET" 'sudo mkdir -p /root/.ssh &&
        sudo cp ~/.ssh/authorized_keys /root/.ssh/ &&
        sudo chown root:root /root/.ssh/authorized_keys &&
        sudo chmod 600 /root/.ssh/authorized_keys' \
        && echo "Done on $TARGET" || echo "Failed on $TARGET"
done < "$HOSTS_FILE"
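To verify that key-based root login now works, a quick check against one of the hosts from the list (IP and key name are the example values used above):
ssh -i "$HOME/.ssh/<KEY-NAME>" root@172.16.1.10 hostname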
Getting Started
Copy the sample inventory ./inventory/sample to ./inventory/mycluster, as sketched below.
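A minimal copy command, assuming you are in the root of the Kubespray repository checkout:
cp -rfp inventory/sample inventory/mycluster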
Declare the inventory in ./inventory/mycluster/inventory.ini:
[kube_control_plane]
node1 ansible_host=10.10.9.110 # ip=10.10.9.110 etcd_member_name=etcd1
# node2 ansible_host=95.54.0.13 # ip=10.3.0.2 etcd_member_name=etcd2
# node3 ansible_host=95.54.0.14 # ip=10.3.0.3 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
node4 ansible_host=10.10.9.120 # ip=10.10.9.120
# node5 ansible_host=95.54.0.16 # ip=10.3.0.5
# node6 ansible_host=95.54.0.17 # ip=10.3.0.6
[all:vars]
ansible_user=username
[k8s_cluster:children]
kube_control_plane
kube_node
If your localhost has an outdated Ansible and you don't want to upgrade it, you can run the playbook in a Docker container.
docker run --rm -it \
--mount type=bind,source="$(pwd)"/inventory,dst=/inventory \
--mount type=bind,source="${HOME}"/.ssh/<KEY-NAME>,dst=/root/.ssh/id_rsa \
quay.io/kubespray/kubespray:v2.28.0 bash
The KEY-NAME is the private key you're using to provision the remote hosts; Kubespray will use this key to run Ansible tasks.
If you're provisioning remote hosts over a VPN, add --network=host to let the Docker container use your host network.
You can save the final run-ansible.sh bash script, which starts the Kubespray Docker container, in your Kubespray ansible directory:
#!/bin/bash
set -e
docker run \
--rm -it \
--network=host \
--mount type=bind,source="$(pwd)"/inventory/,dst=/inventory/ \
--mount type=bind,source="$(pwd)",dst=/my-kubespray \
--mount type=bind,source="${HOME}"/.ssh/<KEY-NAME>,dst=/root/.ssh/id_rsa \
quay.io/kubespray/kubespray:v2.28.0 bash
Execute the script, then run the playbook inside the Docker container:
ansible-playbook -i /inventory/mycluster/inventory.ini \
./cluster.yml \
-b -v \
--private-key=~/.ssh/id_rsa -CD
MITM SSL/TLS Certificates
Sometimes, when you deploy Kubernetes in a private on-premise DC that intercepts HTTPS, you need to trust the MITM SSL/TLS certificates in order to download Kubernetes components from GitHub.
openssl s_client -showcerts -connect github.com:443 </dev/null
You will get output like:
-----BEGIN CERTIFICATE-----
MIIDyzCCArOgAwIBAgIBAjANBgkqhkiG9...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgISA...
-----END CERTIFICATE-----
...
Copy the certificate blocks to mitm.crt (or extract them automatically, as sketched below), then verify and install it:
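A rough one-liner that extracts every certificate block from the handshake output into mitm.crt, instead of copying them by hand:
openssl s_client -showcerts -connect github.com:443 </dev/null 2>/dev/null \
  | sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > mitm.crt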
openssl x509 -in mitm.crt -noout -subject -issuer
sudo cp mitm.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
The get_url module does not use requests, so REQUESTS_CA_BUNDLE is ignored. It uses urllib + ssl, and SSL_CERT_FILE is only used if it's exported in the shell.
Playbooks can also have tasks that use Python modules. These modules do not use the system-wide CA path used by curl/apt on Debian:
python3 -c "import ssl; print(ssl.get_default_verify_paths())"
DefaultVerifyPaths(cafile='/usr/lib/ssl/cert.pem', capath='/usr/lib/ssl/certs', openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/usr/lib/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/usr/lib/ssl/certs')
Configure host-wide values in /etc/environment (it is a plain KEY=value file read at login, so no export keyword is needed; re-login for the change to take effect):
SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
PIP_CERT=/etc/ssl/certs/ca-certificates.crt
Verify:
python3 -c "import ssl; print(ssl.get_default_verify_paths())"
DefaultVerifyPaths(cafile='/etc/ssl/certs/ca-certificates.crt', capath='/etc/ssl/certs', openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/usr/lib/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/usr/lib/ssl/certs')
You can also temporarily disable SSL/TLS certificate validation in ./inventory/mycluster/group_vars/all/all.yml (globally) or in ./roles/kubespray_defaults/defaults/main/download.yml (for curl operations):
download_validate_certs: false
To skip SSL/TLS validation for containerd registries, open ./inventory/mycluster/group_vars/all/containerd.yml and add:
containerd_registries_mirrors:
  - prefix: docker.io
    mirrors:
      - host: https://registry-1.docker.io
        capabilities: ["pull", "resolve"]
        skip_verify: true
  - prefix: quay.io
    mirrors:
      - host: https://quay.io
        capabilities: ["pull", "resolve"]
        skip_verify: true
  - prefix: registry.k8s.io
    mirrors:
      - host: https://registry.k8s.io
        capabilities: ["pull", "resolve"]
        skip_verify: true
Do not forget to set these options back (download_validate_certs: true, skip_verify: false) when you finish debugging.
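A sketch of re-applying only the containerd configuration after such a change, run inside the Kubespray container; the containerd tag is an assumption, check it against your Kubespray version first:
ansible-playbook -i /inventory/mycluster/inventory.ini ./cluster.yml --list-tags | grep containerd   # confirm the tag exists
ansible-playbook -i /inventory/mycluster/inventory.ini ./cluster.yml \
  -b --tags=containerd --private-key=~/.ssh/id_rsa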
You should get the certificate chain using openssl s_client:
openssl s_client -connect registry-1.docker.io:443 -showcerts
openssl s_client -connect quay.io:443 -showcerts
openssl s_client -connect gcr.io:443 -showcerts
If your datacenter intercepts this request, you will see your MITM proxy’s root or intermediate cert in the output.
Save the root or intermediate certificate. Look for the MITM certificate block between -----BEGIN CERTIFICATE----- and -----END CERTIFICATE-----.
CONNECTED(00000003)
depth=2 O = MITM Proxy Inc., CN = MITM Root CA
...
-----BEGIN CERTIFICATE-----
MIIDYTCCAkmgAwIBAgIJAMDoa3LVz7yfMA0GCSqGSIb3DQEBCwUAMEUxCzAJBgNV
...
-----END CERTIFICATE-----
Copy the full PEM block (from BEGIN to END) and save it to a file, e.g.:
vim mitm-root.crt
openssl x509 -in mitm-root.crt -text -noout
Issuer: O = MITM Proxy Inc., CN = MITM Root CA
Subject: O = MITM Proxy Inc., CN = MITM Root CA
Add to your system trust store:
sudo cp mitm-root.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
sudo systemctl restart containerd
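To confirm that containerd now trusts the intercepting proxy, a quick pull test on one of the nodes (the image and tag are just an example):
sudo crictl pull registry.k8s.io/pause:3.9   # any image from an intercepted registry will do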
Admin Config File
When the Kubespray deployment is finished, you can get admin.conf from any control plane node:
sudo cat /etc/kubernetes/admin.conf
Copy this configuration file from the remote host to your localhost. Edit it and replace 127.0.0.1 with the IP address of any control plane node:
server: https://127.0.0.1:6443 # Replace 127.0.0.1 with control plane IP address
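A minimal sketch of fetching and patching the file; the host IP is the example control plane address from the inventory above, and passwordless sudo from the earlier section is assumed:
ssh username@10.10.9.110 "sudo cat /etc/kubernetes/admin.conf" > ./admin.conf
sed -i 's/127.0.0.1/10.10.9.110/' ./admin.conf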
Now you can use it with kubectl:
kubectl --kubeconfig /path/to/your/admin.conf get nodes
Checklist
Reboot a Kubernetes Node Safely
Important Notes:
- Do not drain control plane nodes unless you know what you're doing.
- Some stateful pods (e.g., databases with emptyDir) may not tolerate eviction — check carefully.
- For production environments, use a rolling drain & reboot strategy with one node at a time.
List all nodes:
kubectl --kubeconfig <...> get nodes
NAME STATUS ROLES AGE VERSION
dev-kubernetes-cp0 Ready control-plane 36h v1.32.5
dev-kubernetes-cp1 Ready control-plane 36h v1.32.5
dev-kubernetes-cp2 Ready control-plane 36h v1.32.5
dev-kubernetes-wn0 Ready <none> 36h v1.32.5
dev-kubernetes-wn1 Ready <none> 36h v1.32.5
dev-kubernetes-wn2 Ready <none> 36h v1.32.5
dev-kubernetes-wn3 Ready <none> 36h v1.32.5
Drain the node:
kubectl --kubeconfig <...> drain <node-name> --ignore-daemonsets --delete-emptydir-data
- --ignore-daemonsets keeps system-level pods like CNI, logging, etc.
- --delete-emptydir-data allows eviction of pods that use emptyDir volumes (their ephemeral data will be deleted).
Optional: add --force if needed (e.g., for pods not managed by a controller).
Reboot the node (or shut it down, add CPU cores or memory, and power it back up).
Uncordon the node:
kubectl --kubeconfig <...> uncordon <node-name>
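To confirm the node rejoined the cluster and is schedulable again:
kubectl --kubeconfig <...> get nodes
kubectl --kubeconfig <...> get pods -A -o wide --field-selector spec.nodeName=<node-name>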
Cluster Can Run Pods
Run a simple pod in the default namespace with a debian image:
root@kubespray-container:/kubespray# kubectl --kubeconfig /inventory/mycluster/admin.conf \
run debian-slim \
--image=debian:bullseye-slim \
--restart=Never \
-- sleep infinity
pod/debian-slim created
Check that the debian pod is running:
root@kubespray-container:/kubespray# kubectl \
--kubeconfig /inventory/mycluster/admin.conf \
get pods
NAME READY STATUS RESTARTS AGE
debian-slim 1/1 Running 0 57s
Describe pod:
root@kubespray-container:/kubespray# kubectl \
--kubeconfig /inventory/mycluster/admin.conf \
describe pod debian-slim
...
Containers:
debian-slim:
Container ID: containerd://5ded5dd00bc8b36864e9846805b7e2b9a9f66db51b3a90ad1108fdd688d38ad5
Image: debian:bullseye-slim
Image ID: docker.io/library/debian@sha256:b5f9bc44bdfbd9d551dfdd432607cbc6bb5d9d6dea726a1191797d7749166973
Running a pod (container) with an image from an external Docker registry means that the Kubernetes cluster has working networking and Internet access.
Execute a remote bash shell:
root@kubespray-container:/kubespray# kubectl \
--kubeconfig /inventory/mycluster/admin.conf \
exec -it debian-slim -- bash
Install applications needed for different checks:
root@debian-slim:/# apt update
root@debian-slim:/# apt install curl iputils-ping telnet netcat
If you are running the default debian-slim image and can update/install its packages, your pod already has Internet access and DNS resolution.
The next sections are important for isolated environments only, for example if you have to run containers using custom images from a private docker/podman registry and/or a private apt mirror located in your local DC network.
Internet Access From Pods
root@debian-slim:/# ping 8.8.8.8
root@debian-slim:/# curl -Lv google.com
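A quick in-cluster DNS resolution check that needs no extra packages (getent ships with the base image):
root@debian-slim:/# getent hosts kubernetes.default.svc.cluster.local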
Cross-Namespace Service Access
Using netcat:
root@debian-slim:/# nc -zv coredns.kube-system.svc.cluster.local 9153
Connection to coredns.kube-system.svc.cluster.local (10.233.0.3) 9153 port [tcp/*] succeeded!
Using telnet:
root@debian-slim:/# telnet coredns.kube-system.svc.cluster.local 9153
Trying 10.233.0.3...
Connected to coredns.kube-system.svc.cluster.local.
Escape character is '^]'.
^]
telnet> Connection closed.
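When the checks are done, remove the test pod:
root@debian-slim:/# exit
root@kubespray-container:/kubespray# kubectl --kubeconfig /inventory/mycluster/admin.conf delete pod debian-slim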
Ingress NGINX Controller
Adjusting Timeouts
You can adjust proxy timeouts and many other parameters using ingress annotations in chart values files:
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/keep-alive-timeout: "150"
    nginx.ingress.kubernetes.io/upstream-keepalive-timeout: "240"
    ...
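To verify the annotations actually landed on the Ingress object (namespace and name are placeholders):
kubectl -n <namespace> get ingress <ingress-name> -o jsonpath='{.metadata.annotations}'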
Kubectl
Node Labels
To make the GitLab runner spawn workers only on a selected node, label that node:
kubectl label node <NODE_NAME> gitlab-runner=dedicated
kubectl get node <NODE_NAME> --show-labels
Add node_selector in the runner configuration via Helm values:
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "debian:bookworm"
        pull_policy = ["if-not-present"]
        cap_add = ["NET_ADMIN"]
        [runners.kubernetes.node_selector]
          gitlab-runner = "dedicated"
Or using the --set parameter:
helm upgrade --install gitlab-runner gitlab/gitlab-runner \
--set runners.kubernetes.nodeSelector."gitlab-runner"=dedicated
Show all pods on a specific node:
kubectl get pods -A --field-selector spec.nodeName=<NODE_NAME>
kubectl get pods -A -o wide --field-selector spec.nodeName=<NODE_NAME>
Label all other nodes as general:
kubectl label node <APP_NODE_NAME> role=general
To prevent application deployments on the GitLab-dedicated node, enable a node selector in the application spec:
nodeSelector:
  role: general
You can taint a Kubernetes node to prevent other applications from being scheduled on it:
kubectl taint node <NODE_NAME> dedicated=gitlab:NoSchedule
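To remove the taint later, append a trailing dash to the same taint:
kubectl taint node <NODE_NAME> dedicated=gitlab:NoSchedule-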
If you have a custom node pool with tainted nodes, then your application deployment (gitlab-runner) configuration must have tolerations in its spec to be scheduled on those nodes:
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <...>
  template:
    metadata:
      labels:
        app: <...>
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "gitlab"
          effect: "NoSchedule"
Add tolerations in the runner configuration via Helm values:
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        ...
        [runners.kubernetes.node_selector]
          ...
        [runners.kubernetes.node_tolerations]
          "dedicated=gitlab" = "NoSchedule"
Test Which IP Is Used From Inside the Cluster
Using wget:
kubectl run ip-check --rm -it --image=busybox -- /bin/sh
# Then inside the pod:
wget -qO- https://api.ipify.org
Using curl:
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- sh
# Then inside the pod:
curl https://ifconfig.me
Diagnostics
$ kubectl -n <namespace> logs -f -l app.kubernetes.io/name=<application_name> --max-log-requests=10
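Recent events in the namespace are often useful alongside the logs:
$ kubectl -n <namespace> get events --sort-by=.lastTimestamp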
Containers
Container information using JSON and jq:
$ kubectl get deployment <deployment-name> -n <namespace> -o jsonpath="{.spec.template.spec.containers[*].name}"
$ kubectl get deployment <deployment-name> -n <namespace> -o json | jq '.spec.template.spec.containers[] | {name: .name, command: .command, args: .args}'
If your container has no ps installed, try this:
# cat /proc/1/cmdline | tr '\0' ' '
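A rough substitute for ps that only reads /proc, a sketch to run inside the same container shell:
for pid in /proc/[0-9]*; do
    printf '%s: ' "${pid#/proc/}"   # print the PID
    tr '\0' ' ' < "$pid/cmdline"    # print its command line (empty for kernel threads)
    echo
done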