Bootstrap a new RKE cluster for GitOps

This post records how I prepare a new RKE Kubernetes cluster before letting Argo CD reconcile the applications.

In my setup, Argo CD can manage most of the cluster after the root Application is applied. But a new cluster still needs a few manual pieces first. If those pieces are missing, the first sync looks noisy: CRDs are missing, ExternalSecrets cannot read Vault, workloads start before their Secrets exist, or PVC users fail because storage is not ready.

So I treat bootstrap as a checklist, not just one kubectl apply.

This post uses the same fake environment as the other examples:

Git repo: ssh://git@git.example.com/platform/k8s-infra.git
Cluster API: https://rke-api.example.internal:6443
Vault: https://vault.example.internal:8200
OTEL backend: http://otel.example.internal:4318
Apps: example-api, example-worker, example-admin
Public hosts: api.example.com, worker.example.com

example-admin is an internal app in this example set. It needs secrets, but it does not get a public Gateway route.

Series

This post is part of my home Kubernetes GitOps series.

What Argo CD manages later

After the root Application is applied, Argo CD creates child Applications from clusters/apps.

clusters/root-app.yaml
clusters/apps/gateway-api-crds.yaml
clusters/apps/istio-base.yaml
clusters/apps/istiod.yaml
clusters/apps/istio-cni.yaml
clusters/apps/ztunnel.yaml
clusters/apps/external-secrets.yaml
clusters/apps/redis.yaml
clusters/apps/longhorn.yaml
clusters/apps/example-secrets.yaml
clusters/apps/example-admin.yaml
clusters/apps/example-worker.yaml
clusters/apps/example-api.yaml
clusters/apps/infra.yaml

The sync order matters:

wave -40: Gateway API CRDs
wave -30: Istio base
wave -20: Istio control plane
wave -10: Istio CNI and External Secrets Operator
wave -5: Istio ambient ztunnel
wave 0: Redis and Longhorn
wave 5: application ExternalSecrets
wave 10: application workloads and monitoring
wave 15: Istio observability and Kiali
wave 18: admin services
wave 20: worker services
wave 30: API services
wave 40: shared infra and ingresses

Sync waves control apply order. They do not guarantee that every controller-generated object, such as a Secret created from an ExternalSecret, is ready before the next Application starts. That is why the manual prerequisites still matter.
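To make the wave numbers concrete, here is a minimal sketch of how a child Application could declare its wave. The annotation key argocd.argoproj.io/sync-wave is the standard Argo CD one; the REPO_URL variable, the path argument, and the default project are assumptions for illustration, not the actual manifests from my repository.

```shell
# Print a child Application manifest with a sync-wave annotation.
# Arguments: <name> <wave> <path>. REPO_URL must hold the Git repo URL.
# The path layout and the "default" project are hypothetical.
render_child_app() {
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: $1
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "$2"
spec:
  project: default
  source:
    repoURL: ${REPO_URL}
    targetRevision: HEAD
    path: $3
  destination:
    server: https://kubernetes.default.svc
EOF
}

# Example: render the Gateway API CRDs Application at wave -40.
# render_child_app gateway-api-crds -40 apps/gateway-api-crds
```

The wave value is quoted because annotation values must be strings. Lower numbers are applied first, so -40 lands before -30.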

Manual prerequisites

Before applying the root Application, I check these first:

The RKE cluster is reachable with kubectl.
Argo CD is installed in the argocd namespace.
Argo CD can read the private Git repository.
Vault is reachable from Kubernetes pods.
Vault Kubernetes auth is configured for this cluster.
Vault policies and roles exist for External Secrets Operator.
Required secret values already exist in Vault KV v2.
Longhorn node and storage prerequisites are ready.
Istio ambient prerequisites are ready on the nodes.
An ingress path exists if public routes should serve traffic.

This is the part that keeps the first sync boring. Boring is good here.
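Part of this checklist can be scripted. The sketch below covers only the items that kubectl can verify from a workstation; the Vault, Longhorn, and Istio items still need their own checks. The check and preflight helper names are hypothetical, and argocd-server is the Deployment name from the upstream install manifest.

```shell
# Run one named check and report PASS or FAIL without aborting the script.
check() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Run the cluster-side checks and return non-zero if any of them failed.
preflight() {
  failed=0
  check "cluster reachable"      kubectl cluster-info || failed=1
  check "argocd namespace"       kubectl get namespace argocd || failed=1
  check "argocd server deployed" kubectl -n argocd get deployment argocd-server || failed=1
  return "$failed"
}

# Usage:
# preflight || echo "fix the failed items before applying the root Application"
```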

Install Argo CD

Argo CD itself is the first manual install.

kubectl create namespace argocd
kubectl apply -n argocd --server-side --force-conflicts \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

For a lab, the stable manifest is convenient. For a reproducible production bootstrap, I would pin a specific Argo CD release manifest.
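Pinning could look like the sketch below. It only swaps the stable ref in the URL for a release tag; ARGOCD_VERSION is a placeholder that I would set to whichever release I had actually tested, and the helper name is hypothetical.

```shell
# Build the raw install-manifest URL for a specific Argo CD release tag.
argocd_manifest_url() {
  printf 'https://raw.githubusercontent.com/argoproj/argo-cd/%s/manifests/install.yaml\n' "$1"
}

# ARGOCD_VERSION is a placeholder; set it to the release tag you have tested.
# kubectl apply -n argocd --server-side --force-conflicts \
#   -f "$(argocd_manifest_url "$ARGOCD_VERSION")"
```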

Register the Git repository

The repository credential is sensitive, so I create it directly in the cluster. These values are examples.

ssh-keyscan git.example.com > /tmp/argocd_known_hosts
ssh-keygen -lf /tmp/argocd_known_hosts

kubectl -n argocd create configmap argocd-ssh-known-hosts-cm \
  --from-file=ssh_known_hosts=/tmp/argocd_known_hosts \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl -n argocd create secret generic k8s-infra-repo \
  --from-literal=type=git \
  --from-literal=url=ssh://git@git.example.com/platform/k8s-infra.git \
  --from-file=sshPrivateKey=/home/user/.ssh/k8s_infra \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl -n argocd label secret k8s-infra-repo \
  argocd.argoproj.io/secret-type=repository --overwrite

I compare the fingerprint printed by ssh-keygen with one obtained out of band before applying the ConfigMap. ssh-keyscan only fetches the key; it does not prove the key is correct.
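The comparison can be automated once a trusted fingerprint is available. This is a small sketch, assuming the usual ssh-keygen -lf output where the second field of each line is the fingerprint. EXPECTED_FP and the helper name are hypothetical; the expected value has to come from a source other than the scan itself.

```shell
# Succeed only if the expected fingerprint appears in `ssh-keygen -lf` output
# read from stdin. Field 2 of each line is the fingerprint (e.g. SHA256:...).
has_fingerprint() {
  awk -v want="$1" '$2 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# EXPECTED_FP is a placeholder for a fingerprint obtained out of band.
# ssh-keygen -lf /tmp/argocd_known_hosts | has_fingerprint "$EXPECTED_FP" \
#   || { echo "host key mismatch" >&2; exit 1; }
```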

Prepare Vault auth

External Secrets Operator authenticates to Vault with its Kubernetes service account. Vault needs to trust this new cluster for token review.

kubectl create namespace vault-auth --dry-run=client -o yaml | kubectl apply -f -
kubectl -n vault-auth create serviceaccount vault-auth
kubectl create clusterrolebinding vault-auth-tokenreview \
  --clusterrole=system:auth-delegator \
  --serviceaccount=vault-auth:vault-auth

Create a service account token Secret:

apiVersion: v1
kind: Secret
metadata:
  name: vault-auth-token
  namespace: vault-auth
  annotations:
    kubernetes.io/service-account.name: vault-auth
type: kubernetes.io/service-account-token

Apply the Secret, then export the reviewer token and cluster CA:

kubectl apply -f vault-auth-token.yaml
TOKEN_REVIEWER_JWT=$(kubectl -n vault-auth get secret vault-auth-token -o jsonpath='{.data.token}' | base64 -d)
kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt

Then configure Vault:

export VAULT_ADDR=https://vault.example.internal:8200
vault login

vault auth enable kubernetes

vault write auth/kubernetes/config \
  kubernetes_host="https://rke-api.example.internal:6443" \
  kubernetes_ca_cert=@ca.crt \
  token_reviewer_jwt="$TOKEN_REVIEWER_JWT"

If the Kubernetes auth mount already exists, I keep it and update only auth/kubernetes/config with the new cluster API, CA, and reviewer token.

I do not disable issuer validation by default. If issuer validation fails, I want to understand the issuer mismatch first.
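When I need to see which issuer a service account token actually carries, decoding its claims is usually enough. The helper below is a sketch for inspection only: it does not verify the signature, and the function name is hypothetical. It assumes base64 -d is available, as in the export commands above.

```shell
# Print the decoded claims (payload) of a JWT passed as the first argument.
# The signature is NOT verified; this is for inspection only.
jwt_payload() {
  # Take the middle segment and convert base64url to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url omits.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Usage: look for the "iss" claim in the printed JSON.
# jwt_payload "$TOKEN_REVIEWER_JWT"
```

The same helper works on a token from any service account, which helps when comparing the issuer Vault expects with the issuer the cluster really uses.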

Seed Vault data

Git should contain only the mapping. Vault should contain the real values.

Example paths:

vault kv put secret/example-api/env-file dotenv=@.env
vault kv put secret/example-worker/config-file config.json=@config.json
vault kv put secret/example-admin/config-file config=@config.yml

The matching Vault policies must use the KV v2 API path, which adds data/ after the mount name:

path "secret/data/example-api/*" {
  capabilities = ["read"]
}
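Here is a sketch of how the policy and a matching Kubernetes auth role could be created. The policy name example-api-read, the role name external-secrets, and the operator's service account and namespace are all assumptions; they must match whatever the ClusterSecretStore references.

```shell
# Print a read-only policy for every secret under <mount>/<app>/ in KV v2.
# KV v2 serves secret data under <mount>/data/..., so the policy path gains
# the extra "data" segment that `vault kv put` hides.
render_read_policy() {
  cat <<EOF
path "$1/data/$2/*" {
  capabilities = ["read"]
}
EOF
}

# Hypothetical names: policy "example-api-read", role "external-secrets",
# and the operator's service account and namespace.
# render_read_policy secret example-api | vault policy write example-api-read -
# vault write auth/kubernetes/role/external-secrets \
#   bound_service_account_names=external-secrets \
#   bound_service_account_namespaces=external-secrets \
#   token_policies=example-api-read \
#   token_ttl=1h
```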

Start GitOps

After the manual pieces are ready, I apply the root Application:

kubectl apply -f clusters/root-app.yaml

Then I watch Argo CD create the child Applications.

kubectl -n argocd get applications
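With more than a dozen Applications, I find it easier to list only the ones that still need attention. This is a minimal sketch with a hypothetical helper name; it expects three whitespace-separated columns (name, sync status, health status) on stdin.

```shell
# Print the names of Applications that are not both Synced and Healthy.
# Input: one "<name> <sync> <health>" line per Application on stdin.
not_ready_apps() {
  awk '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Usage:
# kubectl -n argocd get applications --no-headers \
#   -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#   | not_ready_apps
```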

Validate the first sync

I start with controllers and generated resources before debugging application logs.

kubectl -n external-secrets get pods
kubectl get clustersecretstore
kubectl -n example-api get secret example-api-env-file
kubectl -n example-worker get secret example-worker-config-file
kubectl -n example-admin get secret example-admin-config-file
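Instead of polling for the Secrets by hand, the ExternalSecret resources can be waited on directly. This sketch assumes the ExternalSecrets live in the same namespaces as the apps and report the usual Ready condition; the helper name and the 120 second timeout are arbitrary choices.

```shell
# Wait until every ExternalSecret in each given namespace reports Ready.
# Stops at the first namespace that fails or times out.
wait_for_external_secrets() {
  for ns in "$@"; do
    kubectl -n "$ns" wait externalsecret --all \
      --for=condition=Ready --timeout=120s || return 1
  done
}

# Usage:
# wait_for_external_secrets example-api example-worker example-admin
```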

Then I check the platform pieces.

kubectl -n longhorn-system get pods
kubectl -n istio-system get pods
kubectl -n istio-system get daemonset istio-cni-node ztunnel
kubectl -n monitoring get podmonitor

Finally, I check workloads.

kubectl -n redis get pods
kubectl -n example-admin get pods
kubectl -n example-worker get pods
kubectl -n example-api get pods

Common failures

If the ClusterSecretStore is not ready, I check Vault Kubernetes auth first:

kubectl describe clustersecretstore vault-example-api
kubectl -n external-secrets logs deploy/external-secrets --tail=120

If an ExternalSecret cannot read a Vault path, I check:

The Vault role name.
The service account binding.
The KV v2 policy path.
The remote key and property name.

If Redis or other PVC users fail, I check Longhorn before the app. A storage problem often appears as an application problem first.
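A quick way to confirm a storage problem is to list claims that are not bound yet. The helper below is a sketch with a hypothetical name; it assumes the column order of kubectl get pvc -A, where the third column is the status.

```shell
# Print "<namespace>/<name> <status>" for every PVC that is not Bound.
# Input: output of `kubectl get pvc -A --no-headers` on stdin.
pending_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " " $3 }'
}

# Usage:
# kubectl get pvc -A --no-headers | pending_pvcs
# kubectl get storageclass
```

If a claim stays Pending, I look at the StorageClass and the Longhorn pods before touching the application.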

If Istio ambient resources are applied but pods are not enrolled, I restart the workloads after the CNI and ztunnel are healthy.
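That restart can be gated on the two DaemonSets so it does not happen too early. This is a sketch, assuming the workloads are Deployments and that both DaemonSets run in istio-system as in the validation commands above; the helper name and timeouts are hypothetical.

```shell
# Restart Deployments in the given namespaces, but only after the Istio CNI
# and ztunnel DaemonSets have finished rolling out.
restart_after_ambient_ready() {
  kubectl -n istio-system rollout status daemonset/istio-cni-node --timeout=120s || return 1
  kubectl -n istio-system rollout status daemonset/ztunnel --timeout=120s || return 1
  for ns in "$@"; do
    # Without a name, rollout restart applies to every Deployment in the namespace.
    kubectl -n "$ns" rollout restart deployment || return 1
  done
}

# Usage:
# restart_after_ambient_ready example-admin example-worker example-api
```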

Conclusion

The important idea is that GitOps does not remove bootstrap. It moves most cluster state into Git, but a new cluster still needs identity, secrets, storage, and platform prerequisites before reconciliation is useful.

After those pieces exist, Argo CD can do what it is good at: keep the cluster aligned with Git.