Use Argo CD to manage my home Kubernetes cluster

This post records how I use Argo CD to manage my home Kubernetes cluster.

Before this setup, I applied most Kubernetes resources manually. That is fine when the cluster is small, but after adding application workloads, Redis, Longhorn, Istio, monitoring, and ingress resources, I wanted to be able to rebuild the cluster from Git as much as possible.

The goal is simple:

Keep Kubernetes manifests in Git.
Let Argo CD reconcile the cluster.
Keep secrets out of Git.
Make a new RKE cluster recoverable with a small bootstrap checklist.

For the examples in this series, I use the same fake environment:

Git repo: ssh://git@git.example.com/platform/k8s-infra.git
Cluster API: https://rke-api.example.internal:6443
Vault: https://vault.example.internal:8200
OTEL backend: http://otel.example.internal:4318
Apps: example-api, example-worker, example-admin
Public hosts: api.example.com, worker.example.com

example-admin is kept as an internal app in these examples, so it has secrets but no public Gateway route.

Series

This post is part of my home Kubernetes GitOps series.

GitOps flow

The repository structure is like this:

clusters/root-app.yaml
clusters/apps/example-api.yaml
clusters/apps/redis.yaml
clusters/apps/external-secrets.yaml
clusters/apps/example-admin.yaml
clusters/apps/example-worker.yaml
clusters/apps/monitoring.yaml
clusters/apps/infra.yaml
apps/example-api/
apps/redis/
apps/example-admin/
apps/example-worker/
infra/

The important file is clusters/root-app.yaml.

I only apply this root Application manually. After that, Argo CD reads clusters/apps and creates the child Applications.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k8s-infra-root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://git@git.example.com/platform/k8s-infra.git
    targetRevision: main
    path: clusters/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

This is the app-of-apps pattern. Argo CD does not deploy just one app; the root Application deploys the child Applications, and those deploy the real workloads.
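
Because the root Application only points at other Applications, a wrong source path in a child manifest only shows up after Argo CD tries to sync it. A small local check before pushing could catch that earlier. This is a minimal sketch, not part of my actual repo: check_app_paths is a hypothetical helper, and it assumes one Application per file with an unquoted path value.

```shell
#!/usr/bin/env bash
# check_app_paths REPO_ROOT: for every manifest in clusters/apps, read the
# first "path:" value and report it if that directory is absent from the repo.
# Manifests without a path (for example Helm chart sources) are skipped.
# Hypothetical helper; assumes one Application per file and unquoted paths.
check_app_paths() {
  local root="$1" file src_path failed=0
  for file in "$root"/clusters/apps/*.yaml; do
    [ -e "$file" ] || continue
    src_path="$(sed -n 's/^[[:space:]]*path:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$file" | head -n 1)"
    [ -n "$src_path" ] || continue
    if [ ! -d "$root/$src_path" ]; then
      echo "$(basename "$file"): source path '$src_path' not found"
      failed=1
    fi
  done
  return "$failed"
}

# Usage, from the repository root:
#   check_app_paths .
```

The function returns non-zero when at least one path is missing, so it can also run as a pre-push check.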

Install Argo CD first

On a new cluster, installing Argo CD is still a manual step.

kubectl create namespace argocd
kubectl apply -n argocd --server-side --force-conflicts \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

The stable manifest is convenient for a lab. For a reproducible production bootstrap, I would pin a specific Argo CD release manifest instead.
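
As a sketch of what pinning could look like, the install URL can be built from a release tag instead of the floating stable ref. The helper name argocd_manifest_url and the ARGOCD_VERSION variable are assumptions for illustration; the tag itself is whatever release you decide to record in Git.

```shell
#!/usr/bin/env bash
# Build the install manifest URL for one pinned Argo CD release tag.
# argocd_manifest_url is a hypothetical helper, not part of any Argo CD tooling.
argocd_manifest_url() {
  local version="$1"
  # Refuse floating refs such as "stable" so the bootstrap stays reproducible.
  case "$version" in
    v[0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "expected a release tag like vX.Y.Z, got: '$version'" >&2; return 1 ;;
  esac
  printf 'https://raw.githubusercontent.com/argoproj/argo-cd/%s/manifests/install.yaml\n' "$version"
}

# Usage (ARGOCD_VERSION is a value you choose, for example from a file in Git):
#   kubectl apply -n argocd --server-side --force-conflicts \
#     -f "$(argocd_manifest_url "$ARGOCD_VERSION")"
```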

Because my Git repository is private, Argo CD also needs SSH access to the repo. The values below are examples. Replace the host, repository path, and SSH key path with values from your own environment. If your Git server uses the normal SSH port, no custom port is needed.

ssh-keyscan git.example.com > /tmp/argocd_known_hosts
ssh-keygen -lf /tmp/argocd_known_hosts

kubectl -n argocd create configmap argocd-ssh-known-hosts-cm \
  --from-file=ssh_known_hosts=/tmp/argocd_known_hosts \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl -n argocd create secret generic k8s-infra-repo \
  --from-literal=type=git \
  --from-literal=url=ssh://git@git.example.com/platform/k8s-infra.git \
  --from-file=sshPrivateKey=/home/user/.ssh/k8s_infra \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl -n argocd label secret k8s-infra-repo \
  argocd.argoproj.io/secret-type=repository --overwrite

I check the host key fingerprint before applying it. ssh-keyscan is useful, but by itself it only collects the key; it does not prove the key is the right one.
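
The fingerprint check can be made explicit with a small filter over the ssh-keygen output. This is a minimal sketch: has_fingerprint is a hypothetical helper, and EXPECTED_FP stands for a fingerprint obtained from a channel you already trust, such as the Git server's console.

```shell
#!/usr/bin/env bash
# has_fingerprint EXPECTED: read "ssh-keygen -lf" output on stdin and succeed
# only if some line carries EXPECTED as its fingerprint (the second field).
# Hypothetical helper for illustration.
has_fingerprint() {
  local expected="$1"
  awk -v want="$expected" '$2 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage (EXPECTED_FP is an assumption: a fingerprint from a trusted source):
#   ssh-keygen -lf /tmp/argocd_known_hosts | has_fingerprint "$EXPECTED_FP" \
#     || echo "host key does not match; do not apply the ConfigMap" >&2
```

The check passes if any scanned key matches, so it is safer to scan only the key type you verified, for example with ssh-keyscan -t ed25519, so no unverified keys end up in the file.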

After Argo CD can read Git, I apply the root Application.

kubectl apply -f clusters/root-app.yaml

I enable automated sync with prune and selfHeal only once I trust the repository path. On a first migration, I would check the diff carefully before allowing Argo CD to prune resources.

Sync order

One problem with GitOps is that not every resource can be applied at the same time. CRDs must exist before custom resources. Secret controllers must exist before generated secrets. Some applications should wait until shared infra is ready.

So I use Argo CD sync waves.

wave -40: Gateway API CRDs
wave -30: Istio base
wave -20: Istio control plane
wave -10: Istio CNI and External Secrets Operator
wave -5: Istio ambient ztunnel
wave 0: Redis and Longhorn
wave 5: application ExternalSecrets
wave 10: application workloads and monitoring
wave 15: Istio PodMonitors and Kiali
wave 18: example admin
wave 20: example worker
wave 30: example API
wave 40: shared infra and ingresses

For example, the example-api Application is placed in a late wave:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-api
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "30"
spec:
  project: default
  source:
    repoURL: ssh://git@git.example.com/platform/k8s-infra.git
    targetRevision: main
    path: apps/example-api
  destination:
    server: https://kubernetes.default.svc
    namespace: example-api
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - SkipDryRunOnMissingResource=true
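
To compare the annotations in Git with the wave plan, the child manifests can be listed by wave. This is a sketch under assumptions: list_sync_waves is a hypothetical helper, and it assumes one Application per file in clusters/apps.

```shell
#!/usr/bin/env bash
# list_sync_waves DIR: print "<wave><TAB><file>" for each manifest in DIR,
# sorted by wave. Manifests without the annotation are reported as wave 0,
# which is Argo CD's default. Hypothetical helper; one Application per file.
list_sync_waves() {
  local dir="$1" file wave
  for file in "$dir"/*.yaml; do
    [ -e "$file" ] || continue
    wave="$(sed -n 's/^[[:space:]]*argocd\.argoproj\.io\/sync-wave:[[:space:]]*"\{0,1\}\(-\{0,1\}[0-9][0-9]*\)"\{0,1\}[[:space:]]*$/\1/p' "$file" | head -n 1)"
    printf '%s\t%s\n' "${wave:-0}" "$(basename "$file")"
  done | sort -n -k1,1
}

# Usage, from the repository root:
#   list_sync_waves clusters/apps
```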

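Sync waves are not a perfect health gate. Within one Application, Argo CD waits for the resources in a wave to become healthy before it starts the next wave, but since Argo CD 1.8 a child Application resource has no health check by default unless one is restored in argocd-cm. In an app-of-apps setup like this one, the waves therefore mostly control apply order, and they do not guarantee every controller-generated object is ready before the next app starts. For example, a workload that depends on an ExternalSecret may fail once, then become healthy after the secret controller creates the Kubernetes Secret.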

That is acceptable for my home lab, but it is something I need to remember when debugging a first sync.
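
When a first sync looks stuck on a missing Secret, a small polling helper can make the wait explicit. This is only a sketch: wait_for_secret is a hypothetical function name, and the 5 second interval and 120 second default timeout are arbitrary choices.

```shell
#!/usr/bin/env bash
# wait_for_secret NAMESPACE NAME [TIMEOUT_SECONDS]: poll until the Secret
# exists or the timeout expires. Hypothetical helper; interval and default
# timeout are arbitrary.
wait_for_secret() {
  local ns="$1" name="$2" timeout="${3:-120}" waited=0
  until kubectl -n "$ns" get secret "$name" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "secret $ns/$name not found after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# Usage:
#   wait_for_secret example-api example-api-env-file
```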

What Argo CD manages

In my current cluster, Argo CD manages these child Applications:

gateway-api-crds
istio-base
istiod
istio-cni
ztunnel
external-secrets
redis
longhorn
example-secrets
monitoring
istio-observability
kiali
example-admin
example-worker
example-api
infra

The result is that most of the cluster can be recreated from Git. The parts that are still manual are the bootstrap dependencies:

RKE cluster is created.
Argo CD is installed.
Argo CD has the private repo SSH credential.
Vault Kubernetes auth is configured.
Required secret values already exist in Vault.

After those are ready, the root Application can take over.

Validate the sync

First check Argo CD Applications.

kubectl -n argocd get applications
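
With sixteen child Applications, it can help to print only the ones that are not both Synced and Healthy. The filter below is a sketch; not_ready_apps is a hypothetical helper, and the column paths are the Application status fields for sync and health.

```shell
#!/usr/bin/env bash
# not_ready_apps: read "NAME SYNC HEALTH" rows on stdin, print the rows that
# are not both Synced and Healthy, and return non-zero if any were printed.
# Hypothetical helper for illustration.
not_ready_apps() {
  awk 'NF && ($2 != "Synced" || $3 != "Healthy") { print; bad = 1 } END { exit bad ? 1 : 0 }'
}

# Usage:
#   kubectl -n argocd get applications --no-headers \
#     -o custom-columns='NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status' \
#     | not_ready_apps
```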

Then check important controllers and workloads.

kubectl -n external-secrets get pods
kubectl get clustersecretstore

kubectl -n longhorn-system get pods
kubectl -n istio-system get pods
kubectl -n redis get pods
kubectl -n example-admin get pods
kubectl -n example-worker get pods
kubectl -n example-api get pods

For Istio ambient mode, I also check CNI, ztunnel, and waypoint resources.

kubectl -n istio-system get daemonset istio-cni-node ztunnel
kubectl -n example-api get gateway example-api-waypoint
kubectl -n example-api get deploy,svc | grep waypoint
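
For the two DaemonSets, the thing I care about is whether every scheduled pod is ready. A sketch of that comparison follows; daemonsets_behind is a hypothetical helper, and the columns come from the DaemonSet status fields desiredNumberScheduled and numberReady.

```shell
#!/usr/bin/env bash
# daemonsets_behind: read "NAME DESIRED READY" rows on stdin and print the
# rows where fewer pods are ready than desired; non-zero exit if any.
# Hypothetical helper for illustration.
daemonsets_behind() {
  awk 'NF && ($3 + 0) < ($2 + 0) { print; bad = 1 } END { exit bad ? 1 : 0 }'
}

# Usage:
#   kubectl -n istio-system get daemonset istio-cni-node ztunnel --no-headers \
#     -o custom-columns='NAME:.metadata.name,DESIRED:.status.desiredNumberScheduled,READY:.status.numberReady' \
#     | daemonsets_behind
```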

If the ingress is ready, I can test the public HTTPS route.

curl https://api.example.com/health
curl https://worker.example.com/

Common problems

If Argo CD cannot clone the repository, I check the repository Secret and known hosts ConfigMap first.

kubectl -n argocd get secret k8s-infra-repo
kubectl -n argocd get configmap argocd-ssh-known-hosts-cm

If child Applications exist but stay unhealthy, I check whether the earlier sync waves are ready. A missing CRD or failed controller can make later apps look broken even when their YAML is correct.
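
A quick way to rule out the missing CRD case is to ask the cluster for the CRDs the later waves rely on. This is a sketch: missing_crds is a hypothetical helper, and the CRD names in the usage line are the ones Gateway API and External Secrets Operator install, so adjust the list to what your apps need.

```shell
#!/usr/bin/env bash
# missing_crds NAME...: print each CRD name that the cluster does not have
# and return non-zero if at least one is missing. Hypothetical helper.
missing_crds() {
  local crd missing=0
  for crd in "$@"; do
    if ! kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "$crd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   missing_crds gateways.gateway.networking.k8s.io \
#     externalsecrets.external-secrets.io \
#     clustersecretstores.external-secrets.io
```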

If an application depends on a generated Secret, I check External Secrets before checking the app logs.

kubectl get clustersecretstore
kubectl -n example-api get secret example-api-env-file
kubectl -n example-worker get secret example-worker-config-file
kubectl -n example-admin get secret example-admin-config-file

Conclusion

This setup moves my Kubernetes cluster closer to a rebuildable system. Git stores the desired state, Argo CD keeps the cluster aligned with it, and Vault keeps the sensitive values outside of Git.

For a home cluster, this is a good balance. It is not fully automatic from an empty machine, but after the manual bootstrap pieces are ready, the cluster can recover in a much more predictable way.