This post records how I run Apache Airflow on Kubernetes with Argo CD.
Airflow is a good test for a GitOps cluster because it has both ordinary Helm values and a lot of sensitive runtime values: the Fernet key, the webserver secret key, the metadata database connection, the broker URL, the Redis password, the default admin user, and sometimes an SSH key for a private DAG repository.
My rule is:
- non-secret Helm values live in Git
- secret values live in Vault
- External Secrets Operator generates the Kubernetes Secrets
- Argo CD deploys Airflow after those Secrets exist
Airflow is a platform service in this example series. It runs alongside the same example app set used in earlier posts, but it has its own namespace and Vault path.
Series
This post is part of my home Kubernetes GitOps series:
- Bootstrap a new RKE cluster for GitOps
- Use Argo CD to manage my home Kubernetes cluster
- Use Vault and External Secrets in Kubernetes
- Run Istio ambient mode with waypoint proxies
- Expose Kubernetes services with Istio Gateway API
- Build an OpenTelemetry stack for Kubernetes apps
- Run Airflow on Kubernetes with GitOps-managed values
- Use Mozilla SOPS with GitOps for encrypted Kubernetes Secrets
Application order
I split Airflow into two Argo CD Applications:
- airflow-secrets: sync wave 5
- airflow: sync wave 10
The secret Application creates the ClusterSecretStore and ExternalSecret
objects. The Helm Application installs Airflow after that.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-secrets
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "5"
spec:
  project: default
  source:
    repoURL: ssh://git@git.example.com/platform/k8s-infra.git
    targetRevision: main
    path: apps/airflow
  destination:
    server: https://kubernetes.default.svc
    namespace: airflow
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - SkipDryRunOnMissingResource=true
The Airflow chart then uses a multi-source Application. One source is the upstream Helm chart, and the other source is my Git repository for values.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  project: default
  sources:
    - repoURL: https://airflow.apache.org
      chart: airflow
      targetRevision: 1.21.0
      helm:
        releaseName: airflow
        valueFiles:
          - $values/apps/airflow/airflow-values.yaml
    - repoURL: ssh://git@git.example.com/platform/k8s-infra.git
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: airflow
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 30s
        factor: 2
        maxDuration: 5m
    syncOptions:
      - CreateNamespace=true
I pin the chart version. I do not want Airflow chart upgrades to happen just because Argo CD refreshed the Application.
Non-secret values in Git
The Git-tracked values file contains runtime shape, persistence, and references to Kubernetes Secret names.
executor: CeleryExecutor
env:
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: "FALSE"
  - name: AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE
    value: Asia/Taipei
data:
  metadataSecretName: airflow-metadata
  brokerUrlSecretName: airflow-broker-url
fernetKeySecretName: airflow-fernet-key
apiSecretKeySecretName: airflow-api-secret-key
jwtSecretName: airflow-jwt-secret
webserverSecretKeySecretName: airflow-webserver-secret-key
redis:
  passwordSecretName: airflow-redis-password
dags:
  gitSync:
    enabled: true
    repo: ssh://git@git.example.com/platform/airflow-dags.git
    branch: main
    subPath: ""
    sshKeySecret: airflow-ssh-secret
    knownHosts: |
      git.example.com ssh-ed25519 <verified-public-host-key>
logs:
  persistence:
    size: 10Gi
triggerer:
  persistence:
    size: 5Gi
workers:
  celery:
    persistence:
      size: 10Gi
The knownHosts value above is only a placeholder. In a real setup, I verify
the Git server host key fingerprint before committing the public host key.
I do not use a custom SSH port for the DAG repository in this example. If the Git server listens on the standard SSH port, the repo URL stays clean. If a custom port is needed, that is a Git server decision, not an Airflow chart default.
Secret values in Vault
The secret payload can be one YAML document in Vault. For example:
createUserJob:
  defaultUser:
    username: admin
    password: replace-with-a-generated-password
    email: <admin-email>
    firstName: platform
    lastName: admin
fernetKey: replace-with-fernet-key
apiSecretKey: replace-with-api-secret-key
jwtSecret: replace-with-jwt-secret
metadataConnection: postgresql://airflow:replace-me@<postgres-host>:5432/airflow?sslmode=disable
brokerUrl: redis://:replace-me@airflow-redis:6379/0
redisPassword: replace-with-redis-password
webserverSecretKey: replace-with-webserver-secret-key
extraSecrets:
  airflow-ssh-secret:
    data:
      gitSshKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        <private-key-from-vault>
        -----END OPENSSH PRIVATE KEY-----
Then I write it to Vault:
vault kv put secret/airflow/config config=@airflow-values-secret.yml
The example values are intentionally fake. In a real cluster, I generate those keys and passwords once, store them in Vault, and keep them stable across chart upgrades. Rotating them is a planned operation, not something I let Helm do accidentally.
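As a minimal sketch of that one-time generation step, assuming `openssl` is available: a Fernet key is 32 random bytes in URL-safe base64, and the other keys can be any sufficiently long random string. The function and variable names here are hypothetical, and the script deliberately does not print the values.

```shell
# Generate a Fernet-compatible key: 32 random bytes, URL-safe base64
# (44 characters including padding).
gen_fernet_key() {
  openssl rand -base64 32 | tr '+/' '-_'
}

# Generate a generic random secret as 64 hex characters, usable for
# values such as the webserver secret key or the Redis password.
gen_hex_secret() {
  openssl rand -hex 32
}

fernet_key="$(gen_fernet_key)"
webserver_secret_key="$(gen_hex_secret)"
```

The generated values then go into the YAML payload that is written to Vault, not into Git.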
External Secrets
The ClusterSecretStore points to Vault over HTTPS with the internal CA trusted
by External Secrets Operator.
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault-airflow
spec:
  provider:
    vault:
      server: "https://vault.example.internal:8200"
      path: "secret"
      version: "v2"
      caProvider:
        type: ConfigMap
        name: vault-ca
        namespace: external-secrets
        key: ca.crt
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets-airflow
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
I avoid plain HTTP for Vault. The Airflow payload contains high-value secrets, so the Vault connection should be HTTPS even on an internal network.
Each ExternalSecret reads the same Vault payload and templates one Kubernetes
Secret from it.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: airflow-fernet-key
  namespace: airflow
spec:
  refreshInterval: 2m
  secretStoreRef:
    name: vault-airflow
    kind: ClusterSecretStore
  target:
    name: airflow-fernet-key
    creationPolicy: Owner
    template:
      engineVersion: v2
      data:
        fernet-key: '{{ (fromYaml .config).fernetKey }}'
  data:
    - secretKey: config
      remoteRef:
        key: airflow/config
        property: config
I use the same pattern for airflow-api-secret-key, airflow-jwt-secret,
airflow-metadata, airflow-broker-url, airflow-redis-password,
airflow-webserver-secret-key, airflow-default-user, and
airflow-ssh-secret.
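Two of those Secrets are worth sketching because they differ slightly from the Fernet example. This is a sketch under assumptions, not a verified manifest: the Secret key names `connection` and `gitSshKey` are what I understand the official chart to read, so they should be checked against the chart documentation for the pinned version. The SSH key path assumes the payload nests it under extraSecrets, airflow-ssh-secret, data, gitSshKey. Because `airflow-ssh-secret` contains hyphens, Go template dot access cannot reach it, so the template uses `index` instead.

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: airflow-metadata
  namespace: airflow
spec:
  refreshInterval: 2m
  secretStoreRef:
    name: vault-airflow
    kind: ClusterSecretStore
  target:
    name: airflow-metadata
    creationPolicy: Owner
    template:
      engineVersion: v2
      data:
        # Assumed key name expected by the chart for the metadata DB URL.
        connection: '{{ (fromYaml .config).metadataConnection }}'
  data:
    - secretKey: config
      remoteRef:
        key: airflow/config
        property: config
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: airflow-ssh-secret
  namespace: airflow
spec:
  refreshInterval: 2m
  secretStoreRef:
    name: vault-airflow
    kind: ClusterSecretStore
  target:
    name: airflow-ssh-secret
    creationPolicy: Owner
    template:
      engineVersion: v2
      data:
        # index walks the nested maps; ".extraSecrets.airflow-ssh-secret"
        # would not parse because of the hyphens in the key.
        gitSshKey: '{{ index (fromYaml .config) "extraSecrets" "airflow-ssh-secret" "data" "gitSshKey" }}'
  data:
    - secretKey: config
      remoteRef:
        key: airflow/config
        property: config
```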
Vault policy
The Airflow role only needs read access to the Airflow path.
vault policy write airflow-secret-read - <<'EOF'
path "secret/data/airflow/*" {
  capabilities = ["read"]
}
EOF
vault write auth/kubernetes/role/external-secrets-airflow \
  bound_service_account_names=external-secrets \
  bound_service_account_namespaces=external-secrets \
  audience=https://kubernetes.default.svc.cluster.local \
  policies=airflow-secret-read \
  ttl=1h
This keeps Airflow’s secret access separate from example-api,
example-worker, and example-admin.
Default user job
I also let the chart create or reset the default user from a generated Secret.
The Application values can inject environment variables from
airflow-default-user.
createUserJob:
  enabled: true
  useHelmHooks: false
  applyCustomEnv: false
  args:
    - bash
    - -ec
    - |
      airflow db check-migrations --migration-wait-timeout=300
      airflow users create \
        --username "${AIRFLOW_DEFAULT_USERNAME}" \
        --firstname "${AIRFLOW_DEFAULT_FIRST_NAME}" \
        --lastname "${AIRFLOW_DEFAULT_LAST_NAME}" \
        --role Admin \
        --email "${AIRFLOW_DEFAULT_EMAIL}" \
        --password "${AIRFLOW_DEFAULT_PASSWORD}" \
        || true
      airflow users reset-password \
        --username "${AIRFLOW_DEFAULT_USERNAME}" \
        --password "${AIRFLOW_DEFAULT_PASSWORD}"
The important part is useHelmHooks: false. I want Argo CD to see and manage
the job instead of Helm hiding it behind hook behavior.
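For the job script to find its variables, the airflow-default-user Secret needs keys that match the names used above. A minimal sketch of that ExternalSecret, assuming the Vault payload keeps the user under createUserJob.defaultUser; how the Secret is attached to the job as environment variables depends on the chart version and is not shown here.

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: airflow-default-user
  namespace: airflow
spec:
  refreshInterval: 2m
  secretStoreRef:
    name: vault-airflow
    kind: ClusterSecretStore
  target:
    name: airflow-default-user
    creationPolicy: Owner
    template:
      engineVersion: v2
      data:
        # Key names mirror the variables read by the createUserJob script.
        AIRFLOW_DEFAULT_USERNAME: '{{ (fromYaml .config).createUserJob.defaultUser.username }}'
        AIRFLOW_DEFAULT_PASSWORD: '{{ (fromYaml .config).createUserJob.defaultUser.password }}'
        AIRFLOW_DEFAULT_EMAIL: '{{ (fromYaml .config).createUserJob.defaultUser.email }}'
        AIRFLOW_DEFAULT_FIRST_NAME: '{{ (fromYaml .config).createUserJob.defaultUser.firstName }}'
        AIRFLOW_DEFAULT_LAST_NAME: '{{ (fromYaml .config).createUserJob.defaultUser.lastName }}'
  data:
    - secretKey: config
      remoteRef:
        key: airflow/config
        property: config
```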
Validate
I check generated Secrets first.
kubectl get clustersecretstore vault-airflow
kubectl -n airflow get externalsecret
kubectl -n airflow get secret airflow-fernet-key airflow-metadata airflow-ssh-secret
Then I check the chart resources.
kubectl -n airflow get pods
kubectl -n airflow get jobs
kubectl -n airflow logs job/airflow-create-user --tail=120
If pods start before the Secrets exist, I sync airflow-secrets first and then
refresh the Airflow Application.
Common problems
If Airflow keeps generating new keys, I check whether the chart is creating secrets instead of reading the existing Secret names.
If the DAG sync container cannot clone, I check the SSH private key Secret and
the verified knownHosts value. I do not disable host key checking just to make
Git sync green.
If External Secrets is ready but the Kubernetes Secret is missing, I check the
Vault policy path. KV v2 policies need secret/data/airflow/*.
If the default user job fails, I check whether migrations finished before the user command ran. A retry policy helps, but it should not hide a broken database connection string.
Conclusion
Airflow fits GitOps well as long as I keep a hard line between values and secrets. Git describes the chart and Secret names. Vault stores the values. External Secrets turns those values into Kubernetes Secrets. Argo CD applies the chart after that.
That split makes the Airflow install rebuildable without putting the most sensitive pieces into the repository.