This post records how I built an OpenTelemetry stack for Kubernetes apps.
I started with Docker Compose instead of moving the whole observability backend into Kubernetes immediately. That gave me a smaller blast radius: application pods can export OTLP data to one host, while Prometheus, Loki, Tempo, and Grafana run as a separate backend.
The stack looks like this:
Kubernetes app -> OTLP HTTP/gRPC -> OpenTelemetry Collector -> Prometheus/Loki/Tempo -> Grafana
This post uses the same fake application set as the other examples. The
OpenTelemetry snippets focus on example-api, while the same pattern can be
repeated for example-worker and example-admin.
- example-api: service.name=example-api, service.namespace=example
- example-worker: service.name=example-worker, service.namespace=example
- example-admin: service.name=example-admin, service.namespace=example
Series
This post is part of my home Kubernetes GitOps series:
- Bootstrap a new RKE cluster for GitOps
- Use Argo CD to manage my home Kubernetes cluster
- Use Vault and External Secrets in Kubernetes
- Run Istio ambient mode with waypoint proxies
- Expose Kubernetes services with Istio Gateway API
- Build an OpenTelemetry stack for Kubernetes apps
- Run Airflow on Kubernetes with GitOps-managed values
- Use Mozilla SOPS with GitOps for encrypted Kubernetes Secrets
Services
The compose stack has five services:
- otel-collector: receives OTLP and reads pod logs
- prometheus: scrapes collector-exported metrics
- loki: stores logs
- tempo: stores traces
- grafana: browses metrics, logs, and traces
The collector exposes common OTLP ports:
- 4317: OTLP gRPC
- 4318: OTLP HTTP
- 9464: Prometheus scrape endpoint
Docker Compose shape
The collector needs access to Kubernetes pod stdout logs on the host.
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.153.0
    restart: unless-stopped
    user: "0:0"
    command:
      - --config=/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"
      - "9464:9464"
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
      - /var/log/pods:/var/log/pods:ro
Running the collector as root is not elegant, but it was the practical way to
read pod log files from /var/log/pods in this environment. On a host where
those files are readable by a dedicated group, I would prefer a narrower
user/group setup.
Prometheus scrapes the metrics the collector exposes on port 9464. Its compose service:
  prometheus:
    image: prom/prometheus:v3.11.3
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
    ports:
      - "9090:9090"
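The service definition only tells Prometheus where its config file is; the scrape target lives in that file, which also has to be mounted into the container. A minimal sketch of what prometheus.yml might contain, assuming the collector is reachable as otel-collector on the compose network. The job name and the 15s interval are illustrative choices, not values taken from the actual stack:

```yaml
# prometheus.yml (sketch; job name and interval are assumptions)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets:
          - otel-collector:9464   # the collector's Prometheus exporter port
```

With a config like this, the collector should appear as a single target on the Prometheus targets page.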
Loki and Tempo store logs and traces:
  loki:
    image: grafana/loki:3.7.2
    ports:
      - "3100:3100"
  tempo:
    image: grafana/tempo:3.0.0
    command:
      - -target=all
      - -config.file=/etc/tempo.yaml
    ports:
      - "3200:3200"
      - "4319:4317"
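For Grafana, I do not treat the default credentials as a real setup. Use a strong admin password or a secret file for anything persistent. The ${GRAFANA_ADMIN_PASSWORD:?set a password} form makes Compose fail with that message when the variable is unset or empty, so the stack cannot start with a blank password.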
  grafana:
    image: grafana/grafana:13.0.1-security-01
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:?set a password}
      GF_AUTH_ANONYMOUS_ENABLED: "false"
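Grafana reaches the three backends over the same compose network. The post does not show how the data sources were added, so the following is only a sketch of a provisioning file. It assumes the file is mounted under /etc/grafana/provisioning/datasources/ and that the backends keep the service names and ports used above. The data source names are my own:

```yaml
# datasources.yaml (sketch; data source names are assumptions)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
```

Provisioning the data sources from a file keeps them reproducible when the Grafana container is recreated.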
Collector receivers
The collector receives OTLP data from applications:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
It also reads Kubernetes pod log files:
  file_log/example_api:
    include:
      - /var/log/pods/example-api_example-api-*_*/app/*.log
    start_at: end
    include_file_path: true
    operators:
      - type: container
      - type: regex_parser
        parse_from: attributes["log.file.path"]
        regex: '^/var/log/pods/(?P<k8s_namespace_name>[^_]+)_(?P<k8s_pod_name>[^_]+)_[^/]+/(?P<k8s_container_name>[^/]+)/(?P<k8s_restart_count>\d+)\.log$'
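The path pattern is important. Kubernetes writes pod logs to /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/<restart-count>.log, so the include glob above matches pods named example-api-* in the example-api namespace with a container named app. A ready Loki does not mean logs exist. If the collector cannot read the host path or the include pattern is wrong, Grafana Explore will still look empty.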
Resource attributes
I set deployment.environment.name in the collector so metrics, logs, and traces carry the same environment attribute and can line up.
processors:
  resource:
    attributes:
      - key: deployment.environment.name
        value: production
        action: upsert
For pod logs, I transform parsed file path attributes into Kubernetes resource attributes:
  transform/example_logs:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["k8s.namespace.name"], log.attributes["k8s_namespace_name"])
          - set(resource.attributes["k8s.pod.name"], log.attributes["k8s_pod_name"])
          - set(resource.attributes["k8s.container.name"], log.attributes["k8s_container_name"])
          - set(resource.attributes["service.namespace"], "example")
          - set(resource.attributes["service.name"], "example-api")
          - set(resource.attributes["deployment.environment.name"], "production")
This makes Loki labels and Grafana queries more useful than raw file names.
Pipelines
The collector has separate pipelines for traces, metrics, and logs.
exporters:
  prometheus:
    endpoint: 0.0.0.0:9464
    resource_to_telemetry_conversion:
      enabled: true
  otlp_grpc/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  otlp_http/loki:
    endpoint: http://loki:3100/otlp
service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - memory_limiter
        - resource
        - batch
      exporters:
        - otlp_grpc/tempo
    metrics:
      receivers:
        - otlp
      processors:
        - memory_limiter
        - resource
        - batch
      exporters:
        - prometheus
    logs:
      receivers:
        - otlp
        - file_log/example_api
      processors:
        - memory_limiter
        - transform/example_logs
        - resource
        - batch
      exporters:
        - otlp_http/loki
The Tempo and Loki endpoints are inside the Docker network, so plain internal service names are enough for this compose stack.
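The pipelines reference memory_limiter and batch, which the snippets in this post do not define. A minimal sketch of what those entries might look like under processors. The limits and batch sizes are placeholder values chosen for illustration, not the values from the actual stack, and they should be sized against the memory given to the collector container:

```yaml
processors:
  memory_limiter:
    check_interval: 1s      # how often memory usage is checked
    limit_mib: 512          # assumed limit for the collector process
    spike_limit_mib: 128    # assumed headroom for short spikes
  batch:
    timeout: 5s             # flush at least this often
    send_batch_size: 1024   # or when this many items are queued
```

Keeping memory_limiter first in each pipeline, as the pipelines above already do, lets it refuse data before later processors spend memory on it.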
App environment
From inside a Kubernetes pod, localhost means the pod itself. The OTLP endpoint
must be a host reachable from the cluster.
For OTLP HTTP:
OTEL_SERVICE_NAME=example-api
OTEL_RESOURCE_ATTRIBUTES=deployment.environment.name=production,service.namespace=example
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel.example.internal:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_LOGS_EXPORTER=otlp
OTEL_METRICS_EXPORTER=otlp
OTEL_TRACES_EXPORTER=otlp
For OTLP gRPC:
OTEL_SERVICE_NAME=example-api
OTEL_RESOURCE_ATTRIBUTES=deployment.environment.name=production,service.namespace=example
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel.example.internal:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_LOGS_EXPORTER=otlp
OTEL_METRICS_EXPORTER=otlp
OTEL_TRACES_EXPORTER=otlp
If the app receives its .env from Vault and External Secrets, I add these
values to Vault instead of committing them into Git.
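One way this could look on the Kubernetes side, assuming External Secrets syncs the Vault values into a Secret whose keys are the variable names. The Secret name example-api-env is hypothetical. The container name app matches the log path used in the collector config:

```yaml
# Fragment of the example-api Deployment (sketch, not the actual manifest)
spec:
  template:
    spec:
      containers:
        - name: app
          envFrom:
            - secretRef:
                name: example-api-env   # hypothetical Secret created by External Secrets
```

With envFrom, every key in the Secret becomes an environment variable, so the OTEL_* values never need to appear in the Git-managed manifest.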
The examples above use plain HTTP OTLP inside an internal network. If telemetry crosses an untrusted network, I would put TLS in front of the collector or use an OTLP endpoint that supports TLS directly.
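If the collector terminates TLS itself, the OTLP receiver accepts a tls block per protocol. A minimal sketch, assuming a certificate and key are mounted into the collector container. Both file paths are hypothetical:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otelcol-contrib/tls/server.crt   # hypothetical path
          key_file: /etc/otelcol-contrib/tls/server.key    # hypothetical path
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: /etc/otelcol-contrib/tls/server.crt   # hypothetical path
          key_file: /etc/otelcol-contrib/tls/server.key    # hypothetical path
```

The applications would then use an https:// value for OTEL_EXPORTER_OTLP_ENDPOINT and need to trust the issuing CA.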
Start and check
Start the stack:
cd opentelemetry
docker compose up -d
Check service readiness:
docker compose ps
curl http://localhost:9090/-/ready
curl http://localhost:3100/ready
curl http://localhost:3200/ready
Then verify data, not only service health.
For Prometheus, I start with the up query, which shows whether the collector target is being scraped.
For Loki, I use {service_name=~".+"} to confirm that any service-labelled log
stream exists.
For Tempo, I search by service.name = example-api.
If Loki is ready but {service_name=~".+"} returns nothing, I do not blame the
Grafana UI first. I check whether the collector is reading pod logs and whether
the app is exporting OTLP logs.
Common problems
localhost from a pod points to the pod, not the Docker host. Use a reachable
host name for the OTLP endpoint.
Loki can be healthy and still have no streams. Check labels or a broad matcher before assuming Grafana is broken.
Tempo config can change between major versions. If Tempo crash-loops after an upgrade, check the config shape before debugging Docker networking.
High-cardinality labels make dashboards noisy. Normalize route labels in the app or collector before they become Prometheus series.
Conclusion
This compose stack is a good middle step. The Kubernetes app gets real metrics, logs, and traces, but the observability backend stays outside the cluster while I iterate.
The important validation lesson is simple: a green backend is not the same thing as ingested data. I check Prometheus, Loki, and Tempo directly before declaring the pipeline healthy.