In today’s distributed application environments, having a robust observability stack is critical for monitoring and troubleshooting. In this post, we’ll walk through how to set up a complete open-source observability stack using Docker Compose.
Our stack leverages several popular open-source projects:
- OpenTelemetry Collector (contrib distribution) as the single entry point for metrics, logs, and traces
- Prometheus for metrics storage
- Loki for log aggregation
- Tempo for distributed tracing
- Grafana as the visualization and dashboarding layer
Below is the Docker Compose file that brings these services together.
File docker-compose.yml:
version: '3.8'

services:
  collector:
    container_name: collector
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./config/otel-collector.yml:/etc/otelcol-contrib/otel-collector.yml
    command:
      - --config=/etc/otelcol-contrib/otel-collector.yml
    restart: always
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver

  prometheus:
    container_name: prometheus
    image: prom/prometheus:latest
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheusdata:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.enable-remote-write-receiver
      - --web.enable-otlp-receiver
      - --enable-feature=exemplar-storage
      - --storage.tsdb.retention.time=15d
    restart: always
    ports:
      - "9090:9090"
    depends_on:
      - collector

  loki:
    container_name: loki
    restart: always
    image: grafana/loki:latest
    volumes:
      - lokidata:/loki
    ports:
      - "3100:3100"
    depends_on:
      - collector

  tempo:
    container_name: tempo
    restart: always
    image: grafana/tempo:latest
    volumes:
      - ./config/tempo.yml:/etc/tempo.yml
      - tempodata:/var/tempo
    command:
      - -config.file=/etc/tempo.yml
    ports:
      - "3200:3200" # Tempo HTTP API

  grafana:
    container_name: grafana
    image: grafana/grafana
    restart: always
    volumes:
      - ./config/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - grafanadata:/var/lib/grafana
    ports:
      - "3000:3000"

volumes:
  grafanadata:
  prometheusdata:
  lokidata:
  tempodata:
The collector is the entry point for telemetry data coming from your instrumented applications. By using the OpenTelemetry Collector (from the otel/opentelemetry-collector-contrib image), you can receive data over multiple protocols. In our setup, we expose two ports:
- 4317 for the OTLP gRPC receiver
- 4318 for the OTLP HTTP receiver
The collector uses a configuration file (./config/otel-collector.yml) that defines receivers, processors, connectors, and exporters to route the data to the downstream services.
Prometheus collects and stores metrics. In our Compose file:
- The configuration is mounted from ./config/prometheus.yml, and metric data is persisted in the prometheusdata volume with a 15-day retention (--storage.tsdb.retention.time=15d).
- The native OTLP receiver (--web.enable-otlp-receiver) accepts the metrics that the collector pushes to http://prometheus:9090/api/v1/otlp.
- Exemplar storage (--enable-feature=exemplar-storage) is enabled so metrics can link to traces in Grafana.
- The remote-write receiver (--web.enable-remote-write-receiver) is also enabled in case you prefer pushing metrics over Prometheus remote write; a sketch of that alternative follows this list.
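If you go the remote-write route, the metrics exporter in the collector configuration shown later in this post could be swapped out. The snippet below is only a sketch of that alternative; the stack as written uses the OTLP exporter:

# Sketch: push metrics to Prometheus via remote write instead of OTLP.
# The prometheusremotewrite exporter ships in the contrib collector image.
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [otlp, spanmetrics]
      processors: [batch]
      exporters: [prometheusremotewrite]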
Loki is designed for aggregating and indexing logs. It is a lightweight solution that integrates seamlessly with Grafana. In this stack:
- Loki runs with the default configuration shipped in the grafana/loki image, so no config file is mounted.
- Log data is persisted in the lokidata volume mounted at /loki.
- Port 3100 is published, and the collector pushes logs to Loki's OTLP endpoint at http://loki:3100/otlp.
A sketch of how a custom Loki configuration could be mounted later on follows below.
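If you eventually need to tune Loki (retention, limits, storage), you can mount your own configuration the same way the other services do. This is a minimal sketch of that Compose change, assuming a config/loki.yml file you create yourself (the grafana/loki image reads /etc/loki/local-config.yaml by default):

# Sketch: override the default Loki configuration with your own file.
loki:
  container_name: loki
  image: grafana/loki:latest
  restart: always
  volumes:
    - ./config/loki.yml:/etc/loki/local-config.yaml
    - lokidata:/loki
  command:
    - -config.file=/etc/loki/local-config.yaml
  ports:
    - "3100:3100"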
Tempo is Grafana’s distributed tracing backend. It is built to scale and is designed for high-volume, high-cardinality trace data. In this stack:
- The configuration is mounted from ./config/tempo.yml, and trace data is persisted in the tempodata volume mounted at /var/tempo.
- Port 3200 is published for Tempo's HTTP API, which Grafana uses to query traces.
- The collector sends sampled traces to Tempo's OTLP HTTP receiver on port 4318 over the internal Compose network, so that port never needs to be published on the host.
Grafana is our unified observability dashboard:
- The Prometheus, Loki, and Tempo data sources are provisioned automatically from ./config/grafana-datasources.yml, including the exemplar, derived-field, and trace-to-logs links between them.
- Dashboards, users, and settings are persisted in the grafanadata volume.
- The UI is published on port 3000.
An optional sketch for provisioning dashboards the same way follows below.
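Data sources are not the only thing Grafana can provision from files. As an optional sketch (the file and directory names here are ones you would create; they are not part of this stack), dashboards stored as JSON on disk can be loaded with a second provisioning file mounted at /etc/grafana/provisioning/dashboards/dashboards.yml, alongside a ./dashboards directory mounted at /var/lib/grafana/dashboards:

# Sketch: optional dashboard provisioning (config/grafana-dashboards.yml is a file you would create).
apiVersion: 1
providers:
  - name: default
    orgId: 1
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards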
While the Docker Compose file sets up the containers and mounts the necessary configuration files, you’ll need to create these files in a local ./config directory. Here’s a brief overview of what each file should include:
File config/otel-collector.yml:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

# Connectors act as both an exporter and a receiver, joining two pipelines.
# See https://opentelemetry.io/docs/collector/configuration/#connectors
connectors:
  # The spanmetrics connector generates metrics from the trace spans it receives.
  spanmetrics:
    namespace: traces.spanmetrics # Prefix all metrics with `traces.spanmetrics` (this becomes `traces_spanmetrics`).
    # Determine the type of histogram to use for span metrics.
    histogram:
      explicit: # Explicit histograms have pre-defined bucket sizes (defaults are used here).
    # Additional label dimensions taken from trace span attributes, when present.
    dimensions:
      - name: http.method
      - name: http.target
      - name: http.status_code
      - name: service.version
    # Ensure exemplars are enabled and sent to the metrics store.
    exemplars:
      enabled: true

processors:
  batch:
  # The tail sampling processor only keeps traces whose spans match the defined policies.
  tail_sampling:
    decision_wait: 30s # The time to wait before a sampling decision is made.
    # The following policies follow a logical OR pattern: if any policy matches, the trace
    # is kept. For logical AND, you can use the `and` policy. Every span of a trace is
    # examined by each policy in turn; a match causes a short-circuit.
    policies: [
      # Keep traces that include spans containing errors.
      {
        name: sample-erroring-traces, # Name of the policy.
        type: status_code, # The type must match the policy type being used.
        status_code: { status_codes: [ERROR] } # Only sample traces that contain a span with an error.
      },
      # Keep traces that take longer than 200ms.
      {
        name: sample-long-traces, # Name of the policy.
        type: latency, # The type must match the policy type being used.
        latency: { threshold_ms: 200 }, # Only sample traces longer than 200ms in duration.
      },
    ]

exporters:
  otlphttp/prometheus:
    endpoint: http://prometheus:9090/api/v1/otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlphttp/tempo:
    endpoint: http://tempo:4318
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp, spanmetrics]
      processors: [batch]
      exporters: [otlphttp/prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlphttp/tempo, spanmetrics]
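The comments above mention the `and` policy for combining conditions. As a sketch only (it is not part of the configuration used in this stack), a policy that keeps a trace when it both contains an error and exceeds 200ms could look like this:

# Sketch: an `and` policy for the tail_sampling processor; every sub-policy must match
# for the trace to be kept (contains an error AND is longer than 200ms).
policies: [
  {
    name: sample-slow-erroring-traces,
    type: and,
    and: {
      and_sub_policy: [
        { name: has-error, type: status_code, status_code: { status_codes: [ERROR] } },
        { name: is-slow, type: latency, latency: { threshold_ms: 200 } },
      ]
    }
  },
]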
File config/prometheus.yml:
# Settings applied to OTLP data pushed by the collector.
otlp:
  # Uncomment to ingest OTLP data keeping UTF-8 characters in metric/label names.
  # translation_strategy: NoUTF8EscapingWithSuffixes
  keep_identifying_resource_attributes: true
  # Recommended resource attributes to be promoted to labels.
  promote_resource_attributes:
    - service.instance.id
    - service.name
    - service.namespace
    - cloud.availability_zone
    - cloud.region
    - container.name
    - deployment.environment.name
    - k8s.cluster.name
    - k8s.container.name
    - k8s.cronjob.name
    - k8s.daemonset.name
    - k8s.deployment.name
    - k8s.job.name
    - k8s.namespace.name
    - k8s.pod.name
    - k8s.replicaset.name
    - k8s.statefulset.name

storage:
  tsdb:
    # Accept slightly out-of-order samples, which can happen with pushed OTLP data.
    out_of_order_time_window: 30m
File config/tempo.yml:
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp: # Accept OTLP traces from the collector over both gRPC and HTTP.
      protocols:
        http:
          endpoint: 0.0.0.0:4318
        grpc:
          endpoint: 0.0.0.0:4317

storage:
  trace:
    backend: local # Store trace blocks on the local filesystem.
    wal:
      path: /var/tempo/wal # Where to store the WAL (inside the mounted tempodata volume).
    local:
      path: /var/tempo/blocks
File config/grafana-datasources.yml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    editable: false
    jsonData:
      # Link metric exemplars to the matching trace in Tempo via their trace_id.
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: tempo
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      # Turn the trace_id label on log lines into a link to the trace in Tempo.
      derivedFields:
        - name: 'trace_id'
          matcherRegex: 'trace_id'
          matcherType: 'label'
          url: '$${__value.raw}'
          datasourceUid: 'tempo'
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    jsonData:
      tracesToLogs:
        datasourceUid: 'loki'
        mappedTags: [ { key: 'service.name', value: 'service_name' } ]
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        mapTagNamesEnabled: true
        filterByTraceID: true
      serviceMap:
        datasourceUid: 'prometheus'
      search:
        hide: false
      nodeGraph:
        enabled: true
      lokiSearch:
        datasourceUid: 'loki'
For more detailed configurations, consult the documentation for each component.
Once you have your configuration files in place, you can launch the full stack with a single command (use docker compose up -d if you are on the Compose v2 plugin):
docker-compose up -d
This command will:
- Pull the required images if they are not already present.
- Create the named volumes for Prometheus, Loki, Tempo, and Grafana data.
- Start all five containers in the background on a shared network, so they can reach each other by service name.
After a few moments, you should be able to access:
- Grafana at http://localhost:3000, with the Prometheus, Loki, and Tempo data sources already provisioned
- Prometheus at http://localhost:9090
- The collector's OTLP endpoints at localhost:4317 (gRPC) and localhost:4318 (HTTP), ready to receive telemetry from your applications
- Loki at http://localhost:3100 and Tempo at http://localhost:3200, which are mainly queried through Grafana
Now that your observability stack is running, the next step is to instrument your application. By using the OpenTelemetry libraries in your code (whether via the Java agent, instrumentation annotations, or manual API calls), your application can send metrics, traces, and logs directly to the collector. The collector then routes this data to Prometheus, Loki, and Tempo for storage and visualization.
For example, if you’re using a Java application, you can attach the OpenTelemetry Java agent with the -javaagent JVM flag (no code changes required) and point it at the collector through standard environment variables such as OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 (or http://collector:4318 if the application runs inside the same Compose network). A sketch of what this could look like as an extra Compose service follows.
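The service below is only a sketch: the my-app image and the agent path inside it are hypothetical placeholders, while the OTEL_* variables are standard OpenTelemetry SDK configuration.

# Sketch: a hypothetical instrumented Java service added to the same docker-compose.yml.
my-app:
  image: my-app:latest # Placeholder image that bundles the OpenTelemetry Java agent.
  environment:
    JAVA_TOOL_OPTIONS: "-javaagent:/otel/opentelemetry-javaagent.jar" # Agent path inside the image (placeholder).
    OTEL_SERVICE_NAME: my-app
    OTEL_EXPORTER_OTLP_ENDPOINT: http://collector:4318
    OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
    OTEL_RESOURCE_ATTRIBUTES: deployment.environment.name=local
  depends_on:
    - collector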
This integration gives you a complete pipeline—from instrumenting your application code to visualizing the resulting telemetry in Grafana.
By following this guide, you can quickly set up an open-source observability stack that scales with your application. This stack provides you with:
- Metrics stored in Prometheus, including the span metrics and exemplars generated by the collector
- Logs aggregated in Loki
- Traces stored in Tempo, with tail sampling keeping the erroring and slow ones
- A single Grafana instance where all three data sources are provisioned and linked, so you can jump from a metric exemplar or a log line straight to the corresponding trace
As you grow your application, you can customize and extend each component’s configuration to meet your specific needs. Happy observability!
Contact us to explore how we can help you build a powerful open-source observability stack with OpenTelemetry, Grafana, Prometheus, Loki, and Tempo. Let’s work together to enhance monitoring, streamline troubleshooting, and optimize your applications!