Modern distributed applications can become complex very quickly. Tracking down performance bottlenecks or debugging request flows across microservices is challenging without proper observability.

That’s where OpenTelemetry (OTel) comes in. Combined with Grafana Tempo for distributed tracing, Grafana Loki for logging, and Grafana dashboards for visualization, you get a powerful stack to collect, store, and analyze your system's behavior seamlessly.

In this tutorial, we’ll build a robust observability pipeline. We will set up a standard Node.js (Express) application and a NestJS application, wire them up with OpenTelemetry, and correlate their logs and traces using a Docker Compose stack.

💡 Get the Code: The complete source code for this tutorial is available on GitHub.


🚀 What We're Building

By the end of this guide, you will have:

  1. A Node.js (Express) API automatically instrumented with OpenTelemetry.
  2. A NestJS API, also instrumented, receiving requests from the Express API to demonstrate distributed tracing.
  3. A Docker Compose stack running Grafana, Tempo, Loki, and the OTel Collector.
  4. Seamless log-to-trace correlation in Grafana (clicking a log instantly opens the related trace).

Why this stack?

  • OpenTelemetry: Vendor-neutral standard for telemetry data.
  • Grafana Tempo: A high-volume, minimal-dependency distributed tracing backend.
  • Grafana Loki: A highly efficient log aggregation system designed specifically to work well with Grafana and Prometheus/Tempo.
  • Grafana: The ubiquitous visualization platform that ties it all together.

Step 1: The Infrastructure (Docker Compose)

Let's use Docker Compose to orchestrate the entire backend: Tempo, Loki, the OTel Collector, and Grafana.

Create a docker-compose.yml in your project root:

version: "3.8"
services:
  # 1. Grafana Tempo (Tracing Backend)
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./docker-config/tempo.yaml:/etc/tempo.yaml
    ports:
      - "3200:3200"
      - "4317" # OTLP gRPC (the collector reaches this as tempo:4317 over the compose network)

  # 2. OpenTelemetry Collector (The middleman)
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector.yaml"]
    volumes:
      - ./docker-config/otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
    depends_on:
      - tempo
      - loki

  # 3. Grafana Loki (Logging Backend)
  loki:
    image: grafana/loki:latest
    command: ["-config.file=/etc/loki.yaml"]
    volumes:
      - ./docker-config/loki.yaml:/etc/loki.yaml
    ports:
      - "3100:3100"

  # 4. Grafana (Visualization User Interface)
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      # Auto-provision data sources including our Loki -> Tempo derived field
      - ./docker-config/grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    depends_on:
      - tempo
      - loki

The Magic: Grafana Configuration

To connect Logs to Traces automatically, we provision the datasources via docker-config/grafana-datasources.yaml:

apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo:3200
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: '"trace_id":"(\w+)"'
          name: traceID
          url: "$${__value.raw}"

Pro Tip: The derivedFields block tells Grafana: "When you see a trace_id in a Loki log, make it clickable and open that ID in Tempo."
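You can sanity-check that matcherRegex against a sample log line before wiring everything up. A small standalone snippet (the sample trace and span IDs below are made up for illustration):

```typescript
// Verify the Loki derivedFields regex extracts the trace_id from a
// JSON log line like the ones Winston will emit.
const matcherRegex = /"trace_id":"(\w+)"/;

const sampleLogLine =
  '{"level":"info","message":"Received request on /hello",' +
  '"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7"}';

const match = sampleLogLine.match(matcherRegex);
// match[1] is the captured group Grafana substitutes into ${__value.raw}
console.log(match ? match[1] : "no match"); // → 4bf92f3577b34da6a3ce929d0e0e4736
```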

(For otel-collector.yaml, tempo.yaml, and loki.yaml, refer to standard minimal configurations, pointing OTLP exports to Tempo and Loki).
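As a starting point, a minimal docker-config/otel-collector.yaml for this setup might look like the sketch below. It handles traces only, since in this tutorial our apps ship logs to Loki directly via winston-loki; adjust to your collector version:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp:
    endpoint: tempo:4317   # Tempo's OTLP gRPC ingest, via the compose network
    tls:
      insecure: true       # plain-text inside the compose network

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```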


Step 2: Instrumenting a Node.js (Express) App

To automatically instrument an Express application and send logs with injected trace IDs, you need the OpenTelemetry Node SDK and a logger like winston.

npm install express winston winston-loki @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/resources @opentelemetry/semantic-conventions

Create instrumentation.ts. This file must be imported before anything else in your app to ensure auto-instrumentation wraps the require calls.

// instrumentation.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { Resource } from "@opentelemetry/resources";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
import { OTLPTraceExporter as OTLPHttpExporter } from "@opentelemetry/exporter-trace-otlp-http";
// import { OTLPTraceExporter as OTLPGrpcExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

const sdk = new NodeSDK({
  resource: new Resource({ [ATTR_SERVICE_NAME]: "express-api" }),
  // Defaulting to HTTP for highest cross-network compatibility
  traceExporter: new OTLPHttpExporter({
    url: "http://localhost:4318/v1/traces",
  }),
  // To use gRPC instead (High performance, Port 4317):
  // traceExporter: new OTLPGrpcExporter({ url: 'http://localhost:4317' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Inside your index.ts:

import express from "express";
import winston from "winston";
import LokiTransport from "winston-loki";

// Winston combined with OTel's auto-instrumentation will automatically
// append `trace_id` and `span_id` into the JSON logs!
const logger = winston.createLogger({
  format: winston.format.json(),
  transports: [
    new winston.transports.Console(),
    new LokiTransport({
      host: "http://localhost:3100",
      json: true,
      // Label the stream so it can be queried as {service_name="express-api"}
      labels: { service_name: "express-api" },
    }),
  ],
});

const app = express();

app.get("/hello", async (req, res) => {
  logger.info("Received request on /hello");
  // Downstream call to the NestJS API -- the URL/port here is an
  // assumption; point it at wherever your NestJS app listens
  await fetch("http://localhost:3002/hello");
  res.json({ message: "Hello from Express" });
});

// Port 3000 matches the curl command in Step 4
app.listen(3000);

Step 3: Instrumenting a NestJS App

For NestJS, the setup is nearly identical. The key difference is that we add @opentelemetry/instrumentation-nestjs-core for framework-specific lifecycle hooks.

npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation-nestjs-core nest-winston winston winston-loki

Your NestJS instrumentation.ts will look just like the Express one, but with an added array item:

import { NestInstrumentation } from '@opentelemetry/instrumentation-nestjs-core';
// ...
  instrumentations: [
    getNodeAutoInstrumentations(),
    new NestInstrumentation(),
  ],
// ...

In your NestJS main.ts:

import "./instrumentation"; // <--- MUST BE THE VERY FIRST IMPORT
import { NestFactory } from "@nestjs/core";
import { AppModule } from "./app.module";

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  // ...
}
bootstrap();

Step 4: Launch and Verify

  1. Start the Infrastructure:
     docker-compose up -d
  2. Start your Apps: Run both your Express and NestJS apps locally with instrumentation.ts loaded.
  3. Generate Traffic: Hit your Express API endpoint, which in turn calls your NestJS API:
     curl http://localhost:3000/hello

Step 5: Analyzing Logs and Traces in Grafana

Now that our system is generating telemetry, let's look at how to actually use it to debug a request flow.

1. Querying Logs in Loki

Open Grafana at http://localhost:3001 (the host port we mapped in docker-compose.yml) and navigate to the Explore tab. Select your Loki datasource. You can query logs by service name. For example, run: {service_name="express-api"}

Grafana Explore tab showing Loki logs from Express API
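Beyond the bare selector, LogQL line filters and the json parser help narrow things down. For example (query sketches):

```
{service_name="express-api"} |= "error"
{service_name="express-api"} | json | trace_id != ""
```

The first keeps only lines containing "error"; the second parses the JSON payload and keeps only lines that carry a trace_id.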

2. Discovering the Trace ID

Expand one of the log lines. Because of our OpenTelemetry and Winston integration, you will notice that a trace_id and span_id have been automatically injected into the JSON payload of the log.

Because we configured derivedFields in our Grafana datasources, Grafana recognizes this trace_id and automatically generates a clickable button right next to it!

Zoomed-in log line showing trace_id and clickable Tempo button

3. Visualizing the Distributed Trace in Tempo

Click that trace ID link. Grafana will seamlessly open a split-screen view querying Tempo for that exact trace.

You will be presented with a Waterfall visualization of the entire lifecycle of that single HTTP request. You'll be able to see:

  • The exact millisecond the request hit the Express API.
  • The duration of the Express API processing.
  • The exact moment the Express API made the downstream HTTP call to the NestJS API.
  • How long the NestJS API took to route and respond to the request.

If an error occurred anywhere in this chain, the specific span would be highlighted in red, immediately pointing you to the failing microservice without having to guess!

Tempo waterfall trace view showing parent and child spans crossing services


🔧 Troubleshooting Guide

If things aren't working, don't panic. Distributed tracing involves several moving parts. Here is a checklist to resolve common issues.

1. "I don't see any traces in Grafana"

  • App -> Collector: Look at your console. Do you see HTTP/gRPC errors like ServiceUnavailable? Ensure the OTLP Endpoint is correct (localhost:4317 locally vs otel-collector:4317 in Docker).
  • Collector -> Tempo: Check logs: docker logs otel-collector. Look for connection refused errors. Ensure the collector is connecting to tempo:4317.
  • Protocol Mismatch: gRPC exporters use port 4317; HTTP exporters use port 4318. Don't point one at the other's port!
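When in doubt, the collector's debug exporter shows exactly what (if anything) is arriving. A temporary addition to otel-collector.yaml (a sketch; remove it once things work):

```yaml
exporters:
  debug:
    verbosity: detailed   # prints every received span to the collector's stdout

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp, debug]   # keep the Tempo exporter, add debug alongside
```

Then watch it live with docker logs -f otel-collector.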

2. "My trace_id is missing from Loki logs"

  • Ensure you are using winston.format.json() if you are relying on OTel's auto-injection. The Node SDK hooks into Winston's JSON formatting to silently inject active span context.
  • Make sure the instrumentation file is actually loaded before your app boots (e.g. node --require ./dist/instrumentation.js against compiled output, or import "./instrumentation" as the very first line of your entry file).

⚖️ gRPC vs HTTP for OTel Exporters

You'll notice we provided both HTTP and gRPC exporters in the example code.

  • gRPC (Port 4317): The default and highest performing protocol for OpenTelemetry. It relies on HTTP/2 multiplexing and binary Protobuf streams, vastly reducing network overhead. However, certain strict corporate proxies, older load balancers, or intricate Docker networking rules can struggle to route raw HTTP/2 traffic seamlessly.
  • HTTP/Protobuf (Port 4318): Packs the exact same Protobuf binary structures but relies on standard HTTP POST requests. It's incredibly robust, bypasses practically all proxy routing issues out of the box, and is significantly easier to start with at the cost of slight performance overhead.

If you are debugging locally or running into Connection Refused errors, HTTP is your safest bet!
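If you let the SDK construct its exporter from the environment (rather than passing an explicit traceExporter as we did in instrumentation.ts), you can switch protocols without touching code via the standard OTLP environment variables. A sketch:

```shell
# Standard OTLP env vars read by OpenTelemetry SDKs at startup.
# Note: an exporter constructed explicitly in code with a hard-coded
# url takes precedence over these.
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf   # or: grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=express-api
```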

Conclusion

You have effectively built a robust observability pipeline. By instrumenting early, centralizing traces in Tempo, aggregating logs in Loki, and gluing them together in Grafana, you eliminate the guesswork when microservices start passing messages. You can now trace a user's action seamlessly through your entire architectural stack!