OpenTelemetry parallel emission.
When Mesedi observes an execution complete, it can optionally translate the run into an OTel trace and ship it via OTLP/HTTP to a collector you configure. One root span per execution, one child span per event. Use this to let agent traces sit alongside the rest of your infrastructure telemetry in Datadog, Honeycomb, Grafana Tempo, or any OTel Collector.
Enabling emission
Emission is opt-in by env var. If OTEL_EXPORTER_OTLP_ENDPOINT is unset, Mesedi behaves exactly as before (the emitter initializes in disabled mode and every internal Emit call is a no-op). Set the endpoint and Mesedi starts shipping spans on the next terminal-status PATCH.
# Required: where to send OTLP spans
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
# Optional: comma-separated auth headers
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_TOKEN
# Optional: service.name on the resource attribute set
OTEL_SERVICE_NAME=mesedi-prod
The endpoint accepts the standard OTel form (URL with scheme). Schemes other than https are treated as insecure and emission will skip TLS. Auth headers use the OTel standard key=value,key=value format; Mesedi parses it the same way the OTel SDK would.
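The header format and scheme rule described above can be sketched roughly as follows. This is a minimal illustration, not Mesedi's actual parser: the function names parseOTLPHeaders and isInsecure are hypothetical, malformed pairs are assumed to be skipped, and the sketch does not apply any extra decoding that a real OTel SDK may perform on header values.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseOTLPHeaders (hypothetical name) splits "k1=v1,k2=v2" into a map.
// Assumption: pairs without "=" or with an empty key are skipped.
func parseOTLPHeaders(raw string) map[string]string {
	headers := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		// Cut at the first "=" so values may themselves contain "=".
		key, value, found := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !found || key == "" {
			continue
		}
		headers[key] = strings.TrimSpace(value)
	}
	return headers
}

// isInsecure (hypothetical name) reports whether the endpoint would be
// contacted without TLS: any scheme other than https counts as insecure.
func isInsecure(endpoint string) (bool, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return false, err
	}
	if u.Scheme == "" {
		return false, fmt.Errorf("endpoint %q has no scheme", endpoint)
	}
	return u.Scheme != "https", nil
}

func main() {
	headers := parseOTLPHeaders("x-honeycomb-team=YOUR_TOKEN")
	insecure, err := isInsecure("https://api.honeycomb.io")
	fmt.Println(headers, insecure, err)
}
```

Cutting at the first "=" rather than splitting on every "=" keeps tokens that contain "=" intact, which matters for some base64-style credentials.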
On startup the backend logs the chosen endpoint and the active semantic-conventions mode. If you grep your logs for otel: you will see either:
otel: parallel emission enabled endpoint=https://api.honeycomb.io service_name=mesedi-prod semconv_mode=stable
# or, when no endpoint is configured:
otel: parallel emission disabled (OTEL_EXPORTER_OTLP_ENDPOINT unset)
What the receiver sees
Each terminal execution becomes one OTel trace:
- One root span named mesedi.execution, kind = SERVER, timed to (started_at -> ended_at). Status is set to Ok for completed executions and Error for any other terminal status. Carries the execution_id, project_id, status, duration, tokens in/out, cost, tenant_id, parent_execution_id, failure_group_id, pause_count, and total_paused_ms as attributes.
- One child span per event, named mesedi.<event_type> (for example mesedi.llm_call, mesedi.tool_call, mesedi.agent_handoff). Kind = INTERNAL, timed to (event.timestamp -> event.timestamp + event.duration_ms). Carries the event_id, sequence, duration_ms, and a typed peek of well-known payload fields per event type.
Multi-agent topology (parent_execution_id) is preserved as the mesedi.parent_execution_id attribute on the root span rather than as a cross-trace parent context, so downstream backends (Datadog, Honeycomb, Grafana Tempo) can join across traces by that attribute.
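The naming, timing, and status rules above can be illustrated with a small sketch. The Event and SpanSketch types are hypothetical stand-ins rather than Mesedi's real data model, and the literal status value "completed" is an assumption about how a completed execution is represented.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical stand-in for a Mesedi event record.
type Event struct {
	EventType  string
	Timestamp  time.Time
	DurationMs int64
}

// SpanSketch holds the fields a child span derives from an event.
type SpanSketch struct {
	Name  string
	Start time.Time
	End   time.Time
}

// childSpan names the span mesedi.<event_type> and times it from
// event.timestamp to event.timestamp + event.duration_ms.
func childSpan(e Event) SpanSketch {
	return SpanSketch{
		Name:  "mesedi." + e.EventType,
		Start: e.Timestamp,
		End:   e.Timestamp.Add(time.Duration(e.DurationMs) * time.Millisecond),
	}
}

// rootStatus maps a terminal execution status to the root span status:
// Ok for completed executions, Error for any other terminal status.
// Assumption: the completed status is spelled "completed".
func rootStatus(executionStatus string) string {
	if executionStatus == "completed" {
		return "Ok"
	}
	return "Error"
}

func main() {
	e := Event{EventType: "llm_call", Timestamp: time.Now(), DurationMs: 250}
	s := childSpan(e)
	fmt.Println(s.Name, s.End.Sub(s.Start), rootStatus("completed"))
}
```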
GenAI semantic conventions
The OpenTelemetry spec for Generative AI applications defines a set of incubating attribute names like gen_ai.system, gen_ai.request.model, and gen_ai.usage.input_tokens. Strict OTel Collector deployments may reject incubating names by default, so Mesedi only emits them when the customer opts in via the standard OTel env var:
# Three valid modes:
OTEL_SEMCONV_STABILITY_OPT_IN= # default: emit only stable names
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai # emit only incubating GenAI names
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai/dup # emit both during migration windows
When either incubating mode (gen_ai or gen_ai/dup) is active, the span for every llm_call event also carries gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.operation.name="chat". The root span also carries gen_ai.usage.input_tokens and gen_ai.usage.output_tokens as aggregates.
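A rough sketch of how the three modes could gate attribute emission is shown below. The type and function names are hypothetical, and falling back to stable for unrecognized values is an assumption rather than documented behavior. The gen_ai.* attribute names are the ones listed above; the sample system and model values in main are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// semconvMode is a hypothetical enum for the three documented modes.
type semconvMode int

const (
	modeStable   semconvMode = iota // unset: stable names only
	modeGenAI                       // "gen_ai": incubating GenAI names only
	modeGenAIDup                    // "gen_ai/dup": both sets
)

// parseSemconvMode maps the env var value to a mode.
// Assumption: unrecognized values fall back to stable.
func parseSemconvMode(raw string) semconvMode {
	switch strings.TrimSpace(raw) {
	case "gen_ai":
		return modeGenAI
	case "gen_ai/dup":
		return modeGenAIDup
	default:
		return modeStable
	}
}

func (m semconvMode) emitsStable() bool { return m == modeStable || m == modeGenAIDup }
func (m semconvMode) emitsGenAI() bool  { return m == modeGenAI || m == modeGenAIDup }

// llmCallGenAIAttrs builds the incubating attributes for one llm_call span.
func llmCallGenAIAttrs(system, model string, inTokens, outTokens int64) map[string]any {
	return map[string]any{
		"gen_ai.system":              system,
		"gen_ai.request.model":       model,
		"gen_ai.usage.input_tokens":  inTokens,
		"gen_ai.usage.output_tokens": outTokens,
		"gen_ai.operation.name":      "chat",
	}
}

func main() {
	mode := parseSemconvMode("gen_ai/dup")
	if mode.emitsGenAI() {
		fmt.Println(llmCallGenAIAttrs("example-system", "example-model", 120, 45))
	}
	fmt.Println("stable names emitted:", mode.emitsStable())
}
```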
Reliability posture
Emission is fire-and-forget. After the terminal PATCH commits and the detector chain runs, a background goroutine reads the events, constructs the spans, and hands them to the OTel SDK's batch processor. Failures inside the goroutine are logged but never surface to the customer; the main pipeline never blocks on OTel.
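The fire-and-forget pattern described above might look roughly like the following. This is a sketch under stated assumptions, not Mesedi's code: emitAsync and its callback signature are hypothetical, the log messages are illustrative, and the returned channel exists only so tests or shutdown code can wait for completion. The request path would simply ignore it.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// emitAsync (hypothetical) runs emit in a background goroutine.
// Errors and panics are logged and never propagate to the caller.
// The returned channel is closed once the goroutine has finished.
func emitAsync(executionID string, emit func(id string) error) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() {
			// Contain panics so a bug in span construction cannot
			// crash the process.
			if r := recover(); r != nil {
				log.Printf("emission panicked execution_id=%s: %v", executionID, r)
			}
		}()
		if err := emit(executionID); err != nil {
			log.Printf("emission failed execution_id=%s: %v", executionID, err)
		}
	}()
	return done
}

func main() {
	// The caller returns immediately; waiting here is only for the demo.
	done := emitAsync("exec-123", func(id string) error {
		return errors.New("collector unreachable")
	})
	fmt.Println("main pipeline continues without blocking")
	<-done
}
```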
The exporter runs through the standard OTel SDK batch processor (default 5-second flush, retries on transient errors). On graceful shutdown the process calls provider.Shutdown(ctx) with a 5-second budget to drain pending spans before exit.
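The bounded shutdown drain can be sketched with a context deadline. The shutdowner interface below mirrors the Shutdown(ctx) method mentioned above; drain and slowProvider are hypothetical names, with slowProvider standing in for a provider whose flush takes a configurable time.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// shutdowner is the minimal interface this sketch needs; a provider
// exposing Shutdown(ctx) error satisfies it.
type shutdowner interface {
	Shutdown(ctx context.Context) error
}

// drain (hypothetical) gives the provider a fixed budget to flush
// pending spans, then gives up so process exit is never held hostage.
func drain(p shutdowner, budget time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return p.Shutdown(ctx)
}

// slowProvider simulates a provider whose flush takes flushTime.
type slowProvider struct{ flushTime time.Duration }

func (s slowProvider) Shutdown(ctx context.Context) error {
	select {
	case <-time.After(s.flushTime):
		return nil // flushed within budget
	case <-ctx.Done():
		return ctx.Err() // budget exhausted, remaining spans are lost
	}
}

func main() {
	err := drain(slowProvider{flushTime: 10 * time.Millisecond}, 5*time.Second)
	fmt.Println("drain result:", err)
}
```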
When the OTLP endpoint is unreachable, the SDK's built-in retry kicks in; spans that exceed the processor's queue limit are dropped silently rather than buffered to disk. Customers that need guaranteed delivery should point the endpoint at a local OpenTelemetry Collector and let the Collector handle persistence.
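To illustrate the drop-on-overflow behavior, here is a minimal bounded-queue sketch. It is not the OTel SDK's batch processor implementation; boundedQueue and its methods are hypothetical, and the dropped counter is not safe for concurrent producers.

```go
package main

import "fmt"

// boundedQueue (hypothetical) holds at most cap(ch) items in memory.
type boundedQueue struct {
	ch      chan string
	dropped int // single-producer sketch; not synchronized
}

func newBoundedQueue(limit int) *boundedQueue {
	return &boundedQueue{ch: make(chan string, limit)}
}

// enqueue never blocks: when the queue is full the item is dropped
// and counted rather than buffered elsewhere.
func (q *boundedQueue) enqueue(span string) bool {
	select {
	case q.ch <- span:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	q := newBoundedQueue(2)
	for i := 0; i < 5; i++ {
		q.enqueue(fmt.Sprintf("span-%d", i))
	}
	fmt.Println("queued:", len(q.ch), "dropped:", q.dropped)
}
```

A non-blocking enqueue trades completeness for isolation: the producer is never slowed by a stalled exporter, which is why durable delivery is left to a Collector.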
What's next?
Failure classes and playbooks for the detector catalog.
Multi-agent topology and handoffs for the parent/child graph that OTel sees as a parent_execution_id attribute.
Self-hosting guide for setting env vars in your own deployment of the Go backend.