The easiest way to add LLM observability to your AI app in 2026
The fastest useful LLM observability setup is one trace, one generation, and one flush. Currai gets you there without running collectors or rebuilding your app.
TL;DR: the easiest LLM observability setup is the one that gets a real trace into your dashboard before observability becomes its own project. With Currai, install the SDK, create a client, wrap one model call, attach token usage, and flush at the end of the request. From there, you can add retrieval spans, tool spans, sessions, prompt versions, and evals without changing the mental model.
Why LLM observability setup gets delayed
Every team agrees they should trace LLM calls. Fewer teams do it before launch. The reason is usually setup drag.
Traditional observability asks for agents, collectors, exporters, dashboards, sampling rules, and sometimes self-hosted storage. LLM observability adds a different set of concerns: prompt inputs, completions, model names, token usage, tool calls, retrieval context, prompt versions, user sessions, and costs. If capturing that data feels like a platform migration, teams postpone it.
That delay is expensive. When the first production issue appears, there is no record of what the model saw, which prompt version ran, what retrieval returned, how many tokens were spent, or why an agent called the wrong tool. You are left with screenshots and guesses.
The setup target should be smaller: one real trace, in the dashboard, from the code path users hit.
What "easy" should mean
An easy LLM observability setup has five properties:
- No backend to operate. You should not need to stand up storage before the first trace.
- One clear data model. A request is a trace. LLM calls are generations. Retrieval, tools, routing, and custom work are spans.
- Explicit instrumentation. The code should show what is being recorded, so debugging the instrumentation is straightforward.
- Production metadata from day one. Session IDs, user IDs, tags, prompt versions, token usage, and environment should travel with the trace.
- A path from traces to evals. The same data used for debugging should become the dataset for quality checks.
Currai is built around that shape. You do not need a separate tracing concept for evals later. The trace you capture today is the record you can score tomorrow.
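One way to picture that data model is as a handful of types. The sketch below is illustrative only: the type and field names are assumptions made for this article, not Currai's actual schema.

```typescript
// Illustrative types only. Names are assumptions, not Currai's schema.
interface WorkSpan {
  kind: "span";
  name: string; // e.g. "vector-search" or "route-request"
  input?: unknown;
  output?: unknown;
}

interface ModelGeneration {
  kind: "generation";
  name: string;
  model: string;
  input?: unknown;
  output?: unknown;
  usage?: { inputTokens: number; outputTokens: number };
}

// A trace is one logical request plus the metadata that travels with it.
interface RequestTrace {
  name: string;
  sessionId?: string;
  userId?: string;
  tags: string[];
  observations: Array<WorkSpan | ModelGeneration>;
}

// Sum token usage across every generation in a trace.
function totalTokens(trace: RequestTrace): number {
  let total = 0;
  for (const obs of trace.observations) {
    if (obs.kind === "generation" && obs.usage) {
      total += obs.usage.inputTokens + obs.usage.outputTokens;
    }
  }
  return total;
}

const exampleTrace: RequestTrace = {
  name: "support-chat",
  sessionId: "session-1",
  tags: ["production"],
  observations: [
    { kind: "span", name: "vector-search", input: "refund policy" },
    {
      kind: "generation",
      name: "answer",
      model: "example-model",
      usage: { inputTokens: 120, outputTokens: 30 },
    },
  ],
};
```

Because spans and generations live in the same trace, a helper like totalTokens can later feed cost tracking or evals without a second data model.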
Step 1: Install the SDK
Install currai in the service that makes the model call, using the package manager that service already uses. Python services install the Python package instead.
Create a public key and a secret key in the Currai dashboard, then set both in your environment.
The public key identifies the project. The secret key authenticates writes. If you self-host Currai, also set CURRAI_BASE_URL; otherwise use the hosted default.
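A small loader can fail fast when a key is missing. This is a sketch under assumptions: CURRAI_BASE_URL comes from the text above, while CURRAI_PUBLIC_KEY, CURRAI_SECRET_KEY, and the loadCurraiConfig helper are hypothetical names used only for illustration.

```typescript
// Hypothetical config loader. CURRAI_PUBLIC_KEY and CURRAI_SECRET_KEY are
// assumed variable names; check the dashboard for the real ones.
interface CurraiConfig {
  publicKey: string;
  secretKey: string;
  baseUrl?: string; // only set when self-hosting
}

type EnvVars = Record<string, string | undefined>;

function loadCurraiConfig(env: EnvVars): CurraiConfig {
  const publicKey = env["CURRAI_PUBLIC_KEY"];
  const secretKey = env["CURRAI_SECRET_KEY"];
  if (!publicKey || !secretKey) {
    throw new Error("Set CURRAI_PUBLIC_KEY and CURRAI_SECRET_KEY before starting the service");
  }
  const baseUrl = env["CURRAI_BASE_URL"];
  // Leave baseUrl out entirely so the hosted default applies.
  return baseUrl ? { publicKey, secretKey, baseUrl } : { publicKey, secretKey };
}

const hostedConfig = loadCurraiConfig({
  CURRAI_PUBLIC_KEY: "pk-example",
  CURRAI_SECRET_KEY: "sk-example",
});
```

In a Node.js service you would pass process.env as the argument.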
Step 2: Create one client per process
In a server app, reuse the client. In Next.js or another hot-reloading runtime, store it on globalThis so development reloads do not create a new client for every file change.
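The globalThis pattern can be sketched as follows. StubClient is a stand-in so the example runs on its own, and the __curraiClient property name is an assumption; replace both with the real SDK client in your app.

```typescript
// Stand-in client so the sketch is self-contained. Not the real SDK class.
class StubClient {
  static created = 0;
  constructor() {
    StubClient.created += 1;
  }
}

// Keep the instance on globalThis so hot reloads reuse it instead of
// constructing a new client each time the module is re-evaluated.
const globalStore = globalThis as typeof globalThis & {
  __curraiClient?: StubClient;
};

function getClient(): StubClient {
  if (!globalStore.__curraiClient) {
    globalStore.__curraiClient = new StubClient();
  }
  return globalStore.__curraiClient;
}

const first = getClient();
const second = getClient();
```

Both calls return the same instance, so only one client is ever constructed per process.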
For a local demo where you want traces to appear immediately, you can pass flushAt: 1 and flushInterval: 1000. For production, keep batching enabled so tracing does not add unnecessary request overhead.
Step 3: Wrap the first LLM call
A trace is one logical request. A generation is one model call inside that request.
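The shape of that instrumentation looks roughly like this. The example uses plain in-memory records and a stubbed model call rather than the Currai SDK, so every type, field, and function name here is an assumption chosen for illustration.

```typescript
// In-memory stand-ins; names are hypothetical and only show the shape.
interface GenerationRecord {
  name: string;
  model: string;
  modelParameters: Record<string, number>;
  input: unknown;
  output?: unknown;
  usage?: { inputTokens: number; outputTokens: number };
}

interface TraceRecord {
  name: string;
  sessionId: string;
  userId: string;
  environment: string;
  input: unknown;
  output?: unknown;
  generations: GenerationRecord[];
}

const recordedTraces: TraceRecord[] = [];

// Stubbed model call. A real provider call would be awaited here.
function callModel(prompt: string): { text: string; inputTokens: number; outputTokens: number } {
  return { text: "Echo: " + prompt, inputTokens: prompt.split(/\s+/).length, outputTokens: 3 };
}

function answerQuestion(question: string, sessionId: string, userId: string): string {
  // One trace per logical request.
  const trace: TraceRecord = {
    name: "answer-question",
    sessionId,
    userId,
    environment: "development",
    input: question,
    generations: [],
  };
  // One generation per model call inside that request.
  const generation: GenerationRecord = {
    name: "answer",
    model: "example-model",
    modelParameters: { temperature: 0.2 },
    input: question,
  };
  trace.generations.push(generation);

  const completion = callModel(question);

  // End the generation with its output and token usage, then close the trace.
  generation.output = completion.text;
  generation.usage = {
    inputTokens: completion.inputTokens,
    outputTokens: completion.outputTokens,
  };
  trace.output = completion.text;
  recordedTraces.push(trace);
  return completion.text;
}

const reply = answerQuestion("What is a trace?", "session-1", "user-1");
```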
That is the first useful trace. It records the input, output, model, parameters, session, user, token usage, and environment in one place.
Step 4: Record errors instead of losing them
The most valuable traces are often failed requests. End the generation or span with an error level before rethrowing.
If the request fails before the model call, end the trace-level span that failed. The rule is simple: if a step matters to debugging, it should end with either an output or an error.
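A wrapper makes the "output or error" rule hard to forget. This is a minimal sketch, not the SDK's API: StepRecord, its level and statusMessage fields, and runStep are hypothetical names.

```typescript
// Hypothetical step record; "level" and "statusMessage" are assumed field names.
interface StepRecord {
  name: string;
  level: "DEFAULT" | "ERROR";
  output?: unknown;
  statusMessage?: string;
}

const finishedSteps: StepRecord[] = [];

// Run a step so that it always ends with either an output or an error.
function runStep<T>(name: string, work: () => T): T {
  const step: StepRecord = { name, level: "DEFAULT" };
  try {
    const output = work();
    step.output = output;
    return output;
  } catch (err) {
    step.level = "ERROR";
    step.statusMessage = err instanceof Error ? err.message : String(err);
    throw err; // rethrow so the caller still sees the failure
  } finally {
    finishedSteps.push(step); // recorded on success and on failure
  }
}

let caught = "";
try {
  runStep("call-model", () => {
    throw new Error("rate limited");
  });
} catch (err) {
  caught = err instanceof Error ? err.message : String(err);
}
```

The failed step is still recorded with its error message, and the caller receives the original exception.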
Step 5: Flush in short-lived runtimes
Currai batches in the background. That is what you want for production servers. In scripts, jobs, and serverless handlers, flush before the process exits or the response is finalized.
This is especially important for streaming responses. Use your framework's finish or error hook to end the generation, update the trace, and flush.
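To see why the flush matters, consider a toy batching queue. It is a stand-in written for this article: only the flushAt option name comes from the text, and the rest (BatchingQueue, handleJob, the synchronous flush) is assumed. A real SDK flush is typically asynchronous and should be awaited.

```typescript
// Toy batching queue. Only "flushAt" mirrors an option named in the text.
class BatchingQueue {
  private pending: string[] = [];
  sent: string[][] = [];
  private flushAt: number;

  constructor(flushAt: number) {
    this.flushAt = flushAt;
  }

  enqueue(item: string): void {
    this.pending.push(item);
    // Send automatically once the batch is full.
    if (this.pending.length >= this.flushAt) this.flush();
  }

  flush(): void {
    if (this.pending.length === 0) return;
    this.sent.push(this.pending);
    this.pending = [];
  }
}

// Short-lived handler: flush in finally so events survive success and failure.
function handleJob(queue: BatchingQueue, payload: string): string {
  try {
    queue.enqueue("trace:" + payload);
    return "done:" + payload;
  } finally {
    queue.flush();
  }
}

const jobQueue = new BatchingQueue(20);
const jobResult = handleJob(jobQueue, "report-42");
```

With a batch size of 20, the single event would have stayed in memory if the handler had returned without flushing.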
What to add after the first trace
Add retrieval spans around vector search, reranking, document loading, and context assembly. When a RAG answer is wrong, these spans tell you whether the model ignored good context or never received it.
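A retrieval span mostly needs the query and what came back. In this sketch a toy keyword search stands in for a vector store, and the RetrievalSpan shape and function names are assumptions rather than Currai's API.

```typescript
// Hypothetical shapes for illustration; not Currai's API.
interface RetrievedChunk {
  id: string;
  score: number;
  text: string;
}

interface RetrievalSpan {
  name: string;
  input: string;
  output: { id: string; score: number }[];
}

const corpus: RetrievedChunk[] = [
  { id: "doc-refunds", score: 0, text: "Refunds are issued within 14 days." },
  { id: "doc-shipping", score: 0, text: "Shipping takes 3 to 5 business days." },
];

// Toy keyword search standing in for a vector store query.
function searchCorpus(query: string, topK: number): RetrievedChunk[] {
  const terms = query.toLowerCase().split(/\s+/);
  return corpus
    .map((chunk) => ({
      ...chunk,
      score: terms.filter((t) => chunk.text.toLowerCase().indexOf(t) !== -1).length,
    }))
    .filter((chunk) => chunk.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

function retrieveWithSpan(query: string, spans: RetrievalSpan[]): RetrievedChunk[] {
  const chunks = searchCorpus(query, 3);
  // Record IDs and scores so you can tell what context the model received.
  spans.push({
    name: "vector-search",
    input: query,
    output: chunks.map(({ id, score }) => ({ id, score })),
  });
  return chunks;
}

const retrievalSpans: RetrievalSpan[] = [];
const hits = retrieveWithSpan("refunds policy", retrievalSpans);
```

If an answer about refunds is wrong and the span shows doc-refunds was returned, the problem is downstream of retrieval.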
Add tool spans around function calls, MCP tools, API requests, browser actions, and database lookups. Agent bugs are much easier to debug when each tool call has its own input, output, latency, and error state.
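For tools, latency is the extra field worth capturing. The helper below is a sketch with hypothetical names (ToolSpan, callTool); it takes an injectable clock so the example is deterministic.

```typescript
// Hypothetical tool-span record; field names are assumptions.
interface ToolSpan {
  name: string;
  input: unknown;
  output?: unknown;
  error?: string;
  latencyMs: number;
}

function callTool<A, R>(
  name: string,
  args: A,
  tool: (args: A) => R,
  spans: ToolSpan[],
  now: () => number = Date.now,
): R {
  const startedAt = now();
  const span: ToolSpan = { name, input: args, latencyMs: 0 };
  try {
    const output = tool(args);
    span.output = output;
    return output;
  } catch (err) {
    span.error = err instanceof Error ? err.message : String(err);
    throw err;
  } finally {
    // Latency is recorded whether the tool succeeded or failed.
    span.latencyMs = now() - startedAt;
    spans.push(span);
  }
}

// Fake clock that advances 25 ms per reading, for a repeatable example.
let fakeTime = 1000;
const tick = (): number => {
  const t = fakeTime;
  fakeTime += 25;
  return t;
};

const toolSpans: ToolSpan[] = [];
const sum = callTool("add", { a: 2, b: 3 }, ({ a, b }) => a + b, toolSpans, tick);
```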
Add prompt versions to generations. Pass promptName and promptVersion when a prompt is managed through Currai. That lets you group traces by the exact version that served the request, run evals on real outputs, and compare A/B test variants.
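Once each generation carries promptName and promptVersion, comparing versions is a simple group-by. The record shape, the scores, and the helper below are made up for illustration.

```typescript
// promptName and promptVersion come from the text; everything else is assumed.
interface ScoredGeneration {
  promptName: string;
  promptVersion: number;
  score: number; // eval score between 0 and 1
}

function averageScoreByVersion(rows: ScoredGeneration[]): Record<string, number> {
  const buckets: Record<string, { total: number; count: number }> = {};
  for (const row of rows) {
    const key = row.promptName + "@v" + row.promptVersion;
    const bucket = buckets[key] || { total: 0, count: 0 };
    bucket.total += row.score;
    bucket.count += 1;
    buckets[key] = bucket;
  }
  const averages: Record<string, number> = {};
  for (const key of Object.keys(buckets)) {
    averages[key] = buckets[key].total / buckets[key].count;
  }
  return averages;
}

// Illustrative scores, not real eval results.
const versionAverages = averageScoreByVersion([
  { promptName: "support-answer", promptVersion: 1, score: 0.5 },
  { promptName: "support-answer", promptVersion: 1, score: 1 },
  { promptName: "support-answer", promptVersion: 2, score: 1 },
]);
```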
Add session and user IDs early. A single bad answer is useful, but a full conversation is often the real debugging unit. Session grouping lets you replay the sequence that led to the issue.
Add metadata and tags for routing. Feature name, tenant, deployment, model provider, experiment arm, and region are all cheap to attach and useful during an incident.
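During an incident, that metadata is what lets you narrow thousands of traces to the relevant few. A minimal sketch, assuming a hypothetical TraceSummary shape and made-up metadata values:

```typescript
// Hypothetical trace summaries; metadata keys mirror the examples in the text.
interface TraceSummary {
  id: string;
  tags: string[];
  metadata: Record<string, string>;
}

// Keep traces whose metadata matches every key in the filter.
function filterByMetadata(
  traces: TraceSummary[],
  wanted: Record<string, string>,
): TraceSummary[] {
  return traces.filter((t) =>
    Object.keys(wanted).every((key) => t.metadata[key] === wanted[key]),
  );
}

const summaries: TraceSummary[] = [
  { id: "t1", tags: ["checkout"], metadata: { region: "eu-west", tenant: "acme" } },
  { id: "t2", tags: ["checkout"], metadata: { region: "us-east", tenant: "acme" } },
  { id: "t3", tags: ["search"], metadata: { region: "eu-west", tenant: "globex" } },
];

const euAcme = filterByMetadata(summaries, { region: "eu-west", tenant: "acme" });
```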
From setup to production workflow
After traces are flowing, the same records support the rest of the AI operations loop:
- Debug slow generations by comparing model latency, retrieval latency, and tool latency.
- Track cost by token usage and model.
- Review failed or low-quality traces.
- Build eval datasets from production examples.
- Compare prompt versions on quality, latency, and cost.
- Turn bad production outputs into regression cases.
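As one example from that list, cost tracking falls out of token usage and model name. The prices in this sketch are placeholders invented for the example, not real provider pricing, and the helper names are assumptions.

```typescript
// Placeholder prices per million tokens. These are NOT real provider prices.
const pricePerMillion: Record<string, { input: number; output: number }> = {
  "example-small": { input: 1, output: 2 },
  "example-large": { input: 10, output: 30 },
};

interface UsageRow {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

function costByModel(rows: UsageRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    const price = pricePerMillion[row.model];
    if (!price) continue; // unknown model: skip rather than guess a price
    const cost = (row.inputTokens * price.input + row.outputTokens * price.output) / 1e6;
    totals[row.model] = (totals[row.model] || 0) + cost;
  }
  return totals;
}

const costs = costByModel([
  { model: "example-small", inputTokens: 500000, outputTokens: 250000 },
  { model: "example-small", inputTokens: 1000000, outputTokens: 0 },
  { model: "example-large", inputTokens: 100000, outputTokens: 100000 },
]);
```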
That is why the first setup should happen in the application, not in a separate notebook. The trace is only valuable if it represents the path users actually hit.
FAQs: LLM observability setup
Do I need OpenTelemetry to start?
No. Currai can record traces directly through the SDK. OpenTelemetry is useful when you already have a broader tracing pipeline or want to connect Currai to an existing observability stack, but it is not required for the first LLM trace.
Should I trace every request?
Start by tracing the important LLM paths. Sampling can come later if volume or retention requires it. Early on, missing the one production issue you needed to debug is usually more expensive than storing a few extra traces.
What is the minimum useful trace?
At minimum: trace name, input, generation name, model, generation input, generation output, token usage, and final trace output. Add sessionId, userId, and prompt version as soon as the feature is user-facing.
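That checklist is easy to turn into a test helper. The fields follow the list above, but the flat MinimalTrace shape and the missingFields function are assumptions made for this sketch.

```typescript
// Fields follow the minimum list above; the flat record shape is an assumption.
interface MinimalTrace {
  traceName?: string;
  traceInput?: unknown;
  generationName?: string;
  model?: string;
  generationInput?: unknown;
  generationOutput?: unknown;
  usage?: { inputTokens: number; outputTokens: number };
  traceOutput?: unknown;
}

const requiredFields: Array<keyof MinimalTrace> = [
  "traceName",
  "traceInput",
  "generationName",
  "model",
  "generationInput",
  "generationOutput",
  "usage",
  "traceOutput",
];

// Return the names of required fields that were never set.
function missingFields(trace: MinimalTrace): string[] {
  return requiredFields.filter((field) => trace[field] === undefined);
}

const incomplete: MinimalTrace = {
  traceName: "answer-question",
  traceInput: "hi",
  generationName: "answer",
  model: "example-model",
  generationInput: "hi",
  generationOutput: "hello",
};

const gaps = missingFields(incomplete);
```

Here the trace is missing token usage and the final trace output, which are the two fields most often forgotten when a call is wrapped by hand.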
