Jun 18, 2026

The easiest way to add LLM observability to your AI app in 2026

The fastest useful LLM observability setup is one trace, one generation, and one flush. Currai gets you there without running collectors or rebuilding your app.

Tutorial · 7 min read · The Currai team / Engineering


TL;DR: the easiest LLM observability setup is the one that gets a real trace into your dashboard before observability becomes its own project. With Currai, install the SDK, create a client, wrap one model call, attach token usage, and flush at the end of the request. From there, you can add retrieval spans, tool spans, sessions, prompt versions, and evals without changing the mental model.

Why LLM observability setup gets delayed

Every team agrees they should trace LLM calls. Fewer teams do it before launch. The reason is usually setup drag.

Traditional observability asks for agents, collectors, exporters, dashboards, sampling rules, and sometimes self-hosted storage. LLM observability adds a different set of concerns: prompt inputs, completions, model names, token usage, tool calls, retrieval context, prompt versions, user sessions, and costs. If capturing that data feels like a platform migration, teams postpone it.

That delay is expensive. When the first production issue appears, there is no record of what the model saw, which prompt version ran, what retrieval returned, how many tokens were spent, or why an agent called the wrong tool. You are left with screenshots and guesses.

The setup target should be smaller: one real trace, in the dashboard, from the code path users hit.

What "easy" should mean

An easy LLM observability setup has five properties:

No backend to operate. You should not need to stand up storage before the first trace.
One clear data model. A request is a trace. LLM calls are generations. Retrieval, tools, routing, and custom work are spans.
Explicit instrumentation. The code should show what is being recorded, so debugging the instrumentation is straightforward.
Production metadata from day one. Session IDs, user IDs, tags, prompt versions, token usage, and environment should travel with the trace.
A path from traces to evals. The same data used for debugging should become the dataset for quality checks.

Currai is built around that shape. You do not need a separate tracing concept for evals later. The trace you capture today is the record you can score tomorrow.
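The data model in that list can be written down as types: a request is a trace, LLM calls are generations, and everything else is a span. This is an illustrative shape only, not Currai's schema. The names TraceRecord, Observation, and countByKind are hypothetical, and the real SDK may store these fields differently.

```typescript
// Illustrative shape of the trace / generation / span model.
// All names here are hypothetical, not Currai's actual schema.
type ObservationKind = "generation" | "span";

interface Observation {
  kind: ObservationKind;
  name: string; // e.g. "openai.chat.completions" or "vector-search"
  children: Observation[]; // spans and generations can nest
}

interface TraceRecord {
  name: string; // one logical request, e.g. "chat-turn"
  observations: Observation[];
}

// Counts observations of each kind anywhere in the trace tree.
function countByKind(trace: TraceRecord): Record<ObservationKind, number> {
  const counts: Record<ObservationKind, number> = { generation: 0, span: 0 };
  const visit = (obs: Observation): void => {
    counts[obs.kind] += 1;
    obs.children.forEach(visit);
  };
  trace.observations.forEach(visit);
  return counts;
}

// A RAG-style request: a retrieval span, then one model call.
const chatTurn: TraceRecord = {
  name: "chat-turn",
  observations: [
    { kind: "span", name: "vector-search", children: [] },
    { kind: "generation", name: "openai.chat.completions", children: [] },
  ],
};
console.log(countByKind(chatTurn)); // { generation: 1, span: 1 }
```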

Step 1: Install the SDK

Install currai in the service that makes the model call.

```shell
pnpm add currai
```

For Python services:

```shell
pip install currai
```

Create a public key and a secret key in the Currai dashboard, then set them in your environment:

```shell
CURRAI_PUBLIC_KEY=pk-lf-...
CURRAI_SECRET_KEY=sk-lf-...
```

The public key identifies the project. The secret key authenticates writes. If you self-host Currai, also set CURRAI_BASE_URL; otherwise use the hosted default.
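A service can fail fast at startup when a key is missing, instead of relying on non-null assertions. Here is a minimal sketch. requireCurraiKeys is a hypothetical helper, not part of the SDK, and it deliberately ignores the optional CURRAI_BASE_URL.

```typescript
// Hypothetical startup check for the two required Currai keys.
// CURRAI_BASE_URL is optional (self-hosting only), so it is not checked here.
type Env = Record<string, string | undefined>;

interface CurraiKeys {
  publicKey: string;
  secretKey: string;
}

function requireCurraiKeys(env: Env): CurraiKeys {
  const publicKey = env["CURRAI_PUBLIC_KEY"];
  const secretKey = env["CURRAI_SECRET_KEY"];
  if (!publicKey || !secretKey) {
    // Name every missing variable so the failure is actionable from deploy logs.
    const missing: string[] = [];
    if (!publicKey) missing.push("CURRAI_PUBLIC_KEY");
    if (!secretKey) missing.push("CURRAI_SECRET_KEY");
    throw new Error(`Missing Currai environment variables: ${missing.join(", ")}`);
  }
  return { publicKey, secretKey };
}
```

Calling requireCurraiKeys(process.env) once at boot turns a missing key into a clear startup error. Without the check, the client would be constructed with undefined credentials.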

Step 2: Create one client per process

In a server app, reuse the client. In Next.js or another hot-reloading runtime, store it on globalThis so development reloads do not create a new client for every file change.

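```typescript
import { Currai } from "currai";

// Only `var` declarations inside `declare global` become properties of globalThis,
// which is why this is not `let` or `const`.
declare global {
  // eslint-disable-next-line no-var
  var __currai: Currai | undefined;
}

// Returns the process-wide client, creating it on first use,
// so hot reloads reuse one client instead of creating a new one per change.
export function getCurrai() {
  if (!globalThis.__currai) {
    globalThis.__currai = new Currai({
      publicKey: process.env.CURRAI_PUBLIC_KEY!,
      secretKey: process.env.CURRAI_SECRET_KEY!,
    });
  }

  return globalThis.__currai;
}
```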

For a local demo where you want traces to appear immediately, you can pass flushAt: 1 and flushInterval: 1000. For production, keep batching enabled so tracing does not add unnecessary request overhead.
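A generic batching queue makes the trade-off between flushAt and flushInterval easier to see. This sketch is not Currai's implementation. It assumes the common pattern where a batch is sent when it reaches flushAt events, or when flushInterval milliseconds have passed since the first buffered event, whichever comes first. BatchQueue and send are hypothetical names.

```typescript
// Generic size-or-time batching, to illustrate what flushAt / flushInterval control.
// Hypothetical sketch; not the Currai SDK's internal code.
type Send<T> = (batch: T[]) => Promise<void>;

class BatchQueue<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly send: Send<T>;
  private readonly flushAt: number;
  private readonly flushIntervalMs: number;

  constructor(send: Send<T>, flushAt: number, flushIntervalMs: number) {
    this.send = send;
    this.flushAt = flushAt;
    this.flushIntervalMs = flushIntervalMs;
  }

  enqueue(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.flushAt) {
      // Size threshold reached: send now rather than waiting for the timer.
      this.flushInBackground();
    } else if (this.timer === undefined) {
      // First event of a new batch: start the time-based fallback.
      this.timer = setTimeout(() => this.flushInBackground(), this.flushIntervalMs);
    }
  }

  // Sends whatever is buffered. Await this before a short-lived process exits.
  async flushAsync(): Promise<void> {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    await this.send(batch);
  }

  // Background sends must never crash the request path, so failures are only logged.
  private flushInBackground(): void {
    this.flushAsync().catch((error: unknown) => {
      console.error("trace batch send failed", error);
    });
  }
}
```

Under this model, flushAt: 1 sends every event on its own. Larger values accept a short delay in exchange for fewer network calls. The explicit flushAsync() is the piece that short-lived runtimes depend on.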

Step 3: Wrap the first LLM call

A trace is one logical request. A generation is one model call inside that request.

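```typescript
// Assumes getCurrai() from Step 2, an OpenAI client named `openai`,
// and request-scoped `messages`, `sessionId`, and `userId` are in scope.
const currai = getCurrai();

// Keep model settings in one place so the generation record
// always matches what was actually sent to the provider.
const model = "gpt-4o-mini";
const temperature = 0.2;

// One trace per logical request.
const trace = currai.trace({
  name: "chat-turn",
  sessionId,
  userId,
  input: { messages },
  environment: process.env.NODE_ENV,
  tags: ["chat"],
});

// One generation per model call inside that request.
const generation = trace.generation({
  name: "openai.chat.completions",
  model,
  input: messages,
  modelParameters: { temperature },
});

const response = await openai.chat.completions.create({
  model,
  messages,
  temperature,
});

const text = response.choices[0]?.message.content ?? "";

// Map the provider's token counts onto the generation's usage fields.
generation.end({
  output: text,
  usage: {
    input: response.usage?.prompt_tokens ?? null,
    output: response.usage?.completion_tokens ?? null,
    total: response.usage?.total_tokens ?? null,
    unit: "TOKENS",
  },
});

trace.update({ output: text });
```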

That is the first useful trace. It records the input, output, model, model parameters, session, user, tags, token usage, and environment in one place.
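Once more than one call site reports tokens, the usage mapping from Step 3 is worth pulling into a helper. Here is a minimal sketch. toTokenUsage and TokenUsage are hypothetical names that mirror the fields used above. OpenAIUsage declares only the three OpenAI fields the mapping reads, so the example runs without the openai package.

```typescript
// The three token fields read from an OpenAI chat completion's `usage` object.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Mirrors the usage object passed to generation.end() in Step 3 (hypothetical type name).
interface TokenUsage {
  input: number | null;
  output: number | null;
  total: number | null;
  unit: "TOKENS";
}

// Maps provider usage onto the generation's usage fields.
// Missing usage becomes nulls rather than zeros, so "unknown" is never read as "free".
function toTokenUsage(usage: OpenAIUsage | undefined): TokenUsage {
  return {
    input: usage?.prompt_tokens ?? null,
    output: usage?.completion_tokens ?? null,
    total: usage?.total_tokens ?? null,
    unit: "TOKENS",
  };
}

console.log(toTokenUsage({ prompt_tokens: 12, completion_tokens: 30, total_tokens: 42 }));
// { input: 12, output: 30, total: 42, unit: 'TOKENS' }
```

With the helper, the Step 3 call becomes generation.end({ output: text, usage: toTokenUsage(response.usage) }).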

Step 4: Record errors instead of losing them

The most valuable traces are often failed requests. End the generation or span with an error level before rethrowing.

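```typescript
try {
  // callModel() stands in for your own model call; it is assumed to return
  // the completion text and a usage object shaped like the one in Step 3.
  const response = await callModel();
  generation.end({ output: response.text, usage: response.usage });
} catch (error) {
  // Record the failure on the generation, then let the caller handle it.
  generation.end({
    level: "ERROR",
    statusMessage: error instanceof Error ? error.message : String(error),
  });
  throw error;
}
```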

If the request fails before the model call, for example during retrieval, end the span for the step that failed in the same way: with level ERROR and a status message. The rule is simple: if a step matters to debugging, it should end with either an output or an error.
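A small wrapper can enforce the rule that every step ends with an output or an error, so call sites cannot forget the catch branch. This is a sketch under stated assumptions. Endable is a hypothetical interface that captures only the end({ output }) and end({ level, statusMessage }) calls shown above. observe is not part of the Currai SDK.

```typescript
// Hypothetical minimal interface matching the end() calls shown for generations above.
interface EndParams {
  output?: unknown;
  level?: "ERROR";
  statusMessage?: string;
}

interface Endable {
  end(params: EndParams): void;
}

// Runs one step and ends the observation exactly once:
// with the step's output on success, or with an ERROR level on failure.
async function observe<T>(handle: Endable, step: () => Promise<T>): Promise<T> {
  let result: T;
  try {
    result = await step();
  } catch (error) {
    handle.end({
      level: "ERROR",
      statusMessage: error instanceof Error ? error.message : String(error),
    });
    throw error; // rethrow so the caller's own error handling still runs
  }
  // Kept outside the try so a failure inside end() cannot trigger a second end().
  handle.end({ output: result });
  return result;
}
```

A retrieval step can be wrapped the same way. With a hypothetical retrievalSpan and search, the call is await observe(retrievalSpan, () => search(query)). A failure before the model call then gets the same error record as a failed generation.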

Step 5: Flush in short-lived runtimes

Currai batches in the background. That is what you want for production servers. In scripts, jobs, and serverless handlers, flush before the process exits or the response is finalized.

```typescript
await currai.flushAsync();
```

This is especially important for streaming responses. Use your framework's finish or error hook to end the generation, update the trace, and flush.
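With streaming, what matters is that ending, updating, and flushing happen after the last chunk, on both the success and error paths. Here is a minimal sketch. It assumes hypothetical GenerationLike and TraceLike stand-ins and a flush callback for the calls shown in Steps 3 to 5. The input is a plain AsyncIterable of text chunks rather than any specific provider's stream type.

```typescript
// Hypothetical stand-ins for the generation.end, trace.update, and flush calls shown earlier.
interface GenerationLike {
  end(params: { output?: string; level?: "ERROR"; statusMessage?: string }): void;
}
interface TraceLike {
  update(params: { output: string }): void;
}

// Re-yields each text chunk to the caller while accumulating the full completion,
// then records it and flushes exactly once when the stream is over.
async function* traceStream(
  chunks: AsyncIterable<string>,
  generation: GenerationLike,
  trace: TraceLike,
  flush: () => Promise<void>,
): AsyncGenerator<string> {
  let text = "";
  let failed = false;
  try {
    for await (const chunk of chunks) {
      text += chunk;
      yield chunk; // the user still sees tokens as they arrive
    }
  } catch (error) {
    failed = true;
    generation.end({
      level: "ERROR",
      statusMessage: error instanceof Error ? error.message : String(error),
    });
    throw error;
  } finally {
    // Runs after success, after an error, and when the consumer stops early
    // (for example a client disconnect), in which case the partial text is recorded.
    if (!failed) {
      generation.end({ output: text });
      trace.update({ output: text });
    }
    await flush();
  }
}
```

Piping the wrapped stream into the framework's streaming response ties the flush to the stream's lifetime rather than to the handler returning.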

What to add after the first trace

Add retrieval spans around vector search, reranking, document loading, and context assembly. When a RAG answer is wrong, these spans tell you whether the model ignored good context or never received it.

Add tool spans around function calls, MCP tools, API requests, browser actions, and database lookups. Agent bugs are much easier to debug when each tool call has its own input, output, latency, and error state.
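When retrieval, tool, and model steps each have their own span, a slow request can be attributed to one kind of work. Here is a minimal sketch over exported observation data. The ObservedStep shape, its kind values, and latencyByKind are hypothetical, not a Currai export format, and the timings are example inputs.

```typescript
// Hypothetical flattened observation record, e.g. from an export or API query.
interface ObservedStep {
  kind: "retrieval" | "tool" | "generation";
  name: string;
  startMs: number;
  endMs: number;
}

// Sums latency per kind of work so the dominant cost of a slow request stands out.
// Parallel steps are summed, so totals can exceed the request's wall-clock time.
function latencyByKind(steps: ObservedStep[]): Map<ObservedStep["kind"], number> {
  const totals = new Map<ObservedStep["kind"], number>();
  for (const step of steps) {
    const duration = step.endMs - step.startMs;
    totals.set(step.kind, (totals.get(step.kind) ?? 0) + duration);
  }
  return totals;
}

const slowTurn: ObservedStep[] = [
  { kind: "retrieval", name: "vector-search", startMs: 0, endMs: 180 },
  { kind: "retrieval", name: "rerank", startMs: 180, endMs: 1_380 },
  { kind: "generation", name: "openai.chat.completions", startMs: 1_380, endMs: 2_100 },
];
console.log(latencyByKind(slowTurn)); // Map(2) { 'retrieval' => 1380, 'generation' => 720 }
```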

Add prompt versions to generations. Pass promptName and promptVersion when a prompt is managed through Currai. That lets you group traces by the exact version that served the request, run evals on real outputs, and compare A/B test variants.
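Once each generation carries promptName and promptVersion, grouping by exact version is a plain aggregation. This sketch runs over hypothetical exported records. GenerationRecord, its latencyMs and score fields, and summarizeByPromptVersion are assumptions. score stands in for whatever eval result you attach, and promptVersion is assumed to be numeric.

```typescript
// Hypothetical exported generation record.
interface GenerationRecord {
  promptName: string;
  promptVersion: number; // assumed numeric
  latencyMs: number;
  score: number; // hypothetical eval score in [0, 1]
}

interface VersionSummary {
  count: number;
  meanLatencyMs: number;
  meanScore: number;
}

// Groups generations by "name@version" and averages latency and score per group.
function summarizeByPromptVersion(records: GenerationRecord[]): Map<string, VersionSummary> {
  const sums = new Map<string, { count: number; latency: number; score: number }>();
  for (const r of records) {
    const key = `${r.promptName}@${r.promptVersion}`;
    const s = sums.get(key) ?? { count: 0, latency: 0, score: 0 };
    s.count += 1;
    s.latency += r.latencyMs;
    s.score += r.score;
    sums.set(key, s);
  }
  const summaries = new Map<string, VersionSummary>();
  for (const [key, s] of sums) {
    summaries.set(key, {
      count: s.count,
      meanLatencyMs: s.latency / s.count,
      meanScore: s.score / s.count,
    });
  }
  return summaries;
}
```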

Add session and user IDs early. A single bad answer is useful, but a full conversation is often the real debugging unit. Session grouping lets you replay the sequence that led to the issue.
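Replaying a session comes down to grouping traces by sessionId and ordering them by time. Here is a minimal sketch over hypothetical exported records. TurnRecord, timestamp, and replaySession are illustrative names, not Currai's API.

```typescript
// Hypothetical exported record for one turn (one trace) in a conversation.
interface TurnRecord {
  sessionId: string;
  timestamp: string; // ISO 8601
  input: string;
  output: string;
}

// Returns one session's turns in chronological order as a readable transcript.
function replaySession(records: TurnRecord[], sessionId: string): string[] {
  return records
    .filter((r) => r.sessionId === sessionId) // filter copies, so sort never mutates the input
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp))
    .flatMap((r) => [`user: ${r.input}`, `assistant: ${r.output}`]);
}
```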

Add metadata and tags for routing. Feature name, tenant, deployment, model provider, experiment arm, and region are all cheap to attach and useful during an incident.

From setup to production workflow

After traces are flowing, the same records support the rest of the AI operations loop:

Debug slow generations by comparing model latency, retrieval latency, and tool latency.
Track cost by token usage and model.
Review failed or low-quality traces.
Build eval datasets from production examples.
Compare prompt versions on quality, latency, and cost.
Turn bad production outputs into regression cases.

That is why the first setup should happen in the application, not in a separate notebook. The trace is only valuable if it represents the path users actually hit.
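To track cost by token usage and model, multiply token counts by per-model prices and sum the results. This sketch uses explicitly placeholder prices. The model names and numbers in PRICES_PER_MILLION are made up for illustration, so substitute your provider's current price list. UsageRow and costByModel are hypothetical names.

```typescript
// Hypothetical per-generation usage row, e.g. exported from traces.
interface UsageRow {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// PLACEHOLDER prices in USD per 1M tokens for made-up model names.
// These are not real rates; replace them with your provider's current price list.
const PRICES_PER_MILLION: Record<string, { input: number; output: number }> = {
  "model-a": { input: 1, output: 4 },
  "model-b": { input: 10, output: 40 },
};

// Sums cost per model. Unknown models throw, so a missing price is never counted as zero.
function costByModel(rows: UsageRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const price = PRICES_PER_MILLION[row.model];
    if (!price) throw new Error(`No price configured for model "${row.model}"`);
    const cost = (row.inputTokens * price.input + row.outputTokens * price.output) / 1_000_000;
    totals.set(row.model, (totals.get(row.model) ?? 0) + cost);
  }
  return totals;
}
```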

FAQs: LLM observability setup

Do I need OpenTelemetry to start?

No. Currai can record traces directly through the SDK. OpenTelemetry is useful when you already have a broader tracing pipeline or want to connect Currai to an existing observability stack, but it is not required for the first LLM trace.

Should I trace every request?

Start by tracing the important LLM paths. Sampling can come later if volume or retention requires it. Early on, missing the one production issue you needed to debug is usually more expensive than storing a few extra traces.
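If volume does eventually call for sampling, deciding per session rather than per request keeps conversations whole. Here is a sketch of deterministic hash-based sampling. shouldTrace and the choice of FNV-1a as the hash are illustrative assumptions, not a Currai feature.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units; deterministic across processes.
function fnv1a(value: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Keeps roughly `rate` of sessions. The same sessionId always gets the same answer,
// so a sampled session is traced turn by turn instead of in fragments.
function shouldTrace(sessionId: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  return fnv1a(sessionId) / 0x1_0000_0000 < rate;
}
```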

What is the minimum useful trace?

At minimum: trace name, input, generation name, model, generation input, generation output, token usage, and final trace output. Add sessionId, userId, and prompt version as soon as the feature is user-facing.
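That minimum can become a check that runs over exported traces, or in a test, to catch instrumentation that silently drops fields. In this sketch, the MinimalTrace shape and the missingFields helper are hypothetical. They simply restate the list above as types.

```typescript
// Hypothetical flattened view of a trace with its single generation.
interface MinimalTrace {
  name?: string;
  input?: unknown;
  output?: unknown;
  generation?: {
    name?: string;
    model?: string;
    input?: unknown;
    output?: unknown;
    usage?: { input: number | null; output: number | null; total: number | null };
  };
}

// Lists which of the minimum useful fields are absent (undefined or null).
function missingFields(trace: MinimalTrace): string[] {
  const gen = trace.generation;
  const checks: Array<[string, unknown]> = [
    ["trace.name", trace.name],
    ["trace.input", trace.input],
    ["trace.output", trace.output],
    ["generation.name", gen?.name],
    ["generation.model", gen?.model],
    ["generation.input", gen?.input],
    ["generation.output", gen?.output],
    ["generation.usage", gen?.usage],
  ];
  return checks
    .filter(([, value]) => value === undefined || value === null)
    .map(([field]) => field);
}
```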

Related Currai pages

Debug a slow RAG pipeline with nested traces

Track token cost across models, traces, and days

Trace your first LLM call in 5 lines
