Instrumentation

Generations

Capture every model call — prompt, completion, model, parameters, and timing — as a generation inside a trace.

A generation is a single model call. Currai records the exact prompt you sent, the completion you got back, the model and its parameters, and how long it took — so you can replay any response and understand why the model said what it said.

Create a generation

Call generation() on a trace (or on a span, to nest it) before the model call, then end() it afterwards with the output and usage:

code

const generation = trace.generation({
  name: "openai.chat.completions",
  model: "gpt-4o-mini",
  modelParameters: { temperature: 0.2, maxTokens: 512 },
  input: messages,
});

const reply = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
});

generation.end({ output: reply.choices[0].message.content });

code

generation = trace.generation(
    name="openai.chat.completions",
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.2, "max_tokens": 512},
    input=messages,
)

reply = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)

generation.end(output=reply.choices[0].message.content)

What gets captured

Input and output — full payloads, including system and tool messages.
Model — the model name, used for cost lookup.
Model parameters — temperature, top_p, maxTokens, and any custom parameters you pass.
Timing — start and end times, so latency is computed for every call.
Usage — token counts you pass on end() (see Cost & tokens).

Updating a generation

Long or streaming calls can be enriched as more becomes known. Call update() to set fields before the final end():

code

generation.update({ modelParameters: { stream: true } });
// … stream tokens …
generation.end({ output: fullText, usage });

Most real apps are more than one call. Next: nesting spans and tools.