Mar 6, 2026

Trace a multi-turn chatbot end to end

A chatbot that works for one message often breaks by message seven. Here's how to trace a full conversation — sessions, history, and cost — so you can debug the whole thread.

TUTORIAL6 min readThe Currai team / Engineering

Currai

Single-turn demos always work. The trouble starts when the conversation gets long: the bot forgets what it was told, the prompt quietly outgrows the context window, and cost climbs with every turn as history is replayed. To debug any of that, you have to trace the conversation as a whole, not one message at a time.

One trace per turn, one session for the thread

Each user message is its own trace — it's a distinct unit of work. What ties them together is a shared session_id, so the whole conversation reassembles into one timeline.

def handle_turn(conversation_id, user_id, message, history):
    trace = currai.trace(
        name="chat-turn",
        session_id=conversation_id,   # same across the whole thread
        user_id=user_id,
        tags=["chatbot", "production"],
    )

    gen = trace.generation(
        name="answer",
        model="gpt-4o-mini",
        input=history + [message],
        metadata={"turn": len(history) // 2 + 1},
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini", messages=history + [message],
    )
    gen.end(
        output=reply.choices[0].message.content,
        usage=reply.usage.__dict__,
    )
    return reply

Use your own conversation row id as the session id — it ties the trace straight back to your database.

Read the conversation, not the turn

With the session in place, open the timeline and the failure modes that only appear over many turns become visible:

Lost context — the fact established at turn 2 is missing from the prompt at turn 6. The history payload shows exactly where it dropped.
Context window pressure — token counts climb every turn until the model starts truncating. The per-turn usage makes the ceiling obvious.
Drift — the bot's tone or accuracy degrades as the thread lengthens. Scoring each turn shows quality sliding down the conversation.

Watch the cost compound

The quiet killer of chatbot economics is history replay: every turn resends the whole conversation, so a 20-turn chat can cost far more than 20 single messages. The session roll-up surfaces that compounding cost, and it's the number that tells you when to switch from replaying full history to summarizing it.

From thread to fix

Once you can see the whole conversation, the fixes are concrete: summarize history past N turns, trim the system prompt, or cap the context you replay. Trace the change the same way and compare sessions before and after. Debugging a chatbot stops being "I think it forgets things" and becomes "history dropped at turn 6 — here's the trace, here's the fix."

Back to blog

Trace a multi-turn chatbot end to end

One trace per turn, one session for the thread

Read the conversation, not the turn

Watch the cost compound

From thread to fix

How to build an AI FAQ chatbot trained on your documentation

How to build a customer support chatbot for your website (step-by-step)

LLM red teaming: a step-by-step guide

Trace a multi-turn chatbot end to end

One trace per turn, one session for the thread

Read the conversation, not the turn

Watch the cost compound

From thread to fix

Related articles

How to build an AI FAQ chatbot trained on your documentation

How to build a customer support chatbot for your website (step-by-step)

LLM red teaming: a step-by-step guide