What is LLM observability?
LLM observability is the practice of capturing every prompt, completion, token, and tool call so you can explain what your model did and why. Here's what it means and where to start.
Logs tell you a line ran. Traces tell you what the model saw, said, and cost across the whole request. Here's why LLM apps need tracing, not more print statements.
LLM apps fail in ways your APM never sees. We built Currai so you can watch every prompt, token, and tool call the way you already watch latency and errors.
A RAG answer that takes four seconds could be slow retrieval, a fat prompt, or the model itself. Nested traces tell you which one — here's how to find the bottleneck.
Pass token usage on every generation and Currai turns it into cost — rolled up per trace, model, user, and day. Stop guessing what an LLM feature costs.
Install the SDK, paste your keys, wrap a call. A walkthrough of going from print() debugging to a real trace you can replay.
One conversation is many traces. Pass a session id to stitch every turn into one thread, and a user id to slice cost, latency, and volume by the people using your app.
Trace, span, generation — three nouns that cover everything an LLM app does. Understand the data model and your instrumentation stops being guesswork.
Per-trace pricing punishes you for instrumenting well. We price on the data Currai actually processes, so a big trace and a tiny trace each cost in proportion to their size.
Currai is byte-compatible with the Langfuse SDKs. If you're already instrumented, you migrate by changing the host — your spans, exporters, and trace code keep working.
OpenTelemetry is the open standard for traces. Currai ingests OTLP spans, so you can use the collector and instrumentation you already run and still get LLM-aware views.
Total latency hides the metric users actually feel — time to first token. Here's how to capture both on every generation and find what's making your LLM app feel slow.
Agents loop, call tools, and call themselves — a single request can be dozens of model calls. Here's how to trace agent runs so you can see exactly where one went off the rails.
Observability data is high-volume and append-heavy — the classic case for running ClickHouse yourself. Here's the real trade-off between self-hosting and a managed backend.
Production traces carry prompts full of user data and arrive at full traffic volume. Here's how to sample for cost and redact for privacy without losing the traces you need.
A single runaway prompt or retry loop can 10x your bill overnight. Here's how to turn the cost data on your traces into budgets and alerts that warn you before the invoice does.
Your LLM calls don't all live in Python. Currai ingests traces over plain HTTP and OTLP, so any language that can make a request can send traces — here's how.
A chatbot that works for one message often breaks by message seven. Here's how to trace a full conversation — sessions, history, and cost — so you can debug the whole thread.