10 - Langfuse observability¶
Send LLM call observability to Langfuse natively — Trace at the top, Span observations for graph nodes, Generation observations with input, output, token usage, model parameters, and a native link back to the prompt entity the call rendered from.
Overview¶
A mission-briefing assistant answers questions about Apollo and Artemis missions. The pipeline fetches a versioned prompt template, renders it with the user's question, sends the rendered messages to the model, and stores the response. The Langfuse observer captures the full call shape as the graph runs.
The demo's prompt backend stubs a Langfuse-source by attaching a
sentinel langfuse_prompt reference to the rendered prompt. The
Generation observation reads that reference and links back to the
prompt entity — exactly what you'd see in a production Langfuse
dashboard threading "this generation came from prompt v7" without any
manual wiring at the call site.
What it teaches¶
LangfuseObserverattaches like any other observer; nothing in the node code knows or cares about which backend is recording.- The
LangfuseClientProtocol decouples the observer from the SDK. The bundledInMemoryLangfuseClientrecorder is the test/demo shape; production passes a reallangfuse.Langfuse()instance (or a thin adapter — see Reading the output below). - Prompt linkage through
Prompt.observability_entities: a prompt backend that exposes a Langfuse Prompt entity reference surfaces it on every Generation that renders from that prompt. Filesystem / in-memory backends without that reference work too, they just produce metadata-only linkage. disable_llm_payload=Falseopt-in for capturing input messages + output content on Generation observations. Default-off is the privacy posture; the demo deliberately flips it.correlation_idcross-cutting metadata on the Trace and every Observation — the join key if you're also running an OTel observer alongside.
How to run¶
uv sync --group examples
LLM_API_KEY=sk-... uv run python examples/10-langfuse-observability/main.py \
"what year did Apollo 11 land"
The first positional arg becomes the question. The demo uses an in-memory recorder so no Langfuse account is needed.
The graph¶
flowchart TD
start([start])
answer[answer_briefing]
stop([end])
start --> answer --> stop
A single-node graph: fetch the prompt, render with the question, call
the LLM under with_active_prompt(...), store the response. The
single node is deliberate — the value is in the captured Trace shape,
not the graph topology.
Reading the output¶
After the answer prints, the script renders the captured Langfuse Trace + Observation tree:
question: what year did Apollo 11 land
answer: Apollo 11 landed on the Moon on July 20, 1969 ...
prompt: mission-briefing v7
─── captured Langfuse trace ─────────────────────────────────
Trace id=01234567-89ab-...
name='answer_briefing'
metadata={correlation_id='...', entry_node='answer_briefing', spec_version='0.26.0'}
[span] 'answer_briefing' level=DEFAULT
metadata={attempt_index=0, correlation_id='...', namespace=['answer_briefing'], step=0}
[generation] 'openarmature.llm.complete' level=DEFAULT
metadata={correlation_id='...', finish_reason='stop', prompt={...},
response_id='...', response_model='gpt-4o-mini-2024-...',
system='openai'}
model='gpt-4o-mini'
usage=input:48 output:32 total:80
prompt_entity_link='lf-prompt-mission-briefing-v7'
output='Apollo 11 landed on the Moon on July 20, 1969 ...'
- Trace name = entry node name by default. The caller-supplied
invocation-label path (a per-
invoke()argument that overrides the default) ships with proposal 0034's caller-metadata work. - Span observation per node.
answer_briefingis the only node here; a multi-node graph would produce a tree of nested Span observations under the Trace. - Generation observation per LLM call. Carries
model,usage,output, and the prompt-identity metadata. In a production Langfuse dashboard this is what the "Generation" detail view renders. prompt_entity_linkis the valuePrompt.observability_entities['langfuse_prompt']carried — a sentinel string in this demo, a real Langfuse SDK Prompt object in production. When the backend doesn't surface the reference (e.g., a filesystem backend), the link is absent but themetadata.promptmap (name, version, label, hashes) still appears for traceability.
Swapping to a real Langfuse SDK¶
The observer's client parameter is LangfuseClient-Protocol-typed,
so any structurally-compatible value works:
from langfuse import Langfuse
client = Langfuse(
public_key="pk-lf-...",
secret_key="sk-lf-...",
host="https://cloud.langfuse.com",
)
observer = LangfuseObserver(client=client, disable_llm_payload=False)
If the installed SDK version's trace / span / generation method
signatures match the Protocol exactly, this is the whole change. If
they diverge (renamed kwargs, return-type quirks), wrap the SDK in a
small adapter class that implements LangfuseClient and delegates to
the SDK call-by-call. The Protocol surface is narrow — four methods —
so the adapter is on the order of 40 lines.
No specific langfuse SDK version is validated in CI as of this
release. The Protocol matches the SDK's documented low-level shape,
but langfuse has shifted between major versions (v2 → v3 introduced
API changes). A follow-on release pins a tested [langfuse] extras
range and a runtime isinstance(client, LangfuseClient) check; until
then, smoke-trace in your own environment with whichever langfuse
version your stack already uses and write a thin adapter if any
kwargs don't line up.
For prompt linkage: in production, the
Prompt.observability_entities['langfuse_prompt'] value is the SDK's
own Prompt-entity object (returned by langfuse_client.get_prompt(...))
rather than the sentinel string this demo uses. The observer passes
that value straight through to the SDK's generation(..., prompt=...)
argument, which is what the SDK uses to establish the native link.
Composition with OTel¶
Both observers consume the same NodeEvent stream and can be attached
together:
graph.attach_observer(OTelObserver(span_processor=batch))
graph.attach_observer(LangfuseObserver(client=langfuse_client))
Their disable_llm_spans / disable_llm_payload flags are
independent. The correlation_id cross-cutting attribute is the join
key — find a slow Generation in Langfuse, search for the
correlation_id in OTel logs to see the surrounding infrastructure
activity.