
Non-obvious shapes

Recipes that aren't deducible from the API surface alone. The primitives docs tell you what's possible; this section tells you what's smart.

Declare a non-clobbering reducer on accumulator list fields

State fields default to last_write_wins: each node's write replaces the prior value for that field. For scalar fields (status: str, count: int) that's usually what you want. For list fields that accumulate contributions across multiple nodes (messages: list[Message], events: list[Event], results: list[Result]), it's the wrong default; every node's contribution silently clobbers everything before it.

Declare append (or another non-clobbering reducer) at the state class:

from typing import Annotated
from pydantic import Field
from openarmature.graph import State, append

class WorkflowState(State):
    messages: Annotated[list[Message], append] = Field(default_factory=list)
    events: Annotated[list[Event], append] = Field(default_factory=list)
    final_status: str = "pending"   # last_write_wins is fine here

The failure mode without append is silent and easy to misdiagnose: the final state shows only the last node's contribution to the list, and no error is raised. This is the usual answer to the common "why is my accumulator empty?" question. merge is the equivalent for dict[str, V] fields that accumulate keys across nodes.
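The following minimal pure-Python sketch shows the clobbering directly. It models the three reducer behaviors as folds over successive node writes. The functions last_write_wins, append_sketch, merge_sketch and run_nodes are hypothetical stand-ins written for illustration, not openarmature's implementations. The sketch assumes only that a reducer takes (prior, update) and returns the new field value.

```python
from typing import Any, Callable

# Hypothetical stand-ins for three reducer behaviors; each takes (prior, update)
# and returns the new field value. Not openarmature's implementations.
def last_write_wins(prior: Any, update: Any) -> Any:
    return update  # the new write replaces whatever was there

def append_sketch(prior: list, update: list) -> list:
    return [*prior, *update]  # keep earlier contributions, add the new ones

def merge_sketch(prior: dict, update: dict) -> dict:
    return {**prior, **update}  # accumulate keys; later writes win per key

def run_nodes(reducer: Callable[[Any, Any], Any], initial: Any, writes: list) -> Any:
    """Fold each node's write into the field in execution order."""
    value = initial
    for write in writes:
        value = reducer(value, write)
    return value

node_writes = [["planner: start"], ["worker: done"], ["reviewer: ok"]]

print(run_nodes(last_write_wins, [], node_writes))  # ['reviewer: ok']
print(run_nodes(append_sketch, [], node_writes))    # ['planner: start', 'worker: done', 'reviewer: ok']
print(run_nodes(merge_sketch, {}, [{"a": 1}, {"b": 2}, {"a": 3}]))  # {'a': 3, 'b': 2}
```

The first line of output is exactly the "only the last node's contribution" symptom described above.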

Branch on `Response.finish_reason` before reading `message.content`

After await provider.complete(messages, tools=[...]) returns, the shape of Response varies by finish_reason:

finish_reason == "stop": assistant produced a content response. message.content carries the text; message.tool_calls is empty.
finish_reason == "tool_calls": assistant emitted tool calls. message.tool_calls carries the list; message.content is typically empty (model didn't say anything beyond the tool calls).
finish_reason == "length" / "content_filter" / "error": completion was cut off or refused; message.content may be partial or empty.

Post-LLM logic that reads message.content without checking finish_reason misses the entire tool-calling path:

response = await provider.complete(messages, tools=tools)

# Keep dispatching until the model stops asking for tools.
while response.finish_reason == "tool_calls":
    # The assistant turn that carries the tool calls must precede its ToolMessages.
    messages.append(response.message)
    for tc in response.message.tool_calls:
        result = dispatch_tool(tc.name, tc.arguments)
        messages.append(ToolMessage(content=result, tool_call_id=tc.id))
    response = await provider.complete(messages, tools=tools)

if response.finish_reason == "stop":
    handle_text(response.message.content)
else:
    # "length" / "content_filter" / "error": content may be partial or empty.
    handle_error_or_partial(response)

Checking the discriminator costs one branch. Skipping it leaves you with an empty message.content on tool-call responses, and it silently treats truncated or filtered completions as finished answers.

`disable_provider_payload` defaults to `True`: flip it for LLM-aware observability backends

The OTelObserver (and any spec-conformant observer reading LLM events) defaults disable_provider_payload: bool = True per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).

That's the right default for general OpenArmature use: payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why.

Flip the flag once at observer construction:

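from opentelemetry.sdk.trace.export import BatchSpanProcessor
from openarmature.observability import OTelObserver

observer = OTelObserver(
    span_processor=BatchSpanProcessor(your_exporter),  # wrap your span exporter in a processor
    disable_provider_payload=False,   # opt in to message-payload attributes
)
graph.attach_observer(observer)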

The companion disable_genai_semconv flag defaults to False: GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer.

Use the bundled `FilesystemCheckpointer` or `SQLiteCheckpointer`, not a hand-rolled serializer

The temptation when persisting graph state is to json.dumps(state.model_dump()) and write to a file. Don't. The shipped Checkpointer backends handle every contract openarmature.checkpoint.Checkpointer defines: round-trip integrity, parent_states for inner-save resume, fan-out progress tracking, schema-version migration, listing by correlation_id, CheckpointRecordInvalid on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists.

If your storage requirement isn't local disk (FilesystemCheckpointer) or local SQLite (SQLiteCheckpointer, which also supports :memory: and arbitrary file paths), implement the Checkpointer Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free.

Encapsulate divergent post-LLM branches as subgraphs

A common shape is "after this LLM call, route to either a JSON-extraction node or a tool-dispatch node depending on finish_reason." The naive solution is two conditional edges from the LLM node, one to each downstream node. That works for two branches but scales poorly once there are three or more.

When the branches operate on different sub-shapes of state (e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize"), encapsulate each as a SubgraphNode and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly.
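The following sketch shows the parent-side switchboard. It assumes the router is a plain function that maps state to the name of the next node. ParentState, route_after_llm, ROUTES and the *_subgraph names are all hypothetical. Registering the function as a conditional edge and building the SubgraphNode targets goes through openarmature's graph-builder API, which this sketch doesn't show.

```python
from dataclasses import dataclass

@dataclass
class ParentState:
    """Hypothetical parent state; only the field the router needs."""
    finish_reason: str

# Hypothetical node names; in a real graph each target would be a SubgraphNode.
ROUTES = {
    "stop": "json_extraction_subgraph",      # extract JSON, then validate
    "tool_calls": "tool_dispatch_subgraph",  # dispatch tools, loop until done, summarize
}

def route_after_llm(state: ParentState) -> str:
    """The parent's only decision: which subgraph handles this response."""
    return ROUTES.get(state.finish_reason, "error_handling_subgraph")

print(route_after_llm(ParentState("tool_calls")))  # tool_dispatch_subgraph
print(route_after_llm(ParentState("length")))      # error_handling_subgraph
```

Adding another branch means one more dictionary entry and one more subgraph. The parent's topology stays at a handful of edges.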

`OpenAIProvider.ready()` exercises `chat/completions` by default; opt back into the catalog-only probe for cost-sensitive callers

OpenAIProvider(..., readiness_probe=...) accepts "chat_completions" (the default), "models", or "both". The default issues POST /v1/chat/completions with a max_tokens=1 body. A green ready() therefore proves that the inference wire path works, not just that the catalog endpoint answers.

The motivating failure class is OpenAI-compatible proxies that return 200 on GET /v1/models but 405 on the completions endpoint. Bifrost is the field-reported case. Under the previous catalog-only default, such a proxy reported ready and every real call broke.

The "models" opt-in restores the old behavior. It suits cost-sensitive cloud callers, where every ready() would otherwise bill one prompt's worth of tokens. "both" runs the catalog probe and then the chat probe, giving the strongest signal at the cost of two requests per ready(). Non-200 responses on either probe route through classify_http_error. The canonical error categories (ProviderAuthentication, ProviderUnavailable, ProviderInvalidModel, etc.) therefore surface consistently regardless of which probe ran.
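The three modes can be summarized as the request sequence each one issues. probe_plan and the CATALOG / CHAT tuples are hypothetical, written only to restate the behavior described above. They are not part of OpenAIProvider's API. A real chat probe body also carries a model and a message, which are omitted here.

```python
from typing import Literal

ProbeMode = Literal["chat_completions", "models", "both"]

# Hypothetical (method, path, body) descriptions of the two probes.
CATALOG = ("GET", "/v1/models", None)
CHAT = ("POST", "/v1/chat/completions", {"max_tokens": 1})  # model/messages omitted

def probe_plan(mode: ProbeMode) -> list[tuple]:
    """Return the requests a readiness check would issue, in order, for one mode."""
    if mode == "chat_completions":
        return [CHAT]           # default: proves the inference path answers
    if mode == "models":
        return [CATALOG]        # no tokens billed, but blind to a broken completions route
    if mode == "both":
        return [CATALOG, CHAT]  # catalog first, then chat
    raise ValueError(f"unknown readiness_probe: {mode!r}")

for method, path, body in probe_plan("both"):
    print(method, path, body)
```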

Be explicit with `tool_choice`; don't trust the provider's default

Provider.complete(messages, tools, tool_choice=...) accepts "auto", "required", "none", or a ForceTool(name=...) record. When you omit tool_choice, the provider's own default applies. For the OpenAI API that default is "auto" when tools is non-empty; other providers document their own. Some pipelines need deterministic tool-calling: a routing node that MUST produce a tool call, or a guarded LLM call that MUST NOT call tools. Those pipelines should pin tool_choice explicitly rather than rely on that default.

Pre-send validation catches the three §5 failure modes (required with empty tools, ForceTool with empty tools, ForceTool.name not in tools) and raises ProviderInvalidRequest before any HTTP call is made. Not every provider honors tool_choice, so confirm with your provider's docs. OpenAIProvider implements the OpenAI-compatible mapping.
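The following minimal sketch shows the three pre-send checks. It uses local stand-ins for ForceTool and ProviderInvalidRequest; the real ones ship with openarmature. For simplicity it assumes each tool is a dict with a "name" key. validate_tool_choice is hypothetical and is not the library's validator.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class ForceTool:
    """Stand-in for openarmature's ForceTool record."""
    name: str

class ProviderInvalidRequest(Exception):
    """Stand-in for the library's ProviderInvalidRequest."""

ToolChoice = Union[str, ForceTool]  # "auto" | "required" | "none" | ForceTool

def validate_tool_choice(tools: list[dict], tool_choice: Optional[ToolChoice]) -> None:
    """Reject the three invalid combinations before any HTTP request is built."""
    if tool_choice == "required" and not tools:
        raise ProviderInvalidRequest("tool_choice='required' with an empty tools list")
    if isinstance(tool_choice, ForceTool):
        if not tools:
            raise ProviderInvalidRequest("ForceTool with an empty tools list")
        if tool_choice.name not in {tool["name"] for tool in tools}:
            raise ProviderInvalidRequest(f"ForceTool name {tool_choice.name!r} is not in tools")

search = {"name": "search"}
validate_tool_choice([search], ForceTool("search"))  # valid: returns None
validate_tool_choice([], "none")                     # valid: nothing to force
```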

Always `await graph.drain()` in short-lived processes; supply a `timeout` if observers might hang

CompiledGraph.invoke() returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible: the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's completed event plus any checkpoint_saved events) get dropped.

Always call await graph.drain() before the short-lived process exits. If your observer set includes anything that might hang (a metrics observer with a flaky network endpoint, an OTel exporter behind a slow OTLP collector), supply a timeout:

summary = await graph.drain(timeout=5.0)
if summary.timeout_reached:
    log.warning("drain truncated: %d events undelivered", summary.undelivered_count)

The compiled graph stays usable for subsequent invocations after a timed-out drain: workers are cancelled cleanly, and no partial state leaks.
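The dispatch model can be sketched with plain asyncio. emit() enqueues without awaiting, a background worker awaits the observer, and drain() waits on the queue with an optional deadline before cancelling the worker. EventDispatcher and DrainSummary are hypothetical names. The summary fields mirror timeout_reached and undelivered_count from the drain snippet, but this sketch only illustrates the mechanism and is not openarmature's code.

```python
import asyncio
import contextlib
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional

Observer = Callable[[Any], Awaitable[None]]

@dataclass
class DrainSummary:
    timeout_reached: bool
    undelivered_count: int

class EventDispatcher:
    """Hypothetical per-invocation dispatcher: emit() never waits on observers."""

    def __init__(self, observer: Observer) -> None:
        self._observer = observer
        self._queue: "asyncio.Queue[Any]" = asyncio.Queue()
        self._pending = 0  # emitted but not yet finished by the observer
        self._worker = asyncio.create_task(self._run())  # needs a running event loop

    def emit(self, event: Any) -> None:
        self._pending += 1
        self._queue.put_nowait(event)  # the execution loop returns immediately

    async def _run(self) -> None:
        while True:
            event = await self._queue.get()
            try:
                await self._observer(event)
            except Exception:
                pass  # a failing observer must not kill the worker
            finally:
                self._pending -= 1
                self._queue.task_done()

    async def drain(self, timeout: Optional[float] = None) -> DrainSummary:
        timed_out = False
        try:
            await asyncio.wait_for(self._queue.join(), timeout)
        except asyncio.TimeoutError:
            timed_out = True
        undelivered = self._pending
        # Stop the worker either way so nothing is left running at process exit.
        self._worker.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await self._worker
        return DrainSummary(timed_out, undelivered)

async def demo() -> tuple[DrainSummary, DrainSummary]:
    delivered: list[Any] = []

    async def fast(event: Any) -> None:
        delivered.append(event)

    async def hangs(event: Any) -> None:
        await asyncio.sleep(3600)  # e.g. an exporter behind an unreachable collector

    ok = EventDispatcher(fast)
    ok.emit("node_completed")
    ok.emit("checkpoint_saved")
    stuck = EventDispatcher(hangs)
    stuck.emit("node_completed")
    stuck.emit("checkpoint_saved")
    return await ok.drain(timeout=1.0), await stuck.drain(timeout=0.05)

print(asyncio.run(demo()))
```

In the demo, the healthy observer drains with nothing left over. The hanging one times out with both events undelivered, and the process can still exit because its worker was cancelled.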

`install_log_bridge` skips its own handler when the application already attached one to the same `LoggerProvider`

Two distinct classes both named LoggingHandler exist in the OTel Python ecosystem and both bridge stdlib log records to the OTel Logs SDK:

opentelemetry.sdk._logs.LoggingHandler (the SDK class). Typically attached by an application's own logging setup, e.g., a FastAPI setup_logging(...) step that wires up an OTLP-backed LoggerProvider for log export.
opentelemetry.instrumentation.logging.handler.LoggingHandler (the instrumentation class). What openarmature.observability.otel.install_log_bridge attaches when it runs.

Different classes, same OTel-Logs export path. If both are attached against the same LoggerProvider, every stdlib log record fires through both handlers, both call provider.get_logger(...).emit(...), and BatchLogRecordProcessor ships the record TWICE to the OTLP endpoint. The duplication is OTLP-only; a console handler attached separately is unaffected, which makes "OTLP rows are doubled, console isn't" a head-scratcher to diagnose.

install_log_bridge detects either handler class against the same provider and skips its own addHandler accordingly; the openarmature.correlation_id LogRecord factory is still installed. The check is provider-scoped. An application that intentionally attaches a handler against a DIFFERENT LoggerProvider (a separate logs pipeline) still gets the OA bridge against the OA provider. The helper only dedups when the SAME provider would receive duplicate emissions.
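The provider-scoped check can be sketched with the standard logging module. SdkLoggingHandler, InstrumentationLoggingHandler and install_bridge_sketch are hypothetical stand-ins. Each fake handler simply records the provider it emits to in a provider attribute; this is an assumption of the sketch, not how the OTel classes store it. The correlation_id record factory is left out.

```python
import logging

class _BridgeHandler(logging.Handler):
    """Stand-in for an OTel bridge handler bound to one LoggerProvider."""

    def __init__(self, provider: object) -> None:
        super().__init__()
        self.provider = provider  # assumption: the provider is reachable as an attribute

    def emit(self, record: logging.LogRecord) -> None:
        pass  # a real handler would call provider.get_logger(...).emit(...)

class SdkLoggingHandler(_BridgeHandler):
    """Plays the SDK LoggingHandler attached by the application's own setup."""

class InstrumentationLoggingHandler(_BridgeHandler):
    """Plays the instrumentation LoggingHandler attached by the bridge."""

BRIDGE_HANDLER_TYPES = (SdkLoggingHandler, InstrumentationLoggingHandler)

def install_bridge_sketch(logger: logging.Logger, provider: object) -> bool:
    """Attach a bridge handler unless one already emits to this same provider."""
    for handler in logger.handlers:
        if isinstance(handler, BRIDGE_HANDLER_TYPES) and handler.provider is provider:
            return False  # a second handler would ship every record twice over OTLP
    logger.addHandler(InstrumentationLoggingHandler(provider))
    return True

app_logger = logging.getLogger("sketch.app")
shared_provider, separate_provider = object(), object()
app_logger.addHandler(SdkLoggingHandler(shared_provider))  # the application's own setup

print(install_bridge_sketch(app_logger, shared_provider))    # False: same provider, skipped
print(install_bridge_sketch(app_logger, separate_provider))  # True: different pipeline
```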

Three exception hierarchies; know which one your code catches

openarmature exceptions split across three sibling hierarchies:

RuntimeGraphError (in openarmature.graph): node execution failures: NodeException, RoutingError, EdgeException, ReducerError, StateValidationError. Each has a category string matching the spec's canonical error categories.
CheckpointError (in openarmature.checkpoint): persistence failures: CheckpointNotFound, CheckpointSaveFailed, CheckpointRecordInvalid, CheckpointStateMigrationMissing, CheckpointStateMigrationFailed, CheckpointStateMigrationChainAmbiguous.
LlmProviderError (in openarmature.llm): provider call failures: ProviderAuthentication, ProviderInvalidRequest, ProviderInvalidResponse, ProviderInvalidModel, ProviderModelNotLoaded, ProviderRateLimit, ProviderUnavailable, ProviderUnsupportedContentBlock, StructuredOutputInvalid.

Catching Exception works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base: RuntimeGraphError covers all five spec runtime categories, LlmProviderError covers all nine provider categories, CheckpointError covers all six checkpoint categories. The TRANSIENT_CATEGORIES frozenset in openarmature.llm enumerates which provider categories are retriable.
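The following sketch shows category-based retry that catches only the provider hierarchy, so graph and checkpoint errors propagate untouched. The exception classes are local stand-ins named after the real ones. Two details are assumptions: the category attribute on provider errors, and the category strings in the stand-in TRANSIENT_CATEGORIES. In real code, import LlmProviderError and TRANSIENT_CATEGORIES from openarmature.llm instead.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Local stand-ins; the `category` attribute name and its string values are assumptions.
class LlmProviderError(Exception):
    category = "provider_error"

class ProviderRateLimit(LlmProviderError):
    category = "provider_rate_limit"

class ProviderAuthentication(LlmProviderError):
    category = "provider_authentication"

TRANSIENT_CATEGORIES = frozenset({"provider_rate_limit"})

async def call_with_retry(call: Callable[[], Awaitable[T]], attempts: int = 3,
                          base_delay: float = 0.01) -> T:
    for attempt in range(attempts):
        try:
            return await call()
        except LlmProviderError as exc:
            # Only provider failures are caught here; RuntimeGraphError and
            # CheckpointError would propagate past this handler.
            if exc.category not in TRANSIENT_CATEGORIES or attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise ValueError("attempts must be at least 1")

async def demo() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ProviderRateLimit()
        return "ok"

    return await call_with_retry(flaky), calls

print(asyncio.run(demo()))  # ('ok', 3)
```

A non-transient category such as the authentication stand-in is re-raised on the first attempt instead of being retried.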

Filter `openarmature.*`-namespaced events when your observer only cares about user nodes

OA emits observer events under sentinel node names for some internal dispatch. openarmature.checkpoint.migrate for state-migration runs (proposal 0014) and openarmature.checkpoint.save for checkpoint saves (proposal 0010) ride on NodeEvent with a sentinel namespace. (LLM provider calls used to follow the same pattern but moved to typed LlmCompletionEvent / LlmFailedEvent variants in v0.13.0 per proposals 0049 and 0058; filter those by isinstance instead.) The sentinel-namespace events let the OTel / Langfuse observers emit checkpoint-migrate spans and similar. A custom observer that only cares about user-defined node activity sees them as noise:

async def __call__(self, event: NodeEvent) -> None:
    # Skip OA-internal events; only react to user node activity.
    if event.namespace and event.namespace[0].startswith("openarmature."):
        return
    # … user-node handling

event.namespace[0] is the safest discriminator. Don't try to filter on current_invocation_id() is None: OA-internal events are dispatched within the same invocation context as user-node events, so invocation_id is set for both; the namespace-prefix check is the stable contract.

Fan-out subgraphs that emit `list[X]` per instance produce `list[list[X]]` at `target_field`

When each fan-out instance's collect_field holds a list[X] (e.g., each instance produces 0..N records), the engine's contribution step is [s[cfg.collect_field] for s in successes], so every instance's value becomes one element of the outer list. With list[X] per instance, the parent receives list[list[X]], and an append reducer declared on the parent's Annotated[list[X], append] field preserves the nesting verbatim. Pydantic then fails to validate each list[X] element against X:

attributed_candidates.0  Input should be a valid dictionary or
  instance of ClaimCandidate [input_value=[ClaimCandidate(...)],
  input_type=list]

The fix is the concat_flatten built-in reducer (proposal 0036), the list-of-lists analog of append. Declare it on the parent's collection field:

from typing import Annotated

from pydantic import Field

from openarmature.graph import State, concat_flatten

class PipelineState(State):
    attributed_candidates: Annotated[list[ClaimCandidate], concat_flatten] = Field(default_factory=list)

concat_flatten folds the per-instance lists into one flat list ([*prior, *(item for sublist in update for item in sublist)]), strict like append: it raises ReducerError if any element of the update isn't itself a list.
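The difference between append and concat_flatten on a fan-out's contribution is easiest to see side by side. Both functions below are hypothetical re-implementations of the formulas quoted above, with a local ReducerError stand-in. They are for illustration only; use the real reducers from openarmature.graph.

```python
class ReducerError(Exception):
    """Stand-in for openarmature's ReducerError."""

def append_sketch(prior: list, update: list) -> list:
    return [*prior, *update]  # each update element is kept as-is, nesting included

def concat_flatten_sketch(prior: list, update: list) -> list:
    # Strict, like append: every element of the update must itself be a list.
    for element in update:
        if not isinstance(element, list):
            raise ReducerError(f"expected a list per instance, got {type(element).__name__}")
    return [*prior, *(item for sublist in update for item in sublist)]

# What the engine hands the parent: one element per successful fan-out instance.
per_instance = [["claim-a", "claim-b"], [], ["claim-c"]]

print(append_sketch([], per_instance))          # [['claim-a', 'claim-b'], [], ['claim-c']]
print(concat_flatten_sketch([], per_instance))  # ['claim-a', 'claim-b', 'claim-c']
```

An instance that produced zero records contributes an empty sublist, which concat_flatten folds away without leaving a gap.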

The dict-shaped analog is merge_all (also proposal 0036): when each fan-out instance contributes a dict[str, X], the parent's target_field receives list[dict], which plain merge can't consume. merge_all folds the sequence of mappings into the prior with shallow last-write-wins per key:

from typing import Annotated

from pydantic import Field

from openarmature.graph import State, merge_all

class PipelineState(State):
    keyed_results: Annotated[dict[str, Result], merge_all] = Field(default_factory=dict)
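Likewise, here is a hypothetical re-implementation of merge_all's fold, for illustration only. It assumes the per-instance mappings arrive in the order shown. Later instances win on key collisions, and the merge is shallow, so a nested dict value is replaced rather than merged.

```python
def merge_all_sketch(prior: dict, update: list) -> dict:
    """Fold each instance's mapping into a copy of prior, left to right."""
    result = dict(prior)
    for mapping in update:
        result.update(mapping)  # shallow last-write-wins per key
    return result

per_instance = [{"doc-1": "ok"}, {"doc-2": "ok"}, {"doc-1": "retried"}]

print(merge_all_sketch({"doc-0": "cached"}, per_instance))
# {'doc-0': 'cached', 'doc-1': 'retried', 'doc-2': 'ok'}
```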

Single-record-per-instance fan-outs (collect_field: str, parent field Annotated[list[X], append]) don't hit this. The engine still wraps each instance's value as one element of the outer list, but each element is already an X. The update is therefore a flat list[X], and append extends the field correctly. The two non-flat shapes emerge only when the per-instance value is itself a container: a list[X] per instance lands as list[list[X]] (use concat_flatten), and a dict[str, X] per instance lands as list[dict] (use merge_all).

If a parent field is populated by BOTH direct node writes AND fan-out collection, that's an architectural ambiguity worth fixing upstream: split into two fields, or pick one path.
