Skip to content

openarmature.llm

openarmature.llm: LLM provider abstraction.

Public surface: typed Message / Tool / Response, the Provider Protocol, the canonical error categories, and an OpenAI-compatible provider. Users write::

from openarmature.llm import (
    AssistantMessage,
    OpenAIProvider,
    Provider,
    SystemMessage,
    Tool,
    ToolCall,
    UserMessage,
)

All seven error categories and the canonical TRANSIENT_CATEGORIES frozenset are also re-exported here so callers writing custom retry classifiers don't have to reach into openarmature.llm.errors.

LlmProviderError

Bases: Exception

Base for all llm-provider errors. Each subclass carries a category class attribute matching one of the canonical category strings above.

Provider-originated errors SHOULD preserve the underlying provider exception as __cause__ so callers can reach the wire-level detail when needed.

ProviderAuthentication

Bases: LlmProviderError

Auth failed: invalid key, expired token, missing credentials.

ProviderInvalidModel

Bases: LlmProviderError

The bound model does not exist on this provider. Terminal: retry will not succeed without changing the bound model.

ProviderInvalidRequest

Bases: LlmProviderError

The request was malformed before sending (per-role message constraints violated, tool_call_id does not match an earlier assistant tool call, duplicate tool names, etc.). Raised by the implementation's pre-send validation, not by the provider.

ProviderInvalidResponse

Bases: LlmProviderError

Provider returned a malformed response that cannot be parsed into the expected :class:Response shape (missing required fields, invalid tool_calls structure, invalid JSON).

ProviderModelNotLoaded

Bases: LlmProviderError

The bound model is known to the provider but is not currently serving (e.g., a local vLLM/LM Studio/llama.cpp server has the model configured but not loaded). Distinct from provider_invalid_model because retry MAY succeed once loading completes.

ProviderRateLimit

ProviderRateLimit(
    *args: Any, retry_after: float | None = None
)

Bases: LlmProviderError

Provider returned a rate-limit response (HTTP 429 or equivalent).

When the provider supplies a Retry-After header (or its equivalent), the parsed seconds-to-wait surfaces on :attr:retry_after. None if the provider didn't include one.

ProviderUnavailable

Bases: LlmProviderError

Provider is unreachable: network failure, 5xx error, DNS, timeout.

ProviderUnsupportedContentBlock

ProviderUnsupportedContentBlock(
    *args: Any,
    block_type: str | None = None,
    reason: str | None = None
)

Bases: LlmProviderError

Raised when the bound model does not support a content block type used in the request.

Examples: a text-only model received an image block, or the model supports images but not the requested media_type or source variant.

Attributes:

Name Type Description
block_type str | None

The block type that was rejected (e.g., "image"), when the provider's response makes this identifiable.

reason str | None

The provider's human-readable description of the rejection, when available.

StructuredOutputInvalid

StructuredOutputInvalid(
    *args: Any,
    response_schema: dict[str, Any],
    raw_content: str,
    failure_description: str
)

Bases: LlmProviderError

Raised when a complete() call requested a response_schema and the provider's content could not be parsed as JSON or did not validate against the schema.

Attributes:

Name Type Description
response_schema dict[str, Any]

The JSON Schema requested.

raw_content str

The raw response content the model produced.

failure_description str

A description of the parse or validation failure.

AssistantMessage

Bases: _MessageBase

Assistant messages MAY carry tool_calls. If tool_calls is present and non-empty, content MAY be empty (the assistant is purely calling tools); otherwise content MUST be a non-empty string. tool_call_id MUST be absent.

ForceTool

Bases: BaseModel

Force the model to call exactly the named tool.

Use the record form of the tool_choice discriminated union when you need the model to call a specific tool by name. type is the discriminator ("tool"); the wire mapping renames it to "function" for the OpenAI body. The name MUST match a Tool.name in the supplied tools list; validate_tool_choice enforces this at pre-send time and raises ProviderInvalidRequest on violation.

ImageBlock

Bases: BaseModel

Image content block. Carries one source (URL or inline base64), a conditional media_type (required for inline sources; ignored for URL sources), and an optional detail hint.

The class-level default of detail=None preserves the omit-by-default wire behavior: providers apply their own conceptual default ("auto") when detail is absent from the wire payload. To force the wire to carry an explicit "auto", set detail="auto" on the block.

Attributes:

Name Type Description
type Literal['image']

The discriminator literal "image".

source ImageSource

One of ImageSourceURL or ImageSourceInline.

media_type str | None

IANA media type. Required when source is inline. Permitted but redundant when source is a URL (the URL payload carries the content-type); the OpenAI wire path currently does not surface it for URL sources, but provider implementations MAY consume it as a hint. Providers MUST accept image/png, image/jpeg, image/webp at minimum and MAY accept additional image/* types they document support for.

detail Literal['auto', 'low', 'high'] | None

Image-processing fidelity hint. One of "auto", "low", "high". None (the default) omits the field from the wire.

ImageSourceInline

Bases: BaseModel

Inline base64-encoded image source. The framework does not inspect, transcode, or re-encode the bytes; the parent ImageBlock MUST carry a media_type for inline sources.

Attributes:

Name Type Description
type Literal['inline']

The discriminator literal "inline".

base64_data str

The base64-encoded image bytes.

ImageSourceURL

Bases: BaseModel

URL-referenced image source. The URL is passed to the provider unchanged; the framework does not fetch, cache, or transform it.

Attributes:

Name Type Description
type Literal['url']

The discriminator literal "url".

url str

The image URL. MAY be http(s)://, data: (RFC 2397 inline data URI), or another scheme the provider documents support for.

SystemMessage

Bases: _MessageBase

System messages have non-empty content; no tool_calls; no tool_call_id.

TextBlock

Bases: BaseModel

Text content block. The content-array equivalent of a plain text-string user message; a user message with exactly one TextBlock(text=T) is normatively equivalent to one with content=T.

Attributes:

Name Type Description
type Literal['text']

The discriminator literal "text".

text str

A non-empty string.

Tool

Bases: BaseModel

A function the model may request the user execute.

parameters is a JSON Schema (object schema) describing the argument record. Kept as a plain dict[str, Any] rather than a typed schema class so the "JSON Schema, not language-native types" intent surfaces directly; implementations may offer ergonomic constructors that compile from native types (Pydantic model_json_schema()) but the surface is JSON Schema.

ToolCall

Bases: BaseModel

An assistant's request to invoke a named tool.

id is an opaque correlator within a single message list. Implementations MUST preserve provider-supplied ids verbatim; neither rewriting nor normalizing.

ToolMessage

Bases: _MessageBase

Tool messages carry the textual result of a tool call. tool_call_id MUST be present and match the id of an earlier assistant ToolCall in the same message list. The list-level matching is checked at the complete() boundary by :func:provider.validate_message_list, not at construction.

UserMessage

Bases: _MessageBase

User messages carry content as either a non-empty text string or a non-empty ordered sequence of content blocks (text and/or image). No tool_calls; no tool_call_id.

Provider

Bases: Protocol

The shape of any llm-provider implementation.

Implementations are bound to a single model identifier; switching models means constructing a new provider, not passing a different argument per call.

ready async

ready() -> None

Verify the bound model is reachable and serving.

complete async

complete(
    messages: Sequence[Message],
    tools: Sequence[Tool] | None = None,
    config: RuntimeConfig | None = None,
    response_schema: (
        dict[str, Any] | type[BaseModel] | None
    ) = None,
    tool_choice: ToolChoice | None = None,
    retry: RetryConfig | None = None,
) -> Response

Perform a single completion call.

Returns a :class:Response carrying the assistant message, finish reason, usage, and raw payload. When response_schema is supplied and the model returns structured content, Response.parsed carries the validated value.

Parameters:

Name Type Description Default
messages Sequence[Message]

The conversation to send. MUST NOT be mutated by the implementation.

required
tools Sequence[Tool] | None

Optional tool definitions the model may call.

None
config RuntimeConfig | None

Optional per-call sampling parameters.

None
response_schema dict[str, Any] | type[BaseModel] | None

Optional JSON Schema (dict) or Pydantic model class describing the expected output shape. When supplied, the implementation constrains the model's output to the schema and populates Response.parsed with the validated value.

None
tool_choice ToolChoice | None

Optional tool-choice constraint. One of "auto", "required", "none", or a :class:ForceTool record. When None (the default) the wire tool_choice field is omitted and the provider's own default applies. Pre-send validation routes through provider_invalid_request.

None
retry RetryConfig | None

Optional call-level retry configuration. When supplied, transient provider errors are retried in-call per the config; the request is built and validated once, and exactly one observability event fires for the terminal outcome. None (the default) performs a single attempt.

None

OpenAIProvider

OpenAIProvider(
    *,
    base_url: str,
    model: str,
    api_key: str | None = None,
    transport: AsyncBaseTransport | None = None,
    timeout: float = 60.0,
    force_prompt_augmentation_fallback: bool = False,
    genai_system: str = "openai",
    readiness_probe: Literal[
        "models", "chat_completions", "both"
    ] = "chat_completions",
    populate_caller_metadata: bool = True
)

OpenAI Chat Completions wire-compatible provider.

Construct with a base URL, model identifier, and optional API key + transport (an :class:httpx.AsyncBaseTransport). The transport parameter is the test seam; httpx.MockTransport drives the conformance fixtures by intercepting HTTP calls and returning canned responses, exercising the same wire-mapping code production traffic would.

base_url shape. Pass the host root only — e.g. "https://api.openai.com" or "http://localhost:8000". The provider appends /v1/chat/completions and /v1/models itself. A trailing /v1 on base_url raises ValueError: httpx joins paths by appending, so an unprefixed base_url suffix would produce a doubled /v1/v1/... wire path that silently 404/405s on most backends (some — like Bifrost — return 200 for GET /v1/v1/models while rejecting POST /v1/v1/chat/completions, leaving the readiness probe green and every completion broken). Trailing slashes are stripped; other non-empty paths (proxy prefixes like /api/openai-proxy) are left intact for intentional proxy setups.

uses_prompt_augmentation_fallback property

uses_prompt_augmentation_fallback: bool

Whether complete(response_schema=...) builds the wire body via prompt augmentation (True) or the native response_format path (False).

aclose async

aclose() -> None

Close the underlying HTTP client. Optional; async clients garbage-collect cleanly, but explicit close is RECOMMENDED in long-lived services to release the connection pool promptly.

ready async

ready() -> None

Verify the bound model is reachable. Dispatches on the readiness_probe mode chosen at construction:

  • "chat_completions" (default) issues a max_tokens=1 chat call against POST /v1/chat/completions.
  • "models" issues GET /v1/models and matches self.model against the returned data[].id entries.
  • "both" runs the catalog probe first (cheaper, surfaces model-not-in-catalog with the catalog diagnostic), then the chat probe.

complete async

complete(
    messages: Sequence[Message],
    tools: Sequence[Tool] | None = None,
    config: RuntimeConfig | None = None,
    response_schema: (
        dict[str, Any] | type[BaseModel] | None
    ) = None,
    tool_choice: ToolChoice | None = None,
    retry: RetryConfig | None = None,
) -> Response

Single completion call.

Pre-send validation runs first (per-message Pydantic + list-level invariants + response_schema shape check + tool_choice validation). HTTP errors map to canonical provider-error categories. The successful 200 body is parsed into a :class:Response; failure to parse raises provider_invalid_response; failure to validate the response content against response_schema raises structured_output_invalid.

When response_schema is supplied as a Pydantic BaseModel subclass, Response.parsed is a validated instance of that class; when supplied as a JSON Schema dict, Response.parsed is the deserialized dict.

tool_choice is validated against tools: "required" and the ForceTool record both demand non-empty tools, and ForceTool.name must appear in the supplied list. Violations raise provider_invalid_request BEFORE any HTTP request is sent.

When retry is supplied, the wire call is retried on transient provider errors per the config's classifier and backoff (defaulting to the canonical transient categories with exponential jittered backoff). The request is built and validated once; pre-send validation errors are never retried. Exactly one terminal observability event (LlmCompletionEvent or LlmFailedEvent) fires for the call's terminal outcome regardless of attempt count, and its latency_ms covers the whole call, retries and backoff included. A per-attempt LlmRetryAttemptEvent also fires for each attempt (driving the per-attempt LLM span surface), each carrying just that attempt's latency. The on_retry hook is not exception-isolated (mirroring RetryMiddleware); an exception raised by it propagates out of the call.

Response

Bases: BaseModel

The result of a Provider.complete() call.

Attributes:

Name Type Description
message AssistantMessage

The assistant message returned by the model. Always role: "assistant". May carry tool_calls.

finish_reason FinishReason

One of "stop", "length", "tool_calls", "content_filter", "error".

usage Usage

The token record (all None if the provider didn't report usage).

raw dict[str, Any]

The parsed provider response, populated on every successful return. Carries everything the provider returned; the normalized fields above are derived from it.

parsed ParsedValue

The parsed-and-validated structured value when the call supplied a response_schema and the model returned structured content. None otherwise. The runtime type depends on the schema form the caller passed: dict for a JSON-Schema dict input, a BaseModel instance for a Pydantic class input.

RuntimeConfig

Bases: BaseModel

Per-call sampling parameters and budget hints.

from_partial classmethod

from_partial(**kwargs: Any) -> RuntimeConfig

Construct a config, dropping kwargs whose value is None.

RuntimeConfig.from_partial(temperature=0.7, top_p=None).top_p is None True

Usage

Bases: BaseModel

Token-accounting record.

Each field is a non-negative integer or None. If the provider does not report token counts, prompt_tokens / completion_tokens / total_tokens MUST be None.

strict_mode_supported

strict_mode_supported(schema: dict[str, Any]) -> bool

Whether a JSON Schema satisfies the strict-mode constraints used by native-decoding LLM wire paths.

Returns True iff for every nested (sub)schema in the tree additionalProperties is explicitly false (an omitted key counts as non-strict, since JSON Schema's default is to permit extras) and every key in properties appears in required. False on any violation, on an unresolvable $ref, or on an unknown shape.

Parameters:

Name Type Description Default
schema dict[str, Any]

The root JSON Schema dict.

required

Returns:

Type Description
bool

True if the schema cleanly supports strict mode; False

bool

otherwise.

validate_message_list

validate_message_list(messages: Sequence[Message]) -> None

Validate list-level invariants.

Per-message constraints (system/user need non-empty content, assistant content-or-tool_calls, etc.) are enforced by Pydantic on the per-role Message classes at construction time. This function adds the list-level invariants Pydantic-on-Message can't see.

Raises :class:ProviderInvalidRequest on the first violation.

validate_response_schema

validate_response_schema(schema: object) -> None

Pre-send validation for a JSON Schema passed as the response_schema argument to complete().

Raises :class:ProviderInvalidRequest if the schema is not a dict, does not declare a top-level object type, or is not a valid JSON Schema document.

validate_tool_choice

validate_tool_choice(
    tool_choice: ToolChoice | None,
    tools: Sequence[Tool] | None,
) -> None

Validate tool_choice against tools.

Raises :class:ProviderInvalidRequest (the provider_invalid_request category) on:

  • tool_choice supplied as a string that is not one of "auto" / "required" / "none" (runtime defense against untyped callers; the Literal alias catches well-typed ones at type-check time).
  • tool_choice="required" supplied with empty / absent tools.
  • tool_choice=ForceTool(name=X) supplied with empty / absent tools.
  • tool_choice=ForceTool(name=X) supplied with X not in the supplied tools list.

No-op when tool_choice is None (the default — the wire field is omitted and the provider's own default applies). tool_choice="auto" and tool_choice="none" have no tools-related preconditions.

validate_tools

validate_tools(tools: Sequence[Tool] | None) -> None

Validate tool-list invariants. Tool names MUST be unique within a single complete() call.

classify_http_error

classify_http_error(resp: Response) -> LlmProviderError

Map a non-200 httpx.Response from an OpenAI-shape API to the right canonical error category.

Returns the exception (does not raise) so the caller can raise with consistent traceback context.

Reusable by third-party Provider implementations targeting any OpenAI-compatible endpoint (vLLM, LM Studio, llama.cpp server, etc.); the wire shape is stable across these and the helper saves implementers from reimplementing the mapping table.

parse_retry_after

parse_retry_after(value: str | None) -> float | None

Parse a Retry-After header value to a float seconds count.

HTTP allows seconds-int OR HTTP-date; this implementation handles the seconds-int form (the OpenAI/vendor norm) and ignores HTTP-date.

Reusable by third-party Provider implementations that need to surface Retry-After to ProviderRateLimit.retry_after.