openarmature.llm¶

openarmature.llm: LLM provider abstraction.

Public surface: typed Message / Tool / Response, the Provider Protocol, the canonical error categories, and an OpenAI-compatible provider. Users write:

from openarmature.llm import (
    AssistantMessage,
    OpenAIProvider,
    Provider,
    SystemMessage,
    Tool,
    ToolCall,
    UserMessage,
)

All error classes documented below and the canonical TRANSIENT_CATEGORIES frozenset are also re-exported here so callers writing custom retry classifiers don't have to reach into openarmature.llm.errors.

LlmProviderError ¶

Bases: Exception

Base for all llm-provider errors. Each subclass carries a category class attribute matching one of the canonical category strings (for example, provider_invalid_request).

Provider-originated errors SHOULD preserve the underlying provider exception as __cause__ so callers can reach the wire-level detail when needed.

ProviderAuthentication ¶

Bases: LlmProviderError

Auth failed: invalid key, expired token, missing credentials.

ProviderInvalidModel ¶

Bases: LlmProviderError

The bound model does not exist on this provider. Terminal: retry will not succeed without changing the bound model.

ProviderInvalidRequest ¶

Bases: LlmProviderError

The request was malformed before sending (per-role message constraints violated, tool_call_id does not match an earlier assistant tool call, duplicate tool names, etc.). Raised by the implementation's pre-send validation, not by the provider.

ProviderInvalidResponse ¶

Bases: LlmProviderError

Provider returned a malformed response that cannot be parsed into the expected Response shape (missing required fields, invalid tool_calls structure, invalid JSON).

ProviderModelNotLoaded ¶

Bases: LlmProviderError

The bound model is known to the provider but is not currently serving (e.g., a local vLLM/LM Studio/llama.cpp server has the model configured but not loaded). Distinct from provider_invalid_model because retry MAY succeed once loading completes.

ProviderRateLimit ¶

ProviderRateLimit(
    *args: Any, retry_after: float | None = None
)

Bases: LlmProviderError

Provider returned a rate-limit response (HTTP 429 or equivalent).

When the provider supplies a Retry-After header (or its equivalent), the parsed seconds-to-wait surfaces on retry_after. None if the provider didn't include one.

ProviderUnavailable ¶

Bases: LlmProviderError

Provider is unreachable: network failure, 5xx error, DNS, timeout.

ProviderUnsupportedContentBlock ¶

ProviderUnsupportedContentBlock(
    *args: Any,
    block_type: str | None = None,
    reason: str | None = None
)

Bases: LlmProviderError

Raised when the bound model does not support a content block type used in the request.

Examples: a text-only model received an image block, or the model supports images but not the requested media_type or source variant.

Attributes:

Name	Type	Description
`block_type`	`str \| None`	The block type that was rejected (e.g., `"image"`), when the provider's response makes this identifiable.
`reason`	`str \| None`	The provider's human-readable description of the rejection, when available.

StructuredOutputInvalid ¶

StructuredOutputInvalid(
    *args: Any,
    response_schema: dict[str, Any],
    raw_content: str,
    failure_description: str
)

Bases: LlmProviderError

Raised when a complete() call requested a response_schema and the provider's content could not be parsed as JSON or did not validate against the schema.

Attributes:

Name	Type	Description
`response_schema`	`dict[str, Any]`	The JSON Schema requested.
`raw_content`	`str`	The raw response content the model produced.
`failure_description`	`str`	A description of the parse or validation failure.
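
The two failure modes above (content that is not JSON, and JSON that does not satisfy the schema) can be pictured with a small standalone sketch. `OutputInvalid` and `parse_structured` are hypothetical stand-ins, not openarmature names, and the validation step is deliberately reduced to a top-level `required` check; a real implementation would presumably use a full JSON Schema validator.

```python
from __future__ import annotations

import json
from typing import Any


class OutputInvalid(Exception):
    """Local stand-in for StructuredOutputInvalid, with the same three attributes."""

    def __init__(
        self,
        message: str,
        *,
        response_schema: dict[str, Any],
        raw_content: str,
        failure_description: str,
    ) -> None:
        super().__init__(message)
        self.response_schema = response_schema
        self.raw_content = raw_content
        self.failure_description = failure_description


def parse_structured(raw_content: str, response_schema: dict[str, Any]) -> dict[str, Any]:
    """Parse model content as JSON, then run a toy schema check."""

    def fail(description: str) -> OutputInvalid:
        return OutputInvalid(
            description,
            response_schema=response_schema,
            raw_content=raw_content,
            failure_description=description,
        )

    try:
        value = json.loads(raw_content)
    except json.JSONDecodeError as exc:
        # Keep the parser error reachable as __cause__.
        raise fail(f"content is not valid JSON: {exc.msg}") from exc
    if not isinstance(value, dict):
        raise fail("content is not a JSON object")
    # Toy validation: only top-level `required` keys are checked here.
    missing = [key for key in response_schema.get("required", []) if key not in value]
    if missing:
        raise fail(f"missing required keys: {missing}")
    return value
```

A caller catching the error can log `raw_content` next to `failure_description` to see what the model actually produced.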

AssistantMessage ¶

Bases: _MessageBase

Assistant messages MAY carry tool_calls. If tool_calls is present and non-empty, content MAY be empty (the assistant is purely calling tools); otherwise content MUST be a non-empty string. tool_call_id MUST be absent.

ForceTool ¶

Bases: BaseModel

Force the model to call exactly the named tool.

Use the record form of the tool_choice discriminated union when you need the model to call a specific tool by name. type is the discriminator ("tool"); the wire mapping renames it to "function" for the OpenAI body. The name MUST match a Tool.name in the supplied tools list; validate_tool_choice enforces this at pre-send time and raises ProviderInvalidRequest on violation.
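
As a rough illustration of the rename described above, the sketch below maps a tool-choice value to the shape the public OpenAI Chat Completions API expects. `tool_choice_to_wire` is a hypothetical helper, and a plain dict stands in for the ForceTool model; how openarmature performs the mapping internally is not shown on this page.

```python
from __future__ import annotations

from typing import Any, Union


def tool_choice_to_wire(tool_choice: Union[str, dict[str, str]]) -> Any:
    """Map a string mode or a {"type": "tool", "name": ...} record to the OpenAI shape."""
    if isinstance(tool_choice, str):
        # "auto", "required", and "none" pass through unchanged.
        return tool_choice
    # The record's discriminator "tool" becomes "function" on the wire.
    return {"type": "function", "function": {"name": tool_choice["name"]}}
```

For example, `tool_choice_to_wire({"type": "tool", "name": "get_weather"})` yields `{"type": "function", "function": {"name": "get_weather"}}`, where `get_weather` is an illustrative tool name.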

ImageBlock ¶

Bases: BaseModel

Image content block. Carries one source (URL or inline base64), a conditional media_type (required for inline sources; ignored for URL sources), and an optional detail hint.

The class-level default of detail=None preserves the omit-by-default wire behavior: providers apply their own conceptual default ("auto") when detail is absent from the wire payload. To force the wire to carry an explicit "auto", set detail="auto" on the block.

Attributes:

Name	Type	Description
`type`	`Literal['image']`	The discriminator literal `"image"`.
`source`	`ImageSource`	One of `ImageSourceURL` or `ImageSourceInline`.
`media_type`	`str \| None`	IANA media type. Required when source is inline. Permitted but redundant when source is a URL (the URL payload carries the content-type); the OpenAI wire path currently does not surface it for URL sources, but provider implementations MAY consume it as a hint. Providers MUST accept `image/png`, `image/jpeg`, `image/webp` at minimum and MAY accept additional `image/*` types they document support for.
`detail`	`Literal['auto', 'low', 'high'] \| None`	Image-processing fidelity hint. One of `"auto"`, `"low"`, `"high"`. `None` (the default) omits the field from the wire.
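
The omit-by-default behavior of detail can be sketched as follows. Plain dicts stand in for the Pydantic models, and `image_block_to_wire` is a hypothetical helper rather than an openarmature function. The `image_url` part follows the public OpenAI Chat Completions format; encoding inline sources as a data URI is an assumption about how such a mapping could work.

```python
from __future__ import annotations

from typing import Any


def image_block_to_wire(
    source: dict[str, str],
    media_type: str | None = None,
    detail: str | None = None,
) -> dict[str, Any]:
    """Hypothetical mapping from an image block to an OpenAI-style content part."""
    if source["type"] == "url":
        # URL sources are passed through unchanged; media_type is not surfaced.
        url = source["url"]
    elif source["type"] == "inline":
        if media_type is None:
            raise ValueError("inline sources require media_type")
        # Assumption: inline bytes travel as an RFC 2397 data URI.
        url = f"data:{media_type};base64,{source['base64_data']}"
    else:
        raise ValueError(f"unknown source type: {source['type']!r}")

    image_url: dict[str, Any] = {"url": url}
    if detail is not None:
        # Only an explicit detail reaches the wire; None omits the key.
        image_url["detail"] = detail
    return {"type": "image_url", "image_url": image_url}
```

With `detail=None` the resulting part has no `detail` key at all, so the provider's own default applies; passing `detail="auto"` puts the literal string on the wire.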

ImageSourceInline ¶

Bases: BaseModel

Inline base64-encoded image source. The framework does not inspect, transcode, or re-encode the bytes; the parent ImageBlock MUST carry a media_type for inline sources.

Attributes:

Name	Type	Description
`type`	`Literal['inline']`	The discriminator literal `"inline"`.
`base64_data`	`str`	The base64-encoded image bytes.

ImageSourceURL ¶

Bases: BaseModel

URL-referenced image source. The URL is passed to the provider unchanged; the framework does not fetch, cache, or transform it.

Attributes:

Name	Type	Description
`type`	`Literal['url']`	The discriminator literal `"url"`.
`url`	`str`	The image URL. MAY be `http(s)://`, `data:` (RFC 2397 inline data URI), or another scheme the provider documents support for.

SystemMessage ¶

Bases: _MessageBase

System messages have non-empty content; no tool_calls; no tool_call_id.

TextBlock ¶

Bases: BaseModel

Text content block. The content-array equivalent of a plain text-string user message; a user message with exactly one TextBlock(text=T) is normatively equivalent to one with content=T.

Attributes:

Name	Type	Description
`type`	`Literal['text']`	The discriminator literal `"text"`.
`text`	`str`	A non-empty string.

Tool ¶

Bases: BaseModel

A function the model may request the user execute.

parameters is a JSON Schema (object schema) describing the argument record. Kept as a plain dict[str, Any] rather than a typed schema class so the "JSON Schema, not language-native types" intent surfaces directly; implementations may offer ergonomic constructors that compile from native types (Pydantic model_json_schema()) but the surface is JSON Schema.

ToolCall ¶

Bases: BaseModel

An assistant's request to invoke a named tool.

id is an opaque correlator within a single message list. Implementations MUST preserve provider-supplied ids verbatim, without rewriting or normalizing them.

ToolMessage ¶

Bases: _MessageBase

Tool messages carry the textual result of a tool call. tool_call_id MUST be present and match the id of an earlier assistant ToolCall in the same message list. The list-level matching is checked at the complete() boundary by validate_message_list, not at construction.

UserMessage ¶

Bases: _MessageBase

User messages carry content as either a non-empty text string or a non-empty ordered sequence of content blocks (text and/or image). No tool_calls; no tool_call_id.

Provider ¶

Bases: Protocol

The shape of any llm-provider implementation.

Implementations are bound to a single model identifier; switching models means constructing a new provider, not passing a different argument per call.

ready `async` ¶

ready() -> None

Verify the bound model is reachable and serving.

complete `async` ¶

complete(
    messages: Sequence[Message],
    tools: Sequence[Tool] | None = None,
    config: RuntimeConfig | None = None,
    response_schema: (
        dict[str, Any] | type[BaseModel] | None
    ) = None,
    tool_choice: ToolChoice | None = None,
    retry: RetryConfig | None = None,
) -> Response

Perform a single completion call.

Returns a Response carrying the assistant message, finish reason, usage, and raw payload. When response_schema is supplied and the model returns structured content, Response.parsed carries the validated value.

Parameters:

Name	Type	Description	Default
`messages`	`Sequence[Message]`	The conversation to send. MUST NOT be mutated by the implementation.	required
`tools`	`Sequence[Tool] \| None`	Optional tool definitions the model may call.	`None`
`config`	`RuntimeConfig \| None`	Optional per-call sampling parameters.	`None`
`response_schema`	`dict[str, Any] \| type[BaseModel] \| None`	Optional JSON Schema (dict) or Pydantic model class describing the expected output shape. When supplied, the implementation constrains the model's output to the schema and populates `Response.parsed` with the validated value.	`None`
`tool_choice`	`ToolChoice \| None`	Optional tool-choice constraint. One of `"auto"`, `"required"`, `"none"`, or a `ForceTool` record. When `None` (the default) the wire `tool_choice` field is omitted and the provider's own default applies. Pre-send validation routes through `provider_invalid_request`.	`None`
`retry`	`RetryConfig \| None`	Optional call-level retry configuration. When supplied, transient provider errors are retried in-call per the config; the request is built and validated once, and exactly one observability event fires for the terminal outcome. `None` (the default) performs a single attempt.	`None`

OpenAIProvider ¶

OpenAIProvider(
    *,
    base_url: str,
    model: str,
    api_key: str | None = None,
    transport: AsyncBaseTransport | None = None,
    timeout: float = 60.0,
    force_prompt_augmentation_fallback: bool = False,
    genai_system: str = "openai",
    readiness_probe: Literal[
        "models", "chat_completions", "both"
    ] = "chat_completions",
    populate_caller_metadata: bool = True
)

OpenAI Chat Completions wire-compatible provider.

Construct with a base URL, model identifier, and optional API key + transport (an httpx.AsyncBaseTransport). The transport parameter is the test seam; httpx.MockTransport drives the conformance fixtures by intercepting HTTP calls and returning canned responses, exercising the same wire-mapping code production traffic would.

base_url shape. Pass the host root only, e.g. "https://api.openai.com" or "http://localhost:8000". The provider appends /v1/chat/completions and /v1/models itself. A trailing /v1 on base_url raises ValueError: httpx joins paths by appending, so a base_url that already ends in /v1 would produce a doubled /v1/v1/... wire path that most backends answer with a 404 or 405 (some, like Bifrost, return 200 for GET /v1/v1/models while rejecting POST /v1/v1/chat/completions, leaving the readiness probe green and every completion broken). Trailing slashes are stripped; other non-empty paths (proxy prefixes like /api/openai-proxy) are left intact for intentional proxy setups.
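
The base_url rules can be restated as a short standalone check. This is a sketch of the documented behavior, not the provider's actual code; `normalize_base_url` is a hypothetical name.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def normalize_base_url(base_url: str) -> str:
    """Strip trailing slashes and reject a trailing /v1 path segment."""
    stripped = base_url.rstrip("/")
    path = urlsplit(stripped).path
    if path.endswith("/v1"):
        raise ValueError(
            f"base_url must not end in /v1 (got {base_url!r}); "
            "the provider appends /v1/... itself"
        )
    # Proxy prefixes such as /api/openai-proxy are left intact.
    return stripped
```

Under these rules, `normalize_base_url("http://localhost:8000/") + "/v1/chat/completions"` gives a single `/v1` in the final path, while `"http://localhost:8000/v1"` is rejected up front instead of failing later on the wire.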

uses_prompt_augmentation_fallback `property` ¶

uses_prompt_augmentation_fallback: bool

Whether complete(response_schema=...) builds the wire body via prompt augmentation (True) or the native response_format path (False).

aclose `async` ¶

aclose() -> None

Close the underlying HTTP client. Optional; async clients garbage-collect cleanly, but explicit close is RECOMMENDED in long-lived services to release the connection pool promptly.

ready `async` ¶

ready() -> None

Verify the bound model is reachable. Dispatches on the readiness_probe mode chosen at construction:

"chat_completions" (default) issues a max_tokens=1 chat call against POST /v1/chat/completions.
"models" issues GET /v1/models and matches self.model against the returned data[].id entries.
"both" runs the catalog probe first (cheaper, surfaces model-not-in-catalog with the catalog diagnostic), then the chat probe.

complete `async` ¶

complete(
    messages: Sequence[Message],
    tools: Sequence[Tool] | None = None,
    config: RuntimeConfig | None = None,
    response_schema: (
        dict[str, Any] | type[BaseModel] | None
    ) = None,
    tool_choice: ToolChoice | None = None,
    retry: RetryConfig | None = None,
) -> Response

Single completion call.

Pre-send validation runs first (per-message Pydantic + list-level invariants + response_schema shape check + tool_choice validation). HTTP errors map to canonical provider-error categories. The successful 200 body is parsed into a Response; failure to parse raises provider_invalid_response; failure to validate the response content against response_schema raises structured_output_invalid.

When response_schema is supplied as a Pydantic BaseModel subclass, Response.parsed is a validated instance of that class; when supplied as a JSON Schema dict, Response.parsed is the deserialized dict.

tool_choice is validated against tools: "required" and the ForceTool record both demand non-empty tools, and ForceTool.name must appear in the supplied list. Violations raise provider_invalid_request BEFORE any HTTP request is sent.

When retry is supplied, the wire call is retried on transient provider errors per the config's classifier and backoff (defaulting to the canonical transient categories with exponential jittered backoff). The request is built and validated once; pre-send validation errors are never retried. Exactly one terminal observability event (LlmCompletionEvent or LlmFailedEvent) fires for the call's terminal outcome regardless of attempt count, and its latency_ms covers the whole call, retries and backoff included. A per-attempt LlmRetryAttemptEvent also fires for each attempt (driving the per-attempt LLM span surface), each carrying just that attempt's latency. The on_retry hook is not exception-isolated (mirroring RetryMiddleware); an exception raised by it propagates out of the call.
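
To make "exponential jittered backoff" concrete, here is a minimal standalone sketch of an in-call retry loop. It is not the library's implementation: `TransientError`, `call_with_retry`, and the parameter names are hypothetical and do not correspond to RetryConfig fields, and "full jitter" is one common variant chosen here for illustration.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Local stand-in for an error the classifier treats as transient."""


async def call_with_retry(
    send: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry only the wire call; the request is assumed built and validated already."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await send()
        except TransientError:
            if attempt == max_attempts:
                raise  # terminal outcome: the last error propagates
            # Full jitter: uniform in [0, min(max_delay, base_delay * 2**(attempt - 1))].
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            await sleep(random.uniform(0.0, ceiling))
    raise RuntimeError("retry loop exited without a result")  # not reachable
```

Non-transient errors are not caught, so they leave the loop on the first attempt, which matches the rule that validation errors are never retried.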

Response ¶

Bases: BaseModel

The result of a Provider.complete() call.

Attributes:

Name	Type	Description
`message`	`AssistantMessage`	The assistant message returned by the model. Always `role: "assistant"`. May carry `tool_calls`.
`finish_reason`	`FinishReason`	One of `"stop"`, `"length"`, `"tool_calls"`, `"content_filter"`, `"error"`.
`usage`	`Usage`	The token record (all `None` if the provider didn't report usage).
`raw`	`dict[str, Any]`	The parsed provider response, populated on every successful return. Carries everything the provider returned; the normalized fields above are derived from it.
`parsed`	`ParsedValue`	The parsed-and-validated structured value when the call supplied a `response_schema` and the model returned structured content. `None` otherwise. The runtime type depends on the schema form the caller passed: `dict` for a JSON-Schema dict input, a `BaseModel` instance for a Pydantic class input.

RuntimeConfig ¶

Bases: BaseModel

Per-call sampling parameters and budget hints.

from_partial `classmethod` ¶

from_partial(**kwargs: Any) -> RuntimeConfig

Construct a config, dropping kwargs whose value is None.

>>> RuntimeConfig.from_partial(temperature=0.7, top_p=None).top_p is None
True

Usage ¶

Bases: BaseModel

Token-accounting record.

Each field is a non-negative integer or None. If the provider does not report token counts, prompt_tokens / completion_tokens / total_tokens MUST be None.

strict_mode_supported ¶

strict_mode_supported(schema: dict[str, Any]) -> bool

Whether a JSON Schema satisfies the strict-mode constraints used by native-decoding LLM wire paths.

Returns True iff for every nested (sub)schema in the tree additionalProperties is explicitly false (an omitted key counts as non-strict, since JSON Schema's default is to permit extras) and every key in properties appears in required. False on any violation, on an unresolvable $ref, or on an unknown shape.

Parameters:

Name	Type	Description	Default
`schema`	`dict[str, Any]`	The root JSON Schema dict.	required

Returns:

Type	Description
`bool`	`True` if the schema cleanly supports strict mode; `False` otherwise.
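
The rule above can be approximated by a short recursive walk. The sketch below is a simplified restatement, not the library's code: `looks_strict` is a hypothetical name, the two constraints are applied to object schemas only, and any `$ref` is treated as a failure because this sketch does not resolve references (the real function is described as failing only on unresolvable ones).

```python
from __future__ import annotations

from typing import Any


def looks_strict(schema: Any) -> bool:
    """Simplified strict-mode check over a JSON Schema tree."""
    if not isinstance(schema, dict):
        return False  # unknown shape
    if "$ref" in schema:
        return False  # simplification: references are not resolved here
    if schema.get("type") == "object" or "properties" in schema:
        if schema.get("additionalProperties") is not False:
            return False  # omitted or permissive additionalProperties
        properties = schema.get("properties", {})
        if set(properties) - set(schema.get("required", [])):
            return False  # at least one property is optional
        if not all(looks_strict(sub) for sub in properties.values()):
            return False
    if "items" in schema and not looks_strict(schema["items"]):
        return False
    for keyword in ("anyOf", "oneOf", "allOf"):
        if not all(looks_strict(sub) for sub in schema.get(keyword, [])):
            return False
    return True
```

A schema that omits `additionalProperties` on a nested object fails the check even when the root is strict, which is the case most likely to surprise callers.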

validate_message_list ¶

validate_message_list(messages: Sequence[Message]) -> None

Validate list-level invariants.

Per-message constraints (system/user need non-empty content, assistant content-or-tool_calls, etc.) are enforced by Pydantic on the per-role Message classes at construction time. This function adds the list-level invariants Pydantic-on-Message can't see.

Raises ProviderInvalidRequest on the first violation.
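
One list-level invariant named on this page is that a tool message's tool_call_id must match an earlier assistant tool call. A minimal sketch of that single check follows, with plain dicts standing in for the message classes; `InvalidRequest` and `check_tool_call_ids` are hypothetical names, and the real function may enforce further invariants.

```python
from __future__ import annotations

from typing import Any, Sequence


class InvalidRequest(Exception):
    """Local stand-in for ProviderInvalidRequest."""


def check_tool_call_ids(messages: Sequence[dict[str, Any]]) -> None:
    """Raise on the first tool message whose id has no earlier assistant tool call."""
    seen_ids: set[str] = set()
    for index, message in enumerate(messages):
        if message["role"] == "assistant":
            # Record every id the assistant has issued so far.
            for call in message.get("tool_calls") or []:
                seen_ids.add(call["id"])
        elif message["role"] == "tool":
            if message.get("tool_call_id") not in seen_ids:
                raise InvalidRequest(
                    f"message {index}: tool_call_id "
                    f"{message.get('tool_call_id')!r} matches no earlier tool call"
                )
```

Because ids are collected in order, a tool message that appears before the assistant message issuing its id is rejected, which is why this check cannot run at construction time of a single message.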

validate_response_schema ¶

validate_response_schema(schema: object) -> None

Pre-send validation for a JSON Schema passed as the response_schema argument to complete().

Raises ProviderInvalidRequest if the schema is not a dict, does not declare a top-level object type, or is not a valid JSON Schema document.

validate_tool_choice ¶

validate_tool_choice(
    tool_choice: ToolChoice | None,
    tools: Sequence[Tool] | None,
) -> None

Validate tool_choice against tools.

Raises ProviderInvalidRequest (the provider_invalid_request category) on:

tool_choice supplied as a string that is not one of "auto" / "required" / "none" (runtime defense against untyped callers; the Literal alias catches well-typed ones at type-check time).
tool_choice="required" supplied with empty / absent tools.
tool_choice=ForceTool(name=X) supplied with empty / absent tools.
tool_choice=ForceTool(name=X) supplied with X not in the supplied tools list.

No-op when tool_choice is None (the default — the wire field is omitted and the provider's own default applies). tool_choice="auto" and tool_choice="none" have no tools-related preconditions.
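
The four failure cases and the no-op case can be written out as a small standalone function. This is a sketch under stated assumptions: `InvalidRequest`, `ForcedTool`, and `check_tool_choice` are hypothetical stand-ins, and tools are represented by their names only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence, Union


class InvalidRequest(Exception):
    """Local stand-in for ProviderInvalidRequest."""


@dataclass(frozen=True)
class ForcedTool:
    """Local stand-in for the ForceTool record."""

    name: str


def check_tool_choice(
    tool_choice: Union[str, ForcedTool, None],
    tool_names: Optional[Sequence[str]],
) -> None:
    """Apply the documented tool_choice preconditions to a list of tool names."""
    if tool_choice is None:
        return  # wire field omitted; the provider's own default applies
    names = list(tool_names or [])
    if isinstance(tool_choice, str):
        if tool_choice not in ("auto", "required", "none"):
            raise InvalidRequest(f"unknown tool_choice {tool_choice!r}")
        if tool_choice == "required" and not names:
            raise InvalidRequest('tool_choice="required" needs at least one tool')
        return  # "auto" and "none" have no tools-related preconditions
    if not names:
        raise InvalidRequest("a forced tool needs a non-empty tools list")
    if tool_choice.name not in names:
        raise InvalidRequest(f"forced tool {tool_choice.name!r} is not in tools")
```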

validate_tools ¶

validate_tools(tools: Sequence[Tool] | None) -> None

Validate tool-list invariants. Tool names MUST be unique within a single complete() call.
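
The uniqueness rule amounts to a duplicate scan over tool names. A minimal sketch, where `duplicate_names` is a hypothetical helper that reports the offending names rather than raising:

```python
from collections import Counter
from typing import Iterable, List


def duplicate_names(names: Iterable[str]) -> List[str]:
    """Return tool names that appear more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

A validator built on this would raise when the returned list is non-empty and could include the names in the error message.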

classify_http_error ¶

classify_http_error(resp: Response) -> LlmProviderError

Map a non-200 httpx.Response from an OpenAI-shape API to the right canonical error category.

Returns the exception (does not raise) so the caller can raise with consistent traceback context.

Reusable by third-party Provider implementations targeting any OpenAI-compatible endpoint (vLLM, LM Studio, llama.cpp server, etc.); the wire shape is stable across these and the helper saves implementers from reimplementing the mapping table.

parse_retry_after ¶

parse_retry_after(value: str | None) -> float | None

Parse a Retry-After header value to a float seconds count.

HTTP allows seconds-int OR HTTP-date; this implementation handles the seconds-int form (the OpenAI/vendor norm) and ignores HTTP-date.

Reusable by third-party Provider implementations that need to surface Retry-After to ProviderRateLimit.retry_after.
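
A rough standalone equivalent of the seconds-only parsing is shown below. `parse_seconds` is a hypothetical name, and two details are assumptions not stated above: fractional values are accepted, and negative or non-finite values are treated as absent.

```python
from __future__ import annotations

import math


def parse_seconds(value: str | None) -> float | None:
    """Parse a Retry-After value given as a number of seconds."""
    if value is None:
        return None
    try:
        seconds = float(value.strip())
    except ValueError:
        return None  # HTTP-date and other non-numeric forms are ignored
    # Assumption: negative or non-finite values count as "no hint".
    if not math.isfinite(seconds) or seconds < 0:
        return None
    return seconds
```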
