openarmature.llm¶
openarmature.llm: LLM provider abstraction.
Public surface: typed Message / Tool / Response, the
Provider Protocol, the canonical error categories, and an
OpenAI-compatible provider. Users write::
from openarmature.llm import (
AssistantMessage,
OpenAIProvider,
Provider,
SystemMessage,
Tool,
ToolCall,
UserMessage,
)
All seven error categories and the canonical TRANSIENT_CATEGORIES
frozenset are also re-exported here so callers writing custom retry
classifiers don't have to reach into openarmature.llm.errors.
LlmProviderError ¶
Bases: Exception
Base for all llm-provider errors. Each subclass carries a
category class attribute matching one of the canonical
category strings above.
Provider-originated errors SHOULD preserve the underlying provider
exception as __cause__ so callers can reach the wire-level
detail when needed.
ProviderAuthentication ¶
ProviderInvalidModel ¶
Bases: LlmProviderError
The bound model does not exist on this provider. Terminal: retry will not succeed without changing the bound model.
ProviderInvalidRequest ¶
Bases: LlmProviderError
The request was malformed before sending (per-role message
constraints violated, tool_call_id does not match an earlier
assistant tool call, duplicate tool names, etc.). Raised by the
implementation's pre-send validation, not by the provider.
ProviderInvalidResponse ¶
Bases: LlmProviderError
Provider returned a malformed response that cannot be parsed
into the expected :class:Response shape (missing required
fields, invalid tool_calls structure, invalid JSON).
ProviderModelNotLoaded ¶
Bases: LlmProviderError
The bound model is known to the provider but is not currently
serving (e.g., a local vLLM/LM Studio/llama.cpp server has the
model configured but not loaded). Distinct from
provider_invalid_model because retry MAY succeed once loading
completes.
ProviderRateLimit ¶
Bases: LlmProviderError
Provider returned a rate-limit response (HTTP 429 or equivalent).
When the provider supplies a Retry-After header (or its
equivalent), the parsed seconds-to-wait surfaces on
:attr:retry_after. None if the provider didn't include one.
ProviderUnavailable ¶
ProviderUnsupportedContentBlock ¶
ProviderUnsupportedContentBlock(
*args: Any,
block_type: str | None = None,
reason: str | None = None
)
Bases: LlmProviderError
Raised when the bound model does not support a content block type used in the request.
Examples: a text-only model received an image block, or the model
supports images but not the requested media_type or source
variant.
Attributes:
| Name | Type | Description |
|---|---|---|
block_type |
str | None
|
The block type that was rejected (e.g., |
reason |
str | None
|
The provider's human-readable description of the rejection, when available. |
StructuredOutputInvalid ¶
StructuredOutputInvalid(
*args: Any,
response_schema: dict[str, Any],
raw_content: str,
failure_description: str
)
Bases: LlmProviderError
Raised when a complete() call requested a response_schema
and the provider's content could not be parsed as JSON or did not
validate against the schema.
Attributes:
| Name | Type | Description |
|---|---|---|
response_schema |
dict[str, Any]
|
The JSON Schema requested. |
raw_content |
str
|
The raw response content the model produced. |
failure_description |
str
|
A description of the parse or validation failure. |
AssistantMessage ¶
Bases: _MessageBase
Assistant messages MAY carry tool_calls. If tool_calls
is present and non-empty, content MAY be empty (the assistant
is purely calling tools); otherwise content MUST be a
non-empty string. tool_call_id MUST be absent.
ForceTool ¶
Bases: BaseModel
Force the model to call exactly the named tool.
Use the record form of the tool_choice discriminated union when
you need the model to call a specific tool by name. type is the
discriminator ("tool"); the wire mapping renames it
to "function" for the OpenAI body. The
name MUST match a Tool.name in the supplied tools
list; validate_tool_choice enforces this at pre-send time and
raises ProviderInvalidRequest on violation.
ImageBlock ¶
Bases: BaseModel
Image content block. Carries one source (URL or inline base64),
a conditional media_type (required for inline sources; ignored
for URL sources), and an optional detail hint.
The class-level default of detail=None preserves the
omit-by-default wire behavior: providers apply their own
conceptual default ("auto") when detail is absent from the
wire payload. To force the wire to carry an explicit "auto",
set detail="auto" on the block.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
Literal['image']
|
The discriminator literal |
source |
ImageSource
|
One of |
media_type |
str | None
|
IANA media type. Required when source is inline.
Permitted but redundant when source is a URL (the URL
payload carries the content-type); the OpenAI wire path
currently does not surface it for URL sources, but
provider implementations MAY consume it as a hint.
Providers MUST accept |
detail |
Literal['auto', 'low', 'high'] | None
|
Image-processing fidelity hint. One of |
ImageSourceInline ¶
Bases: BaseModel
Inline base64-encoded image source. The framework does not
inspect, transcode, or re-encode the bytes; the parent ImageBlock
MUST carry a media_type for inline sources.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
Literal['inline']
|
The discriminator literal |
base64_data |
str
|
The base64-encoded image bytes. |
ImageSourceURL ¶
Bases: BaseModel
URL-referenced image source. The URL is passed to the provider unchanged; the framework does not fetch, cache, or transform it.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
Literal['url']
|
The discriminator literal |
url |
str
|
The image URL. MAY be |
SystemMessage ¶
Bases: _MessageBase
System messages have non-empty content; no tool_calls; no
tool_call_id.
TextBlock ¶
Bases: BaseModel
Text content block. The content-array equivalent of a plain
text-string user message; a user message with exactly one
TextBlock(text=T) is normatively equivalent to one with
content=T.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
Literal['text']
|
The discriminator literal |
text |
str
|
A non-empty string. |
Tool ¶
Bases: BaseModel
A function the model may request the user execute.
parameters is a JSON Schema (object schema) describing the
argument record. Kept as a plain dict[str, Any] rather than a
typed schema class so the "JSON Schema, not language-native
types" intent surfaces directly; implementations may offer
ergonomic constructors that compile from native types (Pydantic
model_json_schema()) but the surface is JSON Schema.
ToolCall ¶
Bases: BaseModel
An assistant's request to invoke a named tool.
id is an opaque correlator within a single message list.
Implementations MUST preserve provider-supplied ids verbatim;
neither rewriting nor normalizing.
ToolMessage ¶
Bases: _MessageBase
Tool messages carry the textual result of a tool call.
tool_call_id MUST be present and match the id of an
earlier assistant ToolCall in the same message list. The
list-level matching is checked at the complete() boundary by
:func:provider.validate_message_list, not at construction.
UserMessage ¶
Bases: _MessageBase
User messages carry content as either a non-empty text string or a non-empty ordered sequence of content blocks (text and/or image). No tool_calls; no tool_call_id.
Provider ¶
Bases: Protocol
The shape of any llm-provider implementation.
Implementations are bound to a single model identifier; switching models means constructing a new provider, not passing a different argument per call.
complete
async
¶
complete(
messages: Sequence[Message],
tools: Sequence[Tool] | None = None,
config: RuntimeConfig | None = None,
response_schema: (
dict[str, Any] | type[BaseModel] | None
) = None,
tool_choice: ToolChoice | None = None,
retry: RetryConfig | None = None,
) -> Response
Perform a single completion call.
Returns a :class:Response carrying the assistant message,
finish reason, usage, and raw payload. When response_schema
is supplied and the model returns structured content,
Response.parsed carries the validated value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
Sequence[Message]
|
The conversation to send. MUST NOT be mutated by the implementation. |
required |
tools
|
Sequence[Tool] | None
|
Optional tool definitions the model may call. |
None
|
config
|
RuntimeConfig | None
|
Optional per-call sampling parameters. |
None
|
response_schema
|
dict[str, Any] | type[BaseModel] | None
|
Optional JSON Schema (dict) or Pydantic
model class describing the expected output shape. When
supplied, the implementation constrains the model's
output to the schema and populates |
None
|
tool_choice
|
ToolChoice | None
|
Optional tool-choice constraint. One of
|
None
|
retry
|
RetryConfig | None
|
Optional call-level retry configuration. When
supplied, transient provider errors are retried in-call
per the config; the request is built and validated once,
and exactly one observability event fires for the
terminal outcome. |
None
|
OpenAIProvider ¶
OpenAIProvider(
*,
base_url: str,
model: str,
api_key: str | None = None,
transport: AsyncBaseTransport | None = None,
timeout: float = 60.0,
force_prompt_augmentation_fallback: bool = False,
genai_system: str = "openai",
readiness_probe: Literal[
"models", "chat_completions", "both"
] = "chat_completions",
populate_caller_metadata: bool = True
)
OpenAI Chat Completions wire-compatible provider.
Construct with a base URL, model identifier, and optional API key
+ transport (an :class:httpx.AsyncBaseTransport). The
transport parameter is the test seam; httpx.MockTransport
drives the conformance fixtures by intercepting HTTP calls and
returning canned responses, exercising the same wire-mapping
code production traffic would.
base_url shape. Pass the host root only — e.g.
"https://api.openai.com" or "http://localhost:8000". The
provider appends /v1/chat/completions and /v1/models
itself. A trailing /v1 on base_url raises ValueError:
httpx joins paths by appending, so an unprefixed base_url
suffix would produce a doubled /v1/v1/... wire path that
silently 404/405s on most backends (some — like Bifrost — return
200 for GET /v1/v1/models while rejecting POST
/v1/v1/chat/completions, leaving the readiness probe green and
every completion broken). Trailing slashes are stripped; other
non-empty paths (proxy prefixes like /api/openai-proxy) are
left intact for intentional proxy setups.
uses_prompt_augmentation_fallback
property
¶
Whether complete(response_schema=...) builds the wire
body via prompt augmentation (True) or the native
response_format path (False).
aclose
async
¶
Close the underlying HTTP client. Optional; async clients garbage-collect cleanly, but explicit close is RECOMMENDED in long-lived services to release the connection pool promptly.
ready
async
¶
Verify the bound model is reachable. Dispatches on the
readiness_probe mode chosen at construction:
"chat_completions"(default) issues amax_tokens=1chat call againstPOST /v1/chat/completions."models"issuesGET /v1/modelsand matchesself.modelagainst the returneddata[].identries."both"runs the catalog probe first (cheaper, surfaces model-not-in-catalog with the catalog diagnostic), then the chat probe.
complete
async
¶
complete(
messages: Sequence[Message],
tools: Sequence[Tool] | None = None,
config: RuntimeConfig | None = None,
response_schema: (
dict[str, Any] | type[BaseModel] | None
) = None,
tool_choice: ToolChoice | None = None,
retry: RetryConfig | None = None,
) -> Response
Single completion call.
Pre-send validation runs first (per-message Pydantic +
list-level invariants + response_schema shape check +
tool_choice validation). HTTP errors map to canonical
provider-error categories. The successful 200 body is parsed
into a :class:Response; failure to parse raises
provider_invalid_response; failure to validate the response
content against response_schema raises
structured_output_invalid.
When response_schema is supplied as a Pydantic BaseModel
subclass, Response.parsed is a validated instance of that
class; when supplied as a JSON Schema dict,
Response.parsed is the deserialized dict.
tool_choice is validated against tools:
"required" and the ForceTool record both demand
non-empty tools, and ForceTool.name must appear in the
supplied list. Violations raise provider_invalid_request
BEFORE any HTTP request is sent.
When retry is supplied, the wire call is retried on
transient provider errors per the config's classifier and
backoff (defaulting to the canonical transient categories with
exponential jittered backoff). The request is built and
validated once; pre-send validation errors are never retried.
Exactly one terminal observability event (LlmCompletionEvent or
LlmFailedEvent) fires for the call's terminal outcome regardless
of attempt count, and its latency_ms covers the whole call,
retries and backoff included. A per-attempt LlmRetryAttemptEvent
also fires for each attempt (driving the per-attempt LLM span
surface), each carrying just that attempt's latency. The
on_retry hook is not exception-isolated (mirroring
RetryMiddleware); an exception raised by it propagates out
of the call.
Response ¶
Bases: BaseModel
The result of a Provider.complete() call.
Attributes:
| Name | Type | Description |
|---|---|---|
message |
AssistantMessage
|
The assistant message returned by the model.
Always |
finish_reason |
FinishReason
|
One of |
usage |
Usage
|
The token record (all |
raw |
dict[str, Any]
|
The parsed provider response, populated on every successful return. Carries everything the provider returned; the normalized fields above are derived from it. |
parsed |
ParsedValue
|
The parsed-and-validated structured value when the call
supplied a |
RuntimeConfig ¶
Bases: BaseModel
Per-call sampling parameters and budget hints.
from_partial
classmethod
¶
from_partial(**kwargs: Any) -> RuntimeConfig
Construct a config, dropping kwargs whose value is None.
RuntimeConfig.from_partial(temperature=0.7, top_p=None).top_p is None True
Usage ¶
Bases: BaseModel
Token-accounting record.
Each field is a non-negative integer or None. If the provider
does not report token counts, prompt_tokens / completion_tokens
/ total_tokens MUST be None.
strict_mode_supported ¶
Whether a JSON Schema satisfies the strict-mode constraints used by native-decoding LLM wire paths.
Returns True iff for every nested (sub)schema in the tree
additionalProperties is explicitly false (an omitted key
counts as non-strict, since JSON Schema's default is to permit
extras) and every key in properties appears in required.
False on any violation, on an unresolvable $ref, or on an
unknown shape.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
schema
|
dict[str, Any]
|
The root JSON Schema dict. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
|
bool
|
otherwise. |
validate_message_list ¶
Validate list-level invariants.
Per-message constraints (system/user need non-empty content, assistant content-or-tool_calls, etc.) are enforced by Pydantic on the per-role Message classes at construction time. This function adds the list-level invariants Pydantic-on-Message can't see.
Raises :class:ProviderInvalidRequest on the first violation.
validate_response_schema ¶
Pre-send validation for a JSON Schema passed as the
response_schema argument to complete().
Raises :class:ProviderInvalidRequest if the schema is not a dict,
does not declare a top-level object type, or is not a valid JSON
Schema document.
validate_tool_choice ¶
validate_tool_choice(
tool_choice: ToolChoice | None,
tools: Sequence[Tool] | None,
) -> None
Validate tool_choice against tools.
Raises :class:ProviderInvalidRequest (the
provider_invalid_request category) on:
tool_choicesupplied as a string that is not one of"auto"/"required"/"none"(runtime defense against untyped callers; the Literal alias catches well-typed ones at type-check time).tool_choice="required"supplied with empty / absenttools.tool_choice=ForceTool(name=X)supplied with empty / absenttools.tool_choice=ForceTool(name=X)supplied withXnot in the supplied tools list.
No-op when tool_choice is None (the default — the wire
field is omitted and the provider's own default applies).
tool_choice="auto" and tool_choice="none" have no
tools-related preconditions.
validate_tools ¶
validate_tools(tools: Sequence[Tool] | None) -> None
Validate tool-list invariants. Tool names MUST be unique
within a single complete() call.
classify_http_error ¶
classify_http_error(resp: Response) -> LlmProviderError
Map a non-200 httpx.Response from an OpenAI-shape API to
the right canonical error category.
Returns the exception (does not raise) so the caller can
raise with consistent traceback context.
Reusable by third-party Provider implementations targeting any OpenAI-compatible endpoint (vLLM, LM Studio, llama.cpp server, etc.); the wire shape is stable across these and the helper saves implementers from reimplementing the mapping table.
parse_retry_after ¶
Parse a Retry-After header value to a float seconds count.
HTTP allows seconds-int OR HTTP-date; this implementation handles the seconds-int form (the OpenAI/vendor norm) and ignores HTTP-date.
Reusable by third-party Provider implementations that need to
surface Retry-After to ProviderRateLimit.retry_after.