
LLM Client

The llm_client crate is a provider-neutral HTTP client for LLM chat completions. It handles authentication, wire format differences, model profiles, and SSE streaming — on both native and WASM targets.

LlmClient sends chat requests to any OpenAI-compatible or Anthropic Messages endpoint and returns typed responses or real-time StreamEvent streams. It is pulled in automatically through the llm-engine feature of agent_sdk.

WireFormat selects the JSON shape, not a specific vendor:

| Variant | Endpoints |
| --- | --- |
| WireFormat::OpenAiCompat | OpenAI, Azure OpenAI, OpenRouter, vLLM, Together, Fireworks |
| WireFormat::AnthropicMessages | Anthropic direct, Bedrock Claude, Vertex Claude |
  1. Choose a wire format: OpenAiCompat for OpenAI-shaped APIs, AnthropicMessages for Anthropic.

  2. Set base URL and auth — point at the provider’s endpoint and supply credentials.

  3. Optionally configure API mode, streaming policy, default headers, or custom paths.

  4. Call .build() to get an LlmClient.

```rust
use llm_client::{LlmClient, WireFormat, ApiKeyAuth};

let client = LlmClient::builder(WireFormat::OpenAiCompat)
    .base_url("https://api.openai.com/v1")
    .auth(ApiKeyAuth::new(std::env::var("OPENAI_API_KEY")?))
    .build()?;
```
| Method | Description |
| --- | --- |
| LlmClient::builder(wire_format) | Start building with WireFormat::OpenAiCompat or WireFormat::AnthropicMessages |
| .base_url(url) | Provider base URL (required) |
| .auth(provider) | Auth provider: any type implementing AuthProvider (required) |
| .api_mode(mode) | ApiMode::Chat, ApiMode::Responses, or ApiMode::Auto (OpenAI only) |
| .streaming_policy(policy) | StreamingPolicy { connect_ms, first_byte_ms, idle_ms } timeout knobs |
| .default_headers(map) | Extra headers merged into every request |
| .openai_paths(chat, responses) | Override path segments (Azure uses /chat/completions, not /v1/chat/completions) |
| LlmClient::azure_openai_builder(...) | Convenience shortcut that sets base URL, auth, and paths for Azure deployments |
| Type | Header | Use case |
| --- | --- | --- |
| ApiKeyAuth | Authorization: Bearer \<key\> | OpenAI, OpenRouter, vLLM, Together, Fireworks |
| AnthropicApiKeyAuth | x-api-key + anthropic-version | Anthropic direct API |
| AzureOpenAiAuth | api-key or Authorization: Bearer + ?api-version= | Azure OpenAI (key or Entra token) |
| Custom impl AuthProvider | Any | Custom auth schemes |

AzureCredential is an enum with two variants:

  • AzureCredential::ApiKey(String) — sent as api-key header
  • AzureCredential::BearerToken(String) — sent as Authorization: Bearer header
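The credential-to-header mapping above can be sketched as a small self-contained enum. This is a hypothetical minimal version for illustration, not the crate's actual AzureCredential implementation:

```rust
// Minimal sketch of the AzureCredential header mapping described above.
// Standalone illustration; not the llm_client implementation.
enum AzureCredential {
    ApiKey(String),
    BearerToken(String),
}

impl AzureCredential {
    /// Returns the (header name, header value) pair this credential produces.
    fn header(&self) -> (&'static str, String) {
        match self {
            AzureCredential::ApiKey(key) => ("api-key", key.clone()),
            AzureCredential::BearerToken(token) => {
                ("Authorization", format!("Bearer {token}"))
            }
        }
    }
}

fn main() {
    let (name, value) = AzureCredential::ApiKey("secret".into()).header();
    assert_eq!((name, value.as_str()), ("api-key", "secret"));

    let (name, value) = AzureCredential::BearerToken("tok".into()).header();
    assert_eq!((name, value.as_str()), ("Authorization", "Bearer tok"));
}
```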
```rust
use llm_client::{LlmRequest, ChatMessage};

let response = client.chat(LlmRequest {
    model: "gpt-4o".into(),
    messages: vec![ChatMessage {
        role: "user".into(),
        content: Some("Explain WASM in one sentence.".into()),
        ..Default::default()
    }],
    ..Default::default()
}).await?;
```
```rust
use futures::StreamExt;

let mut stream = client.chat_stream(LlmRequest {
    model: "gpt-4o".into(),
    messages: vec![ChatMessage {
        role: "user".into(),
        content: Some("Hello".into()),
        ..Default::default()
    }],
    ..Default::default()
}).await?;

while let Some(event) = stream.next().await {
    match event? {
        StreamEvent::ContentDelta { delta } => print!("{}", delta),
        StreamEvent::Done { finish_reason, usage } => {
            println!("\nDone: {:?}, usage: {:?}", finish_reason, usage);
            break;
        }
        _ => {}
    }
}
```

The stream emits these events in order:

| Variant | Fields | When |
| --- | --- | --- |
| StreamStart | id, model | First chunk (carries role) |
| ContentDelta | delta: String | Each text token |
| ReasoningDelta | delta: String | Reasoning models (Qwen3, DeepSeek R1); never fabricated |
| ToolCallStart | index, id, name | New function call begins |
| ToolCallDelta | index, arguments_delta | Arguments JSON fragment |
| Done | finish_reason, usage | Generation complete ("stop", "tool_calls", "length") |
| Error | message | Streaming error |

Typical sequences:

  • Text: StreamStart → ContentDelta* → Done
  • Tool calls: StreamStart → ToolCallStart → ToolCallDelta* → Done
  • Reasoning models: StreamStart → ReasoningDelta* → ContentDelta* → Done
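The text sequence above can be folded into a final string by concatenating ContentDelta fragments until Done arrives. The sketch below uses a simplified stand-in enum mirroring the documented variants, not the crate's own StreamEvent type:

```rust
// Standalone sketch of folding a text stream-event sequence, as described
// above, into the final message. Simplified stand-in for StreamEvent.
#[allow(dead_code)]
#[derive(Debug)]
enum StreamEvent {
    StreamStart { id: String, model: String },
    ContentDelta { delta: String },
    Done { finish_reason: String },
}

/// Concatenates ContentDelta fragments, stopping at Done; other events
/// (StreamStart, etc.) carry no text and are skipped.
fn collect_text(events: Vec<StreamEvent>) -> String {
    let mut text = String::new();
    for event in events {
        match event {
            StreamEvent::ContentDelta { delta } => text.push_str(&delta),
            StreamEvent::Done { .. } => break,
            _ => {}
        }
    }
    text
}

fn main() {
    let events = vec![
        StreamEvent::StreamStart { id: "1".into(), model: "gpt-4o".into() },
        StreamEvent::ContentDelta { delta: "Hello, ".into() },
        StreamEvent::ContentDelta { delta: "world".into() },
        StreamEvent::Done { finish_reason: "stop".into() },
    ];
    assert_eq!(collect_text(events), "Hello, world");
}
```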

ModelConfig combines model identity, family-specific behavior, and capabilities:

```rust
use llm_client::profile::{ModelConfig, ModelFamily, ModelProfile, ModelCapabilities};

let config = ModelConfig {
    model_id: "gpt-5".to_string(),
    family: ModelFamily::Gpt5,
    profile: ModelProfile::Gpt5 {
        reasoning_effort: Some("high".into()),
        responses_text_verbosity: None,
        responses_reasoning_object: Some(true),
    },
    capabilities: None, // auto-resolved from registry
    extensions: Default::default(),
};

let caps = config.resolve_capabilities();
assert_eq!(caps.context_window, 256_000);
```

ModelFamily: OpenAI, Gpt5, Qwen3, Claude, Gemini, DeepSeek, Llama, Mistral

| Variant | Extra fields | Purpose |
| --- | --- | --- |
| ModelProfile::Generic | (none) | Default for most models |
| ModelProfile::Gpt5 | reasoning_effort, responses_text_verbosity, responses_reasoning_object | GPT-5 specific knobs |
| ModelProfile::Qwen3 | enable_thinking, tool_call_parser, reasoning_parser, auto_tool_choice, template_kwargs | Qwen3 / vLLM tuning |

ModelCapabilities::lookup("gpt-4o") resolves from a built-in registry of known models. For unknown models, it returns conservative defaults (8K context window). Override by setting capabilities on ModelConfig.
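The lookup-with-conservative-defaults behavior can be sketched as a plain map lookup with a fallback. The struct and registry contents below are illustrative (only two of the documented fields, made-up entries), not the crate's real data:

```rust
use std::collections::HashMap;

// Sketch of ModelCapabilities::lookup falling back to conservative
// defaults for unknown models, as described above. Illustrative only.
#[derive(Clone, Debug, PartialEq)]
struct ModelCapabilities {
    context_window: u32,
    supports_tools: bool,
}

fn lookup(model: &str, registry: &HashMap<&str, ModelCapabilities>) -> ModelCapabilities {
    registry.get(model).cloned().unwrap_or(ModelCapabilities {
        // Unknown models get a conservative 8K context window.
        context_window: 8_192,
        supports_tools: false,
    })
}

fn main() {
    let mut registry = HashMap::new();
    // Hypothetical registry entry; real values live in the crate's registry.
    registry.insert("gpt-4o", ModelCapabilities { context_window: 128_000, supports_tools: true });

    assert_eq!(lookup("gpt-4o", &registry).context_window, 128_000);
    assert_eq!(lookup("totally-unknown", &registry).context_window, 8_192);
}
```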

| Field | Type | Description |
| --- | --- | --- |
| context_window | u32 | Max input tokens |
| max_output_tokens | u32 | Max output tokens |
| supports_tools | bool | Function/tool calling |
| supports_vision | bool | Image inputs |
| supports_streaming | bool | Streaming responses |
| cost_per_1k_input | Option\<f64\> | USD per 1K input tokens |
| cost_per_1k_output | Option\<f64\> | USD per 1K output tokens |

The prepare module provides a mutator/validator pipeline for model-specific request shaping:

  • Gpt5Mutator — strips temperature (unsupported), maps max_tokens → max_completion_tokens, injects reasoning_effort
  • QwenVllmExtras — injects chat_template_kwargs, tool_call_parser, reasoning_parser for vLLM-hosted Qwen3
  • ProfileCapabilityValidator — validates request fields against model profile (Strict rejects, Permissive warns)
```rust
use llm_client::prepare::{prepare_request, Gpt5Mutator, ProfileCapabilityValidator, Policy};

let shaped_request = prepare_request(
    &model_config,
    raw_request,
    &[Box::new(Gpt5Mutator)],
    &[Box::new(ProfileCapabilityValidator)],
    Policy::Permissive,
)?;
```
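What a mutator like Gpt5Mutator does to a request can be sketched on a simplified stand-in struct (this mirrors the bullet points above; it is not the crate's LlmRequest or the real mutator):

```rust
// Hypothetical minimal version of the Gpt5Mutator behavior described above:
// drop `temperature` and move `max_tokens` into `max_completion_tokens`.
#[derive(Debug, Default, PartialEq)]
struct Request {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    max_completion_tokens: Option<u32>,
}

fn gpt5_mutate(mut req: Request) -> Request {
    req.temperature = None; // unsupported on this family, per the text above
    if let Some(n) = req.max_tokens.take() {
        req.max_completion_tokens = Some(n); // renamed field
    }
    req
}

fn main() {
    let shaped = gpt5_mutate(Request {
        temperature: Some(0.7),
        max_tokens: Some(512),
        ..Default::default()
    });
    assert_eq!(shaped, Request { max_completion_tokens: Some(512), ..Default::default() });
}
```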
| Type | Module | Description |
| --- | --- | --- |
| LlmClient | llm_client::client | Main facade: .chat(), .chat_stream(), .capabilities() |
| LlmClientBuilder | llm_client::client | Builder with .base_url(), .auth(), .api_mode(), etc. |
| WireFormat | llm_client::client | OpenAiCompat or AnthropicMessages |
| ApiMode | llm_client::model_client | Chat, Responses, or Auto |
| StreamEvent | llm_client::stream | Streaming event vocabulary |
| LlmEventStream | llm_client::stream | Pin\<Box\<dyn Stream\<Item = Result\<StreamEvent, LlmError\>\>\>\> |
| SseParser | llm_client::stream | Low-level SSE line parser |
| StreamingPolicy | protocol_transport_core | connect_ms, first_byte_ms, idle_ms timeouts |
| AuthProvider | llm_client::auth | Trait: authorize(&self, headers) + optional query_params() |
| ApiKeyAuth | llm_client::auth | Authorization: Bearer auth |
| AnthropicApiKeyAuth | llm_client::auth | x-api-key + anthropic-version auth |
| AzureOpenAiAuth | llm_client::auth | Azure api-key or Entra bearer + api-version query param |
| AzureCredential | llm_client::auth | ApiKey(String) or BearerToken(String) |
| ModelConfig | llm_client::profile | Model identity + family + profile + optional capabilities |
| ModelFamily | llm_client::profile | Model vendor enum |
| ModelProfile | llm_client::profile | Family-specific parameter variants |
| ModelCapabilities | llm_client::profile | Context window, limits, feature flags, costs |
| ClientCapabilities | llm_client::model_client | streaming, tool_calling, structured_output flags |
| LlmRequest | llm_client::types | Chat request: model, messages, temperature, max_tokens, tools, etc. |
| LlmResponse | llm_client::types | Non-streaming response |
| ChatMessage | llm_client::types | Role + content + optional tool_calls |
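The kind of low-level SSE line handling that a parser like SseParser performs can be sketched as follows. This is a standalone simplification (data-only lines plus the OpenAI-style [DONE] sentinel); a full SSE parser also handles event:, id:, retry:, comments, and multi-line data:

```rust
// Standalone sketch of SSE `data:` line classification. Not the crate's
// SseParser; real SSE parsing also tracks event/id/retry fields and
// assembles multi-line data payloads.
#[derive(Debug, PartialEq)]
enum SseItem<'a> {
    Data(&'a str), // a JSON chunk to decode into a StreamEvent
    Done,          // the [DONE] sentinel ending the stream
    Other,         // comments, non-data fields, blank separators
}

fn parse_sse_line(line: &str) -> SseItem<'_> {
    match line.strip_prefix("data:") {
        Some(rest) => {
            let payload = rest.trim_start();
            if payload == "[DONE]" { SseItem::Done } else { SseItem::Data(payload) }
        }
        None => SseItem::Other,
    }
}

fn main() {
    assert_eq!(parse_sse_line("data: {\"delta\":\"hi\"}"), SseItem::Data("{\"delta\":\"hi\"}"));
    assert_eq!(parse_sse_line("data: [DONE]"), SseItem::Done);
    assert_eq!(parse_sse_line(": keep-alive"), SseItem::Other);
}
```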