llm_context_core

WASM-compatible LLM context window management library.
Provides token estimation, context budget computation, pluggable history management strategies, and trait ports for long-term memory backends.
Architecture
```
┌──────────────────────────────────────────────┐
│              LLM Context Window              │
│ ┌──────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │  System  │ │  Long-Term  │ │   Working   │ │
│ │  Prompt  │ │  Memories   │ │   Memory    │ │
│ │  (fixed) │ │ (retrieved) │ │  (recent N) │ │
│ └──────────┘ └─────────────┘ └─────────────┘ │
└──────────────────────────────────────────────┘
```

- Tier 1 (Working): Current turn messages, bounded by token budget
- Tier 2 (Short-Term): Conversation history with sliding window / summarization
- Tier 3 (Long-Term): Cross-conversation semantic retrieval (Qdrant, etc.)
Traits
LongTermMemory

Trait port for long-term memory backends.
Implementations handle embedding, storage, and semantic retrieval. The trait is object-safe and WASM-compatible.
Implementations
- agent_memory_store::QdrantMemoryStore — Qdrant + embedding_provider_lib
- In-memory store for testing
Required / Provided Methods
fn store(&self, entry: MemoryEntry) -> MemoryFuture<()>

Store a memory entry (embedding + indexing happens internally).
fn recall(&self, query: &str, top_k: usize, filters: MemoryFilters) -> MemoryFuture<Vec<MemoryEntry>>

Recall relevant memories by semantic similarity to a query.
Returns up to top_k entries sorted by relevance (highest first).
fn forget(&self, filters: MemoryFilters) -> MemoryFuture<u64>

Delete memories matching the given filters.
Returns the number of entries deleted.
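A minimal in-memory sketch of this surface might look as follows. The simplifications are assumptions, not the crate's API: methods are synchronous rather than returning MemoryFuture, MemoryEntry is trimmed to three fields, forget takes a bare ID instead of MemoryFilters, and substring matching stands in for embedding-based semantic similarity.

```rust
// In-memory stand-in for a LongTermMemory backend (simplified sketch).
#[derive(Clone, Debug)]
struct MemoryEntry {
    id: String,
    content: String,
    score: f32,
}

#[derive(Default)]
struct InMemoryStore {
    entries: Vec<MemoryEntry>,
}

impl InMemoryStore {
    // store: a real backend would embed and index the entry here.
    fn store(&mut self, entry: MemoryEntry) {
        self.entries.push(entry);
    }

    // recall: return up to top_k matches, highest score first.
    fn recall(&self, query: &str, top_k: usize) -> Vec<MemoryEntry> {
        let mut hits: Vec<MemoryEntry> = self
            .entries
            .iter()
            .filter(|e| e.content.contains(query))
            .cloned()
            .collect();
        hits.sort_by(|a, b| b.score.total_cmp(&a.score));
        hits.truncate(top_k);
        hits
    }

    // forget: delete matching entries, returning the count removed.
    fn forget(&mut self, id: &str) -> u64 {
        let before = self.entries.len();
        self.entries.retain(|e| e.id != id);
        (before - self.entries.len()) as u64
    }
}
```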
ContextStrategy
Trait for context window management strategies.
Implementations decide which messages to retain when the conversation history exceeds the available token budget. The contract:
- Input: full message history (OpenAI JSON format) + budget
- Output: trimmed message list that fits within budget.available_for_history
- Ordering must be preserved (messages keep their chronological order)
Required / Provided Methods
fn apply(&self, messages: &[serde_json::Value], budget: &ContextBudget) -> Vec<serde_json::Value>

Apply this strategy to trim messages to fit within budget. Returns a new Vec containing only the messages that should be sent to the LLM. The caller owns the returned vector.
fn name(&self) -> &'static str

Human-readable name for logging and diagnostics.
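A hedged sketch of a SlidingWindow-style strategy, written over simplified (role, content) string pairs rather than the serde_json::Value messages and ContextBudget the real trait uses. Token costs use the crate's documented ~4 chars/token heuristic.

```rust
// Rough token cost per the documented ~4 chars/token heuristic.
fn estimate_tokens(text: &str) -> u32 {
    (text.len() as u32 / 4).max(1)
}

// Walk history from newest to oldest, keeping messages until the
// budget is exhausted, then restore chronological order.
fn sliding_window<'a>(
    messages: &'a [(&'a str, &'a str)],
    available_for_history: u32,
) -> Vec<&'a (&'a str, &'a str)> {
    let mut kept = Vec::new();
    let mut used = 0u32;
    for msg in messages.iter().rev() {
        let cost = estimate_tokens(msg.1);
        if used + cost > available_for_history {
            break; // oldest messages are evicted first
        }
        used += cost;
        kept.push(msg);
    }
    kept.reverse(); // messages keep their chronological order
    kept
}
```

Iterating newest-first is what makes this a sliding window: when the budget runs out, it is always the oldest messages that fall off.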
Structs
ContextBudget

Token budget allocation for an LLM context window.
Breaks the total context window into reserved zones and computes the remaining space available for conversation history.
```
┌────────────────────────────────────────────────────┐
│               Context Window (total)               │
├────────────┬────────────┬────────┬─────────────────┤
│   System   │   Tools    │ Output │    Available    │
│  (fixed)   │ (schemas)  │ (gen)  │   for History   │
└────────────┴────────────┴────────┴─────────────────┘
```

Fields
| Field | Type | Description |
|---|---|---|
| total | u32 | Total context window in tokens (from ModelCapabilities). |
| reserved_output | u32 | Tokens reserved for model output generation. |
| reserved_system | u32 | Tokens consumed by the system prompt. |
| reserved_tools | u32 | Tokens consumed by tool schema definitions. |
| available_for_history | u32 | Remaining tokens available for conversation history + memories. |
Methods
fn new(context_window: u32, max_output_tokens: u32, system_message: Option<&str>, tools_json: &[serde_json::Value]) -> Self

Create a new budget from model capabilities and current context.
Parameters
- context_window: Total tokens the model can accept
- max_output_tokens: Tokens to reserve for generation
- system_message: The system prompt text (will be estimated)
- tools_json: Tool schema definitions (will be estimated)
history_usage
fn history_usage(&self, history_messages: &[serde_json::Value]) -> u32

Compute how many tokens are actually used by a set of history messages.
would_exceed
fn would_exceed(&self, history_messages: &[serde_json::Value]) -> bool

Check whether adding the given messages would exceed the history budget.
utilization
fn utilization(&self, history_messages: &[serde_json::Value]) -> f32

Utilization ratio (0.0 to 1.0) of the full context window.
remaining
fn remaining(&self, history_messages: &[serde_json::Value]) -> u32

Remaining tokens available after current history usage.
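The budget arithmetic can be sketched with a simplified stand-in struct. Field names mirror the documented ContextBudget fields, but the stand-in skips token estimation (it takes pre-computed reservations) and its methods take a token count rather than a message slice.

```rust
// Simplified stand-in for ContextBudget's reserved-zone arithmetic.
struct Budget {
    total: u32,
    reserved_output: u32,
    reserved_system: u32,
    reserved_tools: u32,
}

impl Budget {
    // Whatever is not reserved is available for history + memories.
    fn available_for_history(&self) -> u32 {
        self.total
            .saturating_sub(self.reserved_output)
            .saturating_sub(self.reserved_system)
            .saturating_sub(self.reserved_tools)
    }

    // History overflows once it exceeds the unreserved remainder.
    fn would_exceed(&self, history_tokens: u32) -> bool {
        history_tokens > self.available_for_history()
    }
}
```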
HistoryManager
Central manager for LLM context window management.
Call prepare_messages before each LLM invocation to get a trimmed, budget-aware message list. Call on_turn_complete after each turn to update summaries and store memories.
Methods
fn new(config: HistoryManagerConfig) -> Self

Create a new HistoryManager with the given configuration.
with_summarizer
fn with_summarizer(self, summarizer: Arc<dyn Summarizer>) -> Self

Set the summarizer implementation.
with_memory
fn with_memory(self, memory: Arc<dyn LongTermMemory>) -> Self

Set the long-term memory backend.
with_summaries
fn with_summaries(self, summaries: Vec<String>) -> Self

Seed the manager with previously persisted conversation summaries.
prepare_messages
async fn prepare_messages(&self, budget: &ContextBudget, system_message: Option<&str>, history: &[serde_json::Value], current_turn: &[serde_json::Value], memory_filters: &MemoryFilters) -> Vec<serde_json::Value>

Prepare messages for the next LLM invocation.
This is the main entry point. It:
- Optionally retrieves relevant long-term memories
- Constructs the system message (with memories + summaries)
- Applies the context strategy to trim history within budget
Returns the message list ready to be sent to the LLM.
on_turn_complete
async fn on_turn_complete(&mut self, evicted_messages: &[serde_json::Value], agent_id: &str, user_id: Option<&str>, conversation_id: Option<&str>)

Notify the manager that a turn has completed.
If summarization is enabled, this may generate a summary of evicted messages and optionally store it in long-term memory.
summaries
fn summaries(&self) -> &[String]

Get accumulated summaries (for diagnostics).
strategy_name
fn strategy_name(&self) -> &'static str

Get the strategy name (for diagnostics).
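The prepare/on_turn_complete lifecycle can be illustrated with a toy manager. This is a deliberately naive sketch: the real HistoryManager is async, works with a ContextBudget and MemoryFilters, and delegates to a Summarizer and LongTermMemory; here eviction is a plain keep-the-last-N split and the "summary" just joins the evicted messages.

```rust
// Toy illustration of the HistoryManager turn lifecycle.
struct ToyManager {
    summaries: Vec<String>,
}

impl ToyManager {
    // Split history into (kept, evicted) around a keep-last-N boundary.
    fn prepare_messages<'a>(
        &self,
        history: &'a [String],
        keep_last: usize,
    ) -> (Vec<&'a String>, Vec<&'a String>) {
        let split = history.len().saturating_sub(keep_last);
        let (evicted, kept) = history.split_at(split);
        (kept.iter().collect(), evicted.iter().collect())
    }

    // Record a "summary" of what was evicted this turn by joining
    // the evicted messages (a real Summarizer would condense them).
    fn on_turn_complete(&mut self, evicted: &[&String]) {
        if !evicted.is_empty() {
            let summary = evicted
                .iter()
                .map(|m| m.as_str())
                .collect::<Vec<_>>()
                .join(" | ");
            self.summaries.push(summary);
        }
    }
}
```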
HistoryManagerConfig
Configuration for the HistoryManager.
Fields
| Field | Type | Description |
|---|---|---|
| strategy | ContextStrategyKind | Which strategy to use for trimming history. |
| enable_summarization | bool | Whether to generate summaries of evicted messages. |
| enable_long_term_memory | bool | Whether to query long-term memory for relevant context. |
| recall_top_k | usize | Number of long-term memories to inject per turn. |
| memory_token_budget | u32 | Maximum tokens to allocate for injected long-term memories. |
MemoryEntry
A single memory entry stored in the long-term memory backend.
Fields
| Field | Type | Description |
|---|---|---|
| id | String | Unique ID for this memory entry. |
| agent_id | String | Agent that created this memory. |
| user_id | Option<String> | User this memory belongs to (for multi-tenant isolation). |
| conversation_id | Option<String> | Conversation this memory was extracted from. |
| content | String | The text content to embed and store. |
| memory_type | MemoryType | Classification of this memory. |
| timestamp | u64 | Unix timestamp (seconds) when this memory was created. |
| score | f32 | Relevance score (set during retrieval, 0.0 to 1.0). |
| metadata | HashMap<String, serde_json::Value> | Arbitrary metadata (e.g. source turn number, tags). |
MemoryFilters
Filters for memory retrieval.
Fields
| Field | Type | Description |
|---|---|---|
| agent_id | Option<String> | Filter by agent ID. |
| user_id | Option<String> | Filter by user ID. |
| conversation_id | Option<String> | Filter by conversation ID. |
| memory_types | Vec<MemoryType> | Filter by memory type(s). |
| after_timestamp | Option<u64> | Only return memories newer than this timestamp (seconds). |
MemoryType
Type of memory entry — helps with filtering and relevance scoring.
Variants
| Variant | Description |
|---|---|
| Summary | Summarized conversation segment. |
| Fact | Extracted factual statement (e.g. “User prefers dark mode”). |
| Instruction | User instruction or preference. |
| ToolResult | Compressed tool result worth remembering. |
| Custom(String) | Arbitrary user-defined type. |
ContextStrategyKind
Selects which context strategy to use (serializable for config).
Variants
| Variant | Description |
|---|---|
| SlidingWindow | Keep the most recent messages that fit within budget. |
| SlidingWindowWithSummary | Sliding window, but prepend a summary of evicted messages. |
| PriorityBased | Score messages by importance; keep highest-scoring within budget. |
Functions
estimate_tokens

fn estimate_tokens(text: &str) -> u32

Estimate token count for a text string.
Uses the widely-accepted heuristic of ~4 characters per token for English text, with a small overhead for BPE tokenizer framing. Non-ASCII text uses a slightly higher ratio (3 chars/token) to account for multi-byte characters.
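A sketch of this heuristic. The shape (chars-per-token ratio switched on ASCII content, plus a small fixed overhead) follows the description above, but the exact constants used here are illustrative assumptions, not necessarily the crate's values.

```rust
// Heuristic token estimate: ~4 chars/token for pure-ASCII text,
// ~3 chars/token when non-ASCII is present, plus a small fixed
// overhead for BPE tokenizer framing (constants are illustrative).
fn estimate_tokens(text: &str) -> u32 {
    if text.is_empty() {
        return 0;
    }
    let chars = text.chars().count() as u32;
    let chars_per_token = if text.is_ascii() { 4 } else { 3 };
    chars / chars_per_token + 2 // +2 ≈ tokenizer framing overhead
}
```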