# Limiting Router Tokens
## Overview
The XiansAi router automatically manages chat history token limits to prevent `context_length_exceeded` errors. This is especially important when functions return large content (such as web scraping results) or when conversations grow long.
## Configuration
Configure token limits in your Flow's constructor by setting `RouterOptions` properties:
```csharp
[Workflow("My Bot")]
public class MyBot : FlowBase
{
    public MyBot()
    {
        SystemPrompt = "You are a helpful assistant.";

        // Configure token limits
        RouterOptions.TokenLimit = 80000;                 // Trigger reduction at 80k tokens
        RouterOptions.TargetTokenCount = 50000;           // Reduce to 50k tokens
        RouterOptions.MaxTokensPerFunctionResult = 10000; // Limit large function results
    }
}
```
## Token Limit Properties
| Property | Default | Description |
| --- | --- | --- |
| `TokenLimit` | 80,000 | Maximum tokens before chat history reduction triggers. Set to `0` to disable. |
| `TargetTokenCount` | 50,000 | Target token count after reduction. Should be significantly lower than `TokenLimit`. |
| `MaxTokensPerFunctionResult` | 10,000 | Maximum tokens allowed for a single function result (e.g., web scraping). |
## Model-Specific Recommendations
### GPT-4 (128k context)
```csharp
RouterOptions.TokenLimit = 100000;                // Conservative limit
RouterOptions.TargetTokenCount = 60000;           // Leave room for responses
RouterOptions.MaxTokensPerFunctionResult = 15000;
```
### GPT-3.5 Turbo (16k context)
```csharp
RouterOptions.TokenLimit = 12000;                // Stay well below the limit
RouterOptions.TargetTokenCount = 8000;           // Conservative target
RouterOptions.MaxTokensPerFunctionResult = 4000;
```
### GPT-3.5 Turbo (4k context)
```csharp
RouterOptions.TokenLimit = 3000;                 // Very conservative
RouterOptions.TargetTokenCount = 2000;           // Leave room for the system prompt
RouterOptions.MaxTokensPerFunctionResult = 1000;
```
## How It Works
The router uses a two-stage reduction strategy (sketched in the example after this list):

1. **Function result truncation**: Large function results (such as web scraping output) are truncated first.
2. **Message history reduction**: If the history is still over the limit, older messages are removed while preserving:
    - System messages
    - Recent conversation context
    - Function call/result pairs
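
The exact internals are not part of the public API, but the strategy can be pictured with this minimal sketch. The `Msg` type, `EstimateTokens`, and `Reduce` are all illustrative assumptions, and the real router additionally keeps function call/result pairs together:

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative message type only; not the framework's internal API.
class Msg { public string Role = ""; public string Content = ""; }

static class HistoryReducer
{
    // ~4 characters per token, matching the heuristic in the Notes section.
    static int EstimateTokens(IEnumerable<Msg> history) =>
        history.Sum(m => m.Content.Length) / 4;

    public static void Reduce(List<Msg> history, int tokenLimit,
                              int targetTokens, int maxTokensPerResult)
    {
        if (tokenLimit == 0 || EstimateTokens(history) <= tokenLimit)
            return; // under the limit, nothing to do

        // Stage 1: truncate oversized function results first.
        foreach (var m in history.Where(x => x.Role == "function"))
        {
            int maxChars = maxTokensPerResult * 4;
            if (m.Content.Length > maxChars)
                m.Content = m.Content[..maxChars] + " [truncated]";
        }

        // Stage 2: drop the oldest non-system messages until under target.
        while (EstimateTokens(history) > targetTokens)
        {
            var oldest = history.FirstOrDefault(x => x.Role != "system");
            if (oldest == null) break;
            history.Remove(oldest);
        }
    }
}
```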
## Common Scenarios
### Web Scraping Bots
When using capabilities that scrape web content:
```csharp
public MyWebBot()
{
    RouterOptions.TokenLimit = 80000;
    RouterOptions.MaxTokensPerFunctionResult = 8000; // Limit scraped content
}
```
### Long Conversation Bots
For bots with extensive chat history:
```csharp
public MyChatBot()
{
    RouterOptions.TokenLimit = 60000;
    RouterOptions.TargetTokenCount = 30000; // Aggressive reduction
    RouterOptions.HistorySizeToFetch = 100; // Fetch more history
}
```
### High-Precision Bots
For bots requiring maximum context:
```csharp
public MyAnalysisBot()
{
    RouterOptions.TokenLimit = 110000;      // Use more context
    RouterOptions.TargetTokenCount = 80000; // Less aggressive reduction
}
```
## Troubleshooting
### Still Getting Token Limit Errors?
- **Lower `TokenLimit`**: Try reducing it by 20,000 tokens.
- **Reduce function results**: Lower `MaxTokensPerFunctionResult`.
- **Check the system prompt**: Very long system prompts consume tokens on every request (see the estimate below).
- **Monitor logs**: Look for reduction warnings in your application logs.
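
To gauge how much the system prompt costs, the ~4 characters per token heuristic from the Notes section gives a quick estimate. This is a rough check inside your Flow's constructor, not the router's exact tokenizer:

```csharp
// Rough estimate; actual tokenization varies by model.
// E.g., a 6,000-character prompt is roughly 1,500 tokens,
// half of a 3,000-token TokenLimit.
int approxPromptTokens = SystemPrompt.Length / 4;
```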
### Performance Issues?
- **Increase `TargetTokenCount`**: Reduce how often reductions run.
- **Optimize functions**: Return smaller, more focused results.
- **Disable if not needed**: Set `TokenLimit = 0` for simple bots (see the example below).
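
For example, a simple bot with short exchanges can opt out of history reduction entirely (per the table above, `TokenLimit = 0` disables it; the `MySimpleBot` name is just illustrative):

```csharp
public MySimpleBot()
{
    SystemPrompt = "You answer short, one-off questions.";
    RouterOptions.TokenLimit = 0; // Disable chat history reduction
}
```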
## Example: Complete Configuration
```csharp
[Workflow("Percy: Web Reporter")]
public class WebReporterBot : FlowBase
{
    public WebReporterBot()
    {
        SystemPrompt = "You are a web content reporter with scraping capabilities.";

        // Token management for web scraping
        RouterOptions.TokenLimit = 85000;                 // Conservative for GPT-4
        RouterOptions.TargetTokenCount = 55000;           // Leave room for analysis
        RouterOptions.MaxTokensPerFunctionResult = 12000; // Allow larger scraped content
        RouterOptions.HistorySizeToFetch = 20;            // Moderate history

        // Other settings
        RouterOptions.MaxTokens = 4096;  // Response limit
        RouterOptions.Temperature = 0.3; // Focused responses
    }
}
```
## Notes
- Token estimation uses a ~4 characters per token heuristic (worked example below).
- System messages are always preserved during reduction.
- Function call/result pairs are kept together when possible.
- Reduction triggers before the request is sent to the AI model, preventing errors.
- All reductions are logged for debugging purposes.
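
As a worked example of the heuristic, the character budgets behind the default limits look like this (illustrative arithmetic only):

```csharp
// With the ~4 characters per token heuristic:
//   MaxTokensPerFunctionResult = 10,000 tokens -> ~40,000 characters
//   TokenLimit                 = 80,000 tokens -> ~320,000 characters
// A 100,000-character scraped page (~25,000 tokens) would therefore be
// truncated to roughly 40,000 characters before reaching the model.
int maxResultChars = 10000 * 4;
```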