Krrish Dholakia
a26f83fd3c
fix: update calendly on repo
2026-02-23 06:13:59 -08:00
Sameer Kankute
9b5bbee906
Merge pull request #21786 from BerriAI/litellm_oss_staging_02_21_2026
...
Litellm oss staging 02 21 2026
2026-02-23 18:51:55 +05:30
Sameer Kankute
8decf04d8a
Merge pull request #21877 from BerriAI/litellm_oss_staging_02_22_2026
...
Litellm oss staging 02 22 2026
2026-02-23 18:50:47 +05:30
Sameer Kankute
37d45139f2
Merge pull request #21917 from BerriAI/litellm_fix_model_cost_map_wildcard
...
Fix: Anthropic model wildcard access issue
2026-02-23 18:45:49 +05:30
TomAlon
99184c48d9
Add Noma guardrails v2 based on custom guardrails ( #21400 )
2026-02-23 05:05:27 -08:00
Sameer Kankute
c7aafdf794
Merge pull request #21926 from BerriAI/main
...
merge main in oss 21 02
2026-02-23 18:17:30 +05:30
Sameer Kankute
57af8e6a93
Merge pull request #21924 from BerriAI/main
...
merge main in oss 22 02
2026-02-23 18:11:36 +05:30
Sameer Kankute
eaf3900200
Fix name of title
2026-02-23 17:18:31 +05:30
Sameer Kankute
9b27cd8c0e
Add incident report
2026-02-23 17:13:44 +05:30
Cesar Garcia
b8cef1a4e5
docs: add OpenClaw integration tutorial ( #21605 )
...
* docs: add OpenClaw integration tutorial
* docs: simplify OpenClaw proxy start command
* docs: rewrite OpenClaw integration guide for clarity
- Use gpt-5 as default model
- Replace poetry run with standard litellm CLI
- Add prerequisites section and verification step
- Simplify onboarding instructions (table format)
- Move manual config and troubleshooting to bottom
- Add multi-model config (claude-sonnet, gemini-flash)
* docs: fix model name in OpenClaw manual config example
* docs: rewrite OpenClaw integration guide from scratch
Rewrote the guide based on hands-on testing of every command.
Key changes:
- Replace non-existent `openclaw chat` with verified commands
(dashboard, tui, agent --agent main)
- Add 3 onboarding options: QuickStart, Manual, and non-interactive
- Fix health check (requires Bearer token)
- Remove misleading "Starting from scratch" section
- Use gpt-4o instead of gpt-5 as the example model
- Clarify that API keys can come from export, .env, or any method
- Add config reference section showing openclaw.json structure
- Add real troubleshooting based on issues found during testing
2026-02-21 20:16:27 -08:00
Krish Dholakia
52585eb2d7
Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header ( #21870 )" ( #21876 )
...
This reverts commit bce078a796.
2026-02-21 20:12:01 -08:00
Edwin Isac
bce078a796
fix(vertex_ai): enable context-1m-2025-08-07 beta header ( #21870 )
...
* server root path regression doc
* fixing syntax
* fix: replace Zapier webhook with Google Form for survey submission (#21621)
* Replace Zapier webhook with Google Form for survey submission
* Add back error logging for survey submission debugging
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth"
This reverts commit 0e1db3f7e4, reversing
changes made to 7e2d6f2355.
* test_vertex_ai_gemini_2_5_pro_streaming
* UI new build
* fix rendering
* ui new build
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* release note docs
* docs
* adding image
* fix(vertex_ai): enable context-1m-2025-08-07 beta header
The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.
This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.
Fixes #21861
---------
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2026-02-21 20:11:13 -08:00
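The filtering behavior described in the `context-1m-2025-08-07` fix can be illustrated with a minimal sketch. It assumes a per-provider mapping in which `None` means "drop this beta flag"; the mapping shape and the `filter_beta_flags` helper are hypothetical and are not litellm's actual implementation.

```python
from typing import Dict, Optional

FLAG = "context-1m-2025-08-07"

# Hypothetical provider mappings: a value of None filters the flag out.
VERTEX_BEFORE_FIX: Dict[str, Optional[str]] = {FLAG: None}
VERTEX_AFTER_FIX: Dict[str, Optional[str]] = {FLAG: FLAG}


def filter_beta_flags(
    mapping: Dict[str, Optional[str]], requested: str
) -> Optional[str]:
    """Return the anthropic-beta value to forward, or None if nothing survives."""
    kept = []
    for flag in (part.strip() for part in requested.split(",")):
        if not flag:
            continue
        # Assumption: flags missing from the mapping pass through unchanged.
        mapped = mapping.get(flag, flag)
        if mapped is not None:
            kept.append(mapped)
    return ",".join(kept) or None


print(filter_beta_flags(VERTEX_BEFORE_FIX, FLAG))  # flag is dropped
print(filter_beta_flags(VERTEX_AFTER_FIX, FLAG))   # flag is forwarded
```

With the `None` mapping the header never reaches the provider, which is consistent with the 200000-token limit error quoted in the commit message.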
LeeJuOh
50f36d9ca6
fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo ( #21754 )
...
* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo
* fix(budget): update stale docstring on get_budget_reset_time
2026-02-21 19:35:06 -08:00
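As a rough illustration of why `ZoneInfo` can replace a hardcoded timezone map, the sketch below computes the next local midnight for any IANA timezone name using only the standard library. The `next_daily_reset` helper is hypothetical; it is not litellm's `get_budget_reset_time`, and it ignores the rare zones where midnight is skipped by a DST transition.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def next_daily_reset(now_utc: datetime, tz_name: str = "UTC") -> datetime:
    """Return the next local midnight in tz_name, expressed in UTC."""
    tz = ZoneInfo(tz_name)  # raises ZoneInfoNotFoundError for unknown names
    local_now = now_utc.astimezone(tz)
    next_day = local_now.date() + timedelta(days=1)
    local_midnight = datetime(
        next_day.year, next_day.month, next_day.day, tzinfo=tz
    )
    return local_midnight.astimezone(timezone.utc)


now = datetime(2026, 2, 21, 19, 35, tzinfo=timezone.utc)
print(next_daily_reset(now, "Asia/Kolkata"))
print(next_daily_reset(now, "UTC"))
```

Because the offset is resolved from the system timezone database at call time, no per-zone table needs to be maintained by hand.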
yuneng-jiang
5bb52d0202
adding image
2026-02-21 18:09:18 -08:00
yuneng-jiang
ea37f59de4
Merge remote-tracking branch 'origin' into litellm_yj_docs_feb21_release
2026-02-21 18:05:11 -08:00
Ishaan Jaffer
84b572d719
docs
2026-02-21 18:00:38 -08:00
yuneng-jiang
5e26891da2
release note docs
2026-02-21 17:58:36 -08:00
Ishaan Jaffer
19f7e881f3
docs fix
2026-02-21 17:53:51 -08:00
Ishaan Jaffer
356eb5a413
docs fix
2026-02-21 17:51:45 -08:00
Ishaan Jaffer
522954fe0d
docs fix
2026-02-21 17:47:44 -08:00
Ishaan Jaffer
45bef9ade8
docs fix
2026-02-21 17:46:01 -08:00
Ishaan Jaffer
5e71f6128b
docs fix
2026-02-21 17:40:39 -08:00
Ishaan Jaffer
e157f5a8f2
docs fix
2026-02-21 17:35:16 -08:00
Ishaan Jaffer
661c6faac6
docs fix
2026-02-21 17:28:04 -08:00
Ishaan Jaffer
efebd37183
docs fix
2026-02-21 17:28:04 -08:00
yuneng-jiang
823bb023df
Merge branch 'main' into litellm_yj_docs_feb21
2026-02-21 17:12:28 -08:00
Ishaan Jaffer
ab032c292c
docs fix
2026-02-21 16:36:22 -08:00
yuneng-jiang
aefc7c14f6
Merge remote-tracking branch 'origin' into doc_yj_feb21
2026-02-21 16:05:07 -08:00
yuneng-jiang
70fd2aa219
fixing syntax
2026-02-21 16:04:42 -08:00
yuneng-jiang
153bf1d856
server root path regression doc
2026-02-21 15:57:06 -08:00
Ishaan Jaffer
775fb79260
fix
2026-02-21 15:45:03 -08:00
Darien Kindlund
ca5c109a92
feat: add optional digest mode for Slack alert types ( #21683 )
...
Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.
Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400
Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:19:17 -08:00
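The digest behavior described in the commit above (duplicates grouped by `(alert_type, request_model, api_base)` and summarized once per fixed interval) could be sketched roughly as follows. The `AlertDigest` class, its method names, and the summary wording are hypothetical and only illustrate the idea; they are not the Slack alerting code in litellm.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Digest key from the commit message: (alert_type, request_model, api_base)
DigestKey = Tuple[str, str, str]


@dataclass
class DigestEntry:
    count: int
    first_seen: float
    last_seen: float


class AlertDigest:
    """Aggregate duplicate alerts and emit one summary per fixed window."""

    def __init__(self, digest_interval: float = 86400.0) -> None:
        self.digest_interval = digest_interval  # seconds; 24 hours by default
        self._entries: Dict[DigestKey, DigestEntry] = {}

    def record(self, key: DigestKey, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            # The first alert opens the window for this key.
            self._entries[key] = DigestEntry(1, now, now)
        else:
            entry.count += 1
            entry.last_seen = now

    def flush_due(self, now: Optional[float] = None) -> Dict[DigestKey, str]:
        """Return summaries for windows that have elapsed and reset them."""
        now = time.time() if now is None else now
        summaries: Dict[DigestKey, str] = {}
        for key, entry in list(self._entries.items()):
            if now - entry.first_seen >= self.digest_interval:
                summaries[key] = (
                    f"{key[0]}: {entry.count} alerts "
                    f"between {entry.first_seen} and {entry.last_seen}"
                )
                del self._entries[key]
        return summaries
```

A caller would invoke `record` for every incoming alert of a digest-enabled type and call `flush_due` periodically, sending each returned summary as a single message.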
Ishaan Jaff
6dc9823926
docs(release-notes): update v1.81.14 - split guardrail sections, add eval results, fix key highlights and section placement ( #21847 )
2026-02-21 15:18:46 -08:00
Ishaan Jaff
eac3ae8121
docs: update v1.81.14 release notes - guardrail model garden, complexity router placement ( #21843 )
...
* docs(release-notes): update v1.81.14 key highlights and section placement
* docs(release-notes): rewrite key highlights and add guardrail narrative section
* docs(release-notes): rewrite guardrail narrative to match release notes style
* docs(release-notes): add guardrail eval results section
2026-02-21 15:10:21 -08:00
Ishaan Jaff
8c7f667df2
docs: v1.81.14-stable release notes ( #21839 )
...
* docs(release-notes): add v1.81.14-stable release notes
* fix(docs): fix MDX compilation errors in auto_routing.md
* docs(release-notes): polish v1.81.14 - narrative paragraph, consolidated guardrail templates, merged competitor bullet
2026-02-21 14:50:50 -08:00
shin-bot-litellm
1be30f5129
feat(router): Add complexity-based auto routing strategy ( #21789 )
...
* feat(router): Add complexity-based auto routing strategy
Adds a rule-based routing strategy that classifies requests by complexity
and routes them to appropriate models - with zero API calls and sub-millisecond
latency.
## Features
- **Zero external API calls** - all scoring is local
- **Sub-millisecond latency** - typically <1ms per classification
- **Weighted multi-dimensional scoring** across 7 dimensions:
- Token count (short=simple, long=complex)
- Code presence (code keywords → complex)
- Reasoning markers ("step by step" → reasoning tier)
- Technical terms (domain complexity)
- Simple indicators ("what is" → simple, negative weight)
- Multi-step patterns (numbered steps)
- Question complexity (multiple questions)
- **Configurable tier boundaries** and model mappings
- **Reasoning override** - 2+ reasoning markers force REASONING tier
## Usage
```yaml
model_list:
  - model_name: smart-router
    litellm_params:
      model: auto_router/complexity_router
      complexity_router_config:
        tiers:
          SIMPLE: gpt-4o-mini
          MEDIUM: gpt-4o
          COMPLEX: claude-sonnet-4
          REASONING: o1-preview
```
Inspired by ClawRouter: https://github.com/BlockRunAI/ClawRouter
## Files Added
- litellm/router_strategy/complexity_router/complexity_router.py - Main router class
- litellm/router_strategy/complexity_router/config.py - Configuration and defaults
- litellm/router_strategy/complexity_router/__init__.py - Package exports
- litellm/router_strategy/complexity_router/README.md - Documentation
- tests/test_litellm/router_strategy/test_complexity_router.py - Test suite (37 tests)
## Files Modified
- litellm/router.py - Integration with pre_routing_hook
- litellm/types/router.py - New config params
* feat(router): Add complexity-based auto routing strategy
Adds a new rule-based routing strategy that classifies requests by complexity
and routes them to appropriate models - without any external API calls.
## Features
- Weighted scoring across 7 dimensions: token count, code presence, reasoning
markers, technical terms, simple indicators, multi-step patterns, questions
- Maps to 4 tiers: SIMPLE, MEDIUM, COMPLEX, REASONING
- Each tier configurable to a different model
- Zero API calls, <1ms latency
- Inspired by ClawRouter
## Configuration
```yaml
model_list:
  - model_name: smart_router
    litellm_params:
      model: auto_router/complexity_router
      complexity_router_config:
        tiers:
          SIMPLE: gemini-2.0-flash
          MEDIUM: gpt-4o-mini
          COMPLEX: claude-sonnet-4
          REASONING: claude-opus-4
```
## Use Cases
- Cost optimization: route simple queries to cheaper models
- Quality optimization: route complex queries to capable models
- Zero configuration: works out of the box with sensible defaults
* feat(router): Add complexity-based auto routing strategy
Adds a new rule-based routing strategy that classifies requests by complexity
and routes them to appropriate models - without any external API calls.
- Weighted scoring across 7 dimensions: token count, code presence, reasoning
markers, technical terms, simple indicators, multi-step patterns, questions
- Maps to 4 tiers: SIMPLE, MEDIUM, COMPLEX, REASONING
- Each tier configurable to a different model
- Zero API calls, <1ms latency
- Inspired by ClawRouter
```yaml
model_list:
  - model_name: smart_router
    litellm_params:
      model: auto_router/complexity_router
      complexity_router_config:
        tiers:
          SIMPLE: gemini-2.0-flash
          MEDIUM: gpt-4o-mini
          COMPLEX: claude-sonnet-4
          REASONING: claude-opus-4
```
- Cost optimization: route simple queries to cheaper models
- Quality optimization: route complex queries to capable models
- Zero configuration: works out of the box with sensible defaults
* feat: add enterprise presets for complexity router
Adds preset configurations for different cloud providers:
- bedrock: AWS Bedrock (Claude models)
- vertex: Google Vertex AI (Gemini models)
- azure: Azure OpenAI (GPT + o1)
- standard: Direct API (OpenAI + Anthropic)
- cost_optimized: Maximum savings (Gemini Flash + cheaper models)
Usage:
```yaml
complexity_router_config:
  preset: bedrock  # or vertex, azure, standard, cost_optimized
```
* feat(ui): update auto router submit handler for complexity router
- Handle complexity_router model type in submit handler
- Generate correct litellm_params for complexity router:
- model: auto_router/complexity_router
- complexity_router_config: { tiers: { SIMPLE, MEDIUM, COMPLEX, REASONING } }
- Keep existing semantic router handling intact
- Add success notification with router type name
* docs: update PR description with UI changes
* chore: remove preset feature, keep simple tier config
* fix: exclude complexity_router from auto_router check
The _is_auto_router_deployment() was matching all auto_router/* models,
causing complexity_router to fail initialization. Now it explicitly
excludes auto_router/complexity_router which has its own handler.
* fix(complexity_router): Address Greptile review feedback
Fixes 5 issues flagged in code review:
1. **Mutable singleton mutation bug** - Now always creates a new
ComplexityRouterConfig instance instead of reusing DEFAULT_COMPLEXITY_CONFIG
singleton, preventing cross-instance config pollution.
2. **Substring matching false positives** - Added word boundaries (spaces)
to short keywords like 'ok', 'try', 'api', 'git', 'node', 'java', 'vue'
to prevent matching within longer words (e.g., 'capital' matching 'api').
3. **Redundant message extraction** - Simplified to single reverse loop that
extracts both last user message and last system prompt efficiently.
4. **Unused imports** - Removed unused DEFAULT_CREATIVE_KEYWORDS and
DEFAULT_MULTI_STEP_PATTERNS imports.
5. **Missing async_pre_routing_hook tests** - Added comprehensive tests for:
- Multi-turn conversations
- List-type content handling
- No user message case
- Empty string content
- Message preservation
- Singleton mutation prevention
* fix(complexity_router): Address Greptile review feedback
- Use word boundary matching for short keywords (<5 chars) to avoid
false positives (e.g., 'api' matching 'capital', 'git' matching 'digital')
- Remove 'ok' from simple keywords (too many false positives)
- Add tests for keyword false positive prevention
- Fix test expectations for edge cases (empty string content, list content)
Addresses: 2/5 Greptile score feedback on PR #21789
* docs(auto_routing): Add complexity router documentation
- Add Complexity Router section to auto_routing.md
- Include comparison table with semantic auto router
- Add Python SDK and Proxy Server configuration examples
- Document all configuration options (tier boundaries, token thresholds, dimension weights)
- Explain how complexity scoring works
* feat(complexity_router): Add eval suite + tune scoring parameters
Added comprehensive evaluation suite with 29 test cases covering:
- SIMPLE tier: greetings, definitions, factual questions
- MEDIUM tier: technical explanations, comparisons, debugging
- COMPLEX tier: architecture design, complex coding
- REASONING tier: explicit reasoning requests
- Regression tests: substring false positive prevention
Tuned scoring parameters based on eval results:
- Lowered tier boundaries (0.15/0.35/0.60) for better tier distribution
- Increased code/technical weights (0.30/0.25) for complex prompts
- Reduced simple indicator weight (0.05) to avoid over-penalizing
- Fixed 'hey'/'hi' keywords to require leading space
Eval results: 29/29 passed (100%)
* fix(complexity_router): Address Greptile review round 2
1. **Empty user message handling** - Changed from falsy check to None check
to properly distinguish 'no user message' from 'empty string message'
2. **ReDoS prevention** - Changed 'first.*then' to 'first.*?then' (non-greedy)
to prevent regex backtracking on pathological inputs
3. **Documentation sync** - Updated README.md to match actual config values:
- Tier boundaries: 0.15/0.35/0.60 (not 0.25/0.50/0.75)
- Dimension weights: tokenCount=0.10, codePresence=0.30, technicalTerms=0.25,
simpleIndicators=0.05, multiStepPatterns=0.03, questionComplexity=0.02
4. **Missing UI component** - Added ComplexityRouterConfig.tsx with:
- Tier-to-model dropdown selectors
- Descriptions and examples for each tier
- How classification works explanation
5. **Inline import comment** - Added explanation for why ComplexityRouter
import is inline (matches AutoRouter pattern, avoids circular imports)
* docs(auto_routing): fix dimension weights and tier boundaries to match config.py defaults
* fix(complexity_router): skip empty string content in async_pre_routing_hook
* fix(router): remove or {} masking None complexity_router_config
* fix(config): remove unused DEFAULT_MULTI_STEP_PATTERNS and DEFAULT_CREATIVE_KEYWORDS exports
* fix(complexity_router): use word boundary matching for all single-word keywords, avoid double-scanning reasoning keywords
* fix(router): clarify circular import comment for ComplexityRouter
* docs(README): fix token thresholds to match config.py defaults
* test(complexity_router): add false positive tests for error/class/merge keyword matching
* fix(complexity_router): align .get() fallbacks with config.py defaults, document system prompt scoring
* fix(config): deduplicate keywords across code and technical lists
---------
Co-authored-by: OpenClaw Assistant <assistant@openclaw.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
2026-02-21 13:23:37 -08:00
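The scoring approach described in the complexity router commit above can be sketched in a few lines. This is a simplified illustration only: it covers three of the seven dimensions, uses made-up keyword lists, and assumes that scores at or above the top boundary map to REASONING. The tier boundaries (0.15/0.35/0.60), the code and technical weights (0.30/0.25), the simple-indicator weight (0.05), word-boundary matching, and the "2+ reasoning markers" override are taken from the commit message; everything else is hypothetical and does not mirror `complexity_router.py`.

```python
import re
from typing import Dict, List

# Boundaries from the commit message; scores below each bound map to the tier.
TIER_BOUNDARIES = [(0.15, "SIMPLE"), (0.35, "MEDIUM"), (0.60, "COMPLEX")]

# Illustrative keyword lists (assumptions, not the router's real defaults).
CODE_KEYWORDS = ["function", "class", "api", "python"]
TECHNICAL_KEYWORDS = ["distributed", "latency", "kubernetes"]
SIMPLE_KEYWORDS = ["what is", "hello"]
REASONING_KEYWORDS = ["step by step", "prove", "think through"]


def count_keywords(text: str, keywords: List[str]) -> int:
    """Count keywords present as whole words, so 'api' ignores 'capital'."""
    return sum(
        1
        for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE)
    )


def classify(prompt: str) -> str:
    # Reasoning override: two or more markers force the REASONING tier.
    if count_keywords(prompt, REASONING_KEYWORDS) >= 2:
        return "REASONING"
    score = 0.0
    if count_keywords(prompt, CODE_KEYWORDS):
        score += 0.30
    if count_keywords(prompt, TECHNICAL_KEYWORDS):
        score += 0.25
    if count_keywords(prompt, SIMPLE_KEYWORDS):
        score -= 0.05  # simple indicators carry a negative weight
    for upper_bound, tier in TIER_BOUNDARIES:
        if score < upper_bound:
            return tier
    return "REASONING"  # assumption: top scores fall through to REASONING


def route(prompt: str, tiers: Dict[str, str]) -> str:
    """Map a prompt to the model configured for its tier."""
    return tiers[classify(prompt)]


tiers = {
    "SIMPLE": "gpt-4o-mini",
    "MEDIUM": "gpt-4o",
    "COMPLEX": "claude-sonnet-4",
    "REASONING": "o1-preview",
}
print(route("What is the capital of France?", tiers))
```

All scoring is local string matching, which is what allows classification without any external API call.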
Ishaan Jaff
d928588de0
docs: mark v1.81.12 as stable ( #21809 )
...
* docs: mark v1.81.12 as stable, point to stable docker image and pip
* docs: fix v1.81.12 docker image to point to stable
2026-02-21 12:45:41 -08:00
Ishaan Jaff
2acc5cc457
fix(security): fix CVE-2025-69873, CVE-2026-26996 in docs deps; allowlist nodejs_wheel CVEs in Grype scan ( #21787 )
...
* fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies
Use npm overrides to pin patched versions:
- ajv@6.12.6 → 6.14.0 (fixes ReDoS CVE-2025-69873)
- ajv@8.17.1 → 8.18.0 (fixes ReDoS CVE-2025-69873)
- minimatch@3.1.2 → 10.2.1 (fixes DoS CVE-2026-26996)
serve-handler only calls minimatch(path, pattern) so the 3.x→10.x
upgrade is safe.
* fix(ruff): add missing Set and Dict imports to fix F821 errors
* fix(security): scope ajv overrides to avoid top-level version conflict
Replacing global 'ajv: 8.18.0' override with scoped 'schema-utils@4'
override. The global override conflicted with the nested file-loader/
null-loader/url-loader overrides, causing npm to install ajv@6 at the
top level where ajv-keywords@5.x requires ajv@8 (ajv/dist/compile/codegen).
Now:
- schema-utils@3 + loaders → ajv@6.14.0 (safe minor bump)
- schema-utils@4 → ajv@8.18.0 (safe minor bump)
- top-level ajv unmodified (stays at 8.x for ajv-keywords@5)
* fix(security): allowlist minimatch and tar CVEs from nodejs_wheel, bump tar override to >=7.5.8
2026-02-21 11:18:52 -08:00
Ishaan Jaff
8a145da793
fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies ( #21782 )
...
* fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies
Use npm overrides to pin patched versions:
- ajv@6.12.6 → 6.14.0 (fixes ReDoS CVE-2025-69873)
- ajv@8.17.1 → 8.18.0 (fixes ReDoS CVE-2025-69873)
- minimatch@3.1.2 → 10.2.1 (fixes DoS CVE-2026-26996)
serve-handler only calls minimatch(path, pattern) so the 3.x→10.x
upgrade is safe.
* fix(ruff): add missing Set and Dict imports to fix F821 errors
* fix(security): scope ajv overrides to avoid top-level version conflict
Replacing global 'ajv: 8.18.0' override with scoped 'schema-utils@4'
override. The global override conflicted with the nested file-loader/
null-loader/url-loader overrides, causing npm to install ajv@6 at the
top level where ajv-keywords@5.x requires ajv@8 (ajv/dist/compile/codegen).
Now:
- schema-utils@3 + loaders → ajv@6.14.0 (safe minor bump)
- schema-utils@4 → ajv@8.18.0 (safe minor bump)
- top-level ajv unmodified (stays at 8.x for ajv-keywords@5)
2026-02-21 10:56:11 -08:00
Harshit Jain
456d8f5524
feat: add session_id to have better routing
2026-02-21 18:45:50 +05:30
Zhenting Huang
e0aaedc9d1
feat(semantic-cache): support configurable vector dimensions for Qdrant ( #21649 )
...
Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.
The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.
Closes #9377
2026-02-21 00:51:15 -08:00
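The fallback order described in the Qdrant commit above (explicit `vector_size`, then the `QDRANT_VECTOR_SIZE` environment variable, then 1536) might look roughly like the sketch below. The helper names `resolve_vector_size` and `check_embedding` are hypothetical and are not part of `QdrantSemanticCache`; the dimension check is an added illustration of why the configured size has to match the embedding model.

```python
import os
from typing import Optional, Sequence

DEFAULT_VECTOR_SIZE = 1536  # default named in the commit message


def resolve_vector_size(vector_size: Optional[int] = None) -> int:
    """Prefer an explicit size, then QDRANT_VECTOR_SIZE, then the default."""
    if vector_size is not None:
        return vector_size
    return int(os.getenv("QDRANT_VECTOR_SIZE", str(DEFAULT_VECTOR_SIZE)))


def check_embedding(embedding: Sequence[float], vector_size: int) -> None:
    """Raise early if an embedding does not match the collection dimension."""
    if len(embedding) != vector_size:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"but the collection expects {vector_size}"
        )


size = resolve_vector_size(1024)  # e.g. a 1024-dimensional embedding model
check_embedding([0.0] * 1024, size)
```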
Harshit Jain
80c3b236e2
Update docs/my-website/docs/troubleshoot/rollback.md
...
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-21 10:40:41 +05:30
Harshit Jain
5916cf15ad
Update docs/my-website/docs/troubleshoot/rollback.md
...
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-21 10:40:31 +05:30
Harshit Jain
ece5b8c565
doc: add rollback safety check
2026-02-21 10:33:17 +05:30
yuneng-jiang
65dc7556a8
[Fix] Fix web search model info regression, deprecated prompt caching model, undocumented env keys
...
- Revert test_anthropic_web_search_in_model_info to use claude-3-5-haiku-latest
(model info test doesn't make API calls, so the -latest alias is fine here)
- Replace claude-3-7-sonnet-20250219 with claude-sonnet-4-5-20250929 in
test_anthropic_prompt_caching.py (10 instances)
- Include pending doc updates for COMPETITOR_LLM_TEMPERATURE and
MAX_COMPETITOR_NAMES env vars in config_settings.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-20 17:26:58 -08:00
Sameer Kankute
871c049f49
Add support for reasoning and tools via config
2026-02-20 16:22:19 +05:30
Ishaan Jaff
18f8a2cee3
docs: add latency overhead troubleshooting guide ( #21603 )
...
* add latency overhead troubleshooting doc
* add latency_overhead to troubleshooting sidebar
* docs: add x-litellm-overhead-duration-ms to latency troubleshooting guide
2026-02-19 12:42:33 -08:00
Ishaan Jaff
2c8fcf854a
docs: add latency overhead troubleshooting guide ( #21600 )
...
* add latency overhead troubleshooting doc
* add latency_overhead to troubleshooting sidebar
2026-02-19 12:34:23 -08:00
Sameer Kankute
4d392cacb8
Fix release
2026-02-20 00:27:12 +05:30
Sameer Kankute
c123dc5c24
Fix vercel build
2026-02-19 22:19:34 +05:30