Commit Graph

5611 Commits

Author SHA1 Message Date
Ishaan Jaffer 356eb5a413 docs fix 2026-02-21 17:51:45 -08:00
Ishaan Jaffer 522954fe0d docs fix 2026-02-21 17:47:44 -08:00
Ishaan Jaffer 45bef9ade8 docs fix 2026-02-21 17:46:01 -08:00
Ishaan Jaffer 5e71f6128b docs fix 2026-02-21 17:40:39 -08:00
Ishaan Jaffer e157f5a8f2 docs fix 2026-02-21 17:35:16 -08:00
Ishaan Jaffer 661c6faac6 docs fix 2026-02-21 17:28:04 -08:00
Ishaan Jaffer efebd37183 docs fix 2026-02-21 17:28:04 -08:00
yuneng-jiang 823bb023df Merge branch 'main' into litellm_yj_docs_feb21 2026-02-21 17:12:28 -08:00
Ishaan Jaffer ab032c292c docs fix 2026-02-21 16:36:22 -08:00
yuneng-jiang aefc7c14f6 Merge remote-tracking branch 'origin' into doc_yj_feb21 2026-02-21 16:05:07 -08:00
yuneng-jiang 70fd2aa219 fixing syntax 2026-02-21 16:04:42 -08:00
yuneng-jiang 153bf1d856 server root path regression doc 2026-02-21 15:57:06 -08:00
Ishaan Jaffer 775fb79260 fix 2026-02-21 15:45:03 -08:00
Ishaan Jaff 6dc9823926 docs(release-notes): update v1.81.14 - split guardrail sections, add eval results, fix key highlights and section placement (#21847) 2026-02-21 15:18:46 -08:00
Ishaan Jaff eac3ae8121 docs: update v1.81.14 release notes - guardrail model garden, complexity router placement (#21843)
* docs(release-notes): update v1.81.14 key highlights and section placement

* docs(release-notes): rewrite key highlights and add guardrail narrative section

* docs(release-notes): rewrite guardrail narrative to match release notes style

* docs(release-notes): add guardrail eval results section
2026-02-21 15:10:21 -08:00
Ishaan Jaff 8c7f667df2 docs: v1.81.14-stable release notes (#21839)
* docs(release-notes): add v1.81.14-stable release notes

* fix(docs): fix MDX compilation errors in auto_routing.md

* docs(release-notes): polish v1.81.14 - narrative paragraph, consolidated guardrail templates, merged competitor bullet
2026-02-21 14:50:50 -08:00
shin-bot-litellm 1be30f5129 feat(router): Add complexity-based auto routing strategy (#21789)
* feat(router): Add complexity-based auto routing strategy

Adds a rule-based routing strategy that classifies requests by complexity
and routes them to appropriate models - with zero API calls and sub-millisecond
latency.

## Features

- **Zero external API calls** - all scoring is local
- **Sub-millisecond latency** - typically <1ms per classification
- **Weighted multi-dimensional scoring** across 7 dimensions:
  - Token count (short=simple, long=complex)
  - Code presence (code keywords → complex)
  - Reasoning markers ("step by step" → reasoning tier)
  - Technical terms (domain complexity)
  - Simple indicators ("what is" → simple, negative weight)
  - Multi-step patterns (numbered steps)
  - Question complexity (multiple questions)
- **Configurable tier boundaries** and model mappings
- **Reasoning override** - 2+ reasoning markers force REASONING tier

## Usage

```yaml
model_list:
  - model_name: smart-router
    litellm_params:
      model: auto_router/complexity_router
      complexity_router_config:
        tiers:
          SIMPLE: gpt-4o-mini
          MEDIUM: gpt-4o
          COMPLEX: claude-sonnet-4
          REASONING: o1-preview
```

Inspired by ClawRouter: https://github.com/BlockRunAI/ClawRouter

## Files Added

- litellm/router_strategy/complexity_router/complexity_router.py - Main router class
- litellm/router_strategy/complexity_router/config.py - Configuration and defaults
- litellm/router_strategy/complexity_router/__init__.py - Package exports
- litellm/router_strategy/complexity_router/README.md - Documentation
- tests/test_litellm/router_strategy/test_complexity_router.py - Test suite (37 tests)

## Files Modified

- litellm/router.py - Integration with pre_routing_hook
- litellm/types/router.py - New config params

* feat(router): Add complexity-based auto routing strategy

Adds a new rule-based routing strategy that classifies requests by complexity
and routes them to appropriate models - without any external API calls.

## Features
- Weighted scoring across 7 dimensions: token count, code presence, reasoning
  markers, technical terms, simple indicators, multi-step patterns, questions
- Maps to 4 tiers: SIMPLE, MEDIUM, COMPLEX, REASONING
- Each tier configurable to a different model
- Zero API calls, <1ms latency
- Inspired by ClawRouter

## Configuration
```yaml
model_list:
  - model_name: smart_router
    litellm_params:
      model: auto_router/complexity_router
      complexity_router_config:
        tiers:
          SIMPLE: gemini-2.0-flash
          MEDIUM: gpt-4o-mini
          COMPLEX: claude-sonnet-4
          REASONING: claude-opus-4
```

## Use Cases
- Cost optimization: route simple queries to cheaper models
- Quality optimization: route complex queries to capable models
- Zero configuration: works out of the box with sensible defaults

* feat: add enterprise presets for complexity router

Adds preset configurations for different cloud providers:
- bedrock: AWS Bedrock (Claude models)
- vertex: Google Vertex AI (Gemini models)
- azure: Azure OpenAI (GPT + o1)
- standard: Direct API (OpenAI + Anthropic)
- cost_optimized: Maximum savings (Gemini Flash + cheaper models)

Usage:
```yaml
complexity_router_config:
  preset: bedrock  # or vertex, azure, standard, cost_optimized
```

* feat(ui): update auto router submit handler for complexity router

- Handle complexity_router model type in submit handler
- Generate correct litellm_params for complexity router:
  - model: auto_router/complexity_router
  - complexity_router_config: { tiers: { SIMPLE, MEDIUM, COMPLEX, REASONING } }
- Keep existing semantic router handling intact
- Add success notification with router type name

* docs: update PR description with UI changes

* chore: remove preset feature, keep simple tier config

* fix: exclude complexity_router from auto_router check

The _is_auto_router_deployment() was matching all auto_router/* models,
causing complexity_router to fail initialization. Now it explicitly
excludes auto_router/complexity_router which has its own handler.

* fix(complexity_router): Address Greptile review feedback

Fixes 5 issues flagged in code review:

1. **Mutable singleton mutation bug** - Now always creates a new
   ComplexityRouterConfig instance instead of reusing DEFAULT_COMPLEXITY_CONFIG
   singleton, preventing cross-instance config pollution.

2. **Substring matching false positives** - Added word boundaries (spaces)
   to short keywords like 'ok', 'try', 'api', 'git', 'node', 'java', 'vue'
   to prevent matching within longer words (e.g., 'capital' matching 'api').

3. **Redundant message extraction** - Simplified to single reverse loop that
   extracts both last user message and last system prompt efficiently.

4. **Unused imports** - Removed unused DEFAULT_CREATIVE_KEYWORDS and
   DEFAULT_MULTI_STEP_PATTERNS imports.

5. **Missing async_pre_routing_hook tests** - Added comprehensive tests for:
   - Multi-turn conversations
   - List-type content handling
   - No user message case
   - Empty string content
   - Message preservation
   - Singleton mutation prevention
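
The singleton issue in item 1 can be illustrated with a minimal sketch. `RouterConfig`, `DEFAULT_CONFIG`, and both builder functions are hypothetical stand-ins, not the actual `ComplexityRouterConfig` code; the point is only that reusing a module-level default lets one router's overrides leak into every other router.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RouterConfig:
    """Hypothetical stand-in for ComplexityRouterConfig."""
    tiers: Dict[str, str] = field(
        default_factory=lambda: {"SIMPLE": "gpt-4o-mini"}
    )


# Module-level default shared by every caller (the risky part).
DEFAULT_CONFIG = RouterConfig()


def build_config_shared(overrides: Optional[Dict[str, str]] = None) -> RouterConfig:
    """Buggy pattern: mutates and returns the shared default instance."""
    config = DEFAULT_CONFIG
    if overrides:
        config.tiers.update(overrides)
    return config


def build_config_fresh(overrides: Optional[Dict[str, str]] = None) -> RouterConfig:
    """Fixed pattern: every caller gets its own instance."""
    config = RouterConfig()
    if overrides:
        config.tiers.update(overrides)
    return config
```

With `build_config_fresh`, a router created with overrides no longer changes what the next router sees as its defaults.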

* fix(complexity_router): Address Greptile review feedback

- Use word boundary matching for short keywords (<5 chars) to avoid
  false positives (e.g., 'api' matching 'capital', 'git' matching 'digital')
- Remove 'ok' from simple keywords (too many false positives)
- Add tests for keyword false positive prevention
- Fix test expectations for edge cases (empty string content, list content)

Addresses: 2/5 Greptile score feedback on PR #21789
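
A rough sketch of the word-boundary matching described above, assuming a regex-based check; `find_keywords` is a hypothetical helper and not the function used in complexity_router.py.

```python
import re
from typing import Iterable, List


def find_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that occur in text as whole words."""
    lowered = text.lower()
    hits = []
    for keyword in keywords:
        # \b anchors the match at word boundaries, so 'api' does not
        # match inside 'capital' and 'git' does not match inside 'digital'.
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, lowered):
            hits.append(keyword)
    return hits
```

A plain `keyword in text` check would report both `api` and `git` for "the capital went digital", which is the false positive the fix targets.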

* docs(auto_routing): Add complexity router documentation

- Add Complexity Router section to auto_routing.md
- Include comparison table with semantic auto router
- Add Python SDK and Proxy Server configuration examples
- Document all configuration options (tier boundaries, token thresholds, dimension weights)
- Explain how complexity scoring works

* feat(complexity_router): Add eval suite + tune scoring parameters

Added comprehensive evaluation suite with 29 test cases covering:
- SIMPLE tier: greetings, definitions, factual questions
- MEDIUM tier: technical explanations, comparisons, debugging
- COMPLEX tier: architecture design, complex coding
- REASONING tier: explicit reasoning requests
- Regression tests: substring false positive prevention

Tuned scoring parameters based on eval results:
- Lowered tier boundaries (0.15/0.35/0.60) for better tier distribution
- Increased code/technical weights (0.30/0.25) for complex prompts
- Reduced simple indicator weight (0.05) to avoid over-penalizing
- Fixed 'hey'/'hi' keywords to require leading space

Eval results: 29/29 passed (100%)

* fix(complexity_router): Address Greptile review round 2

1. **Empty user message handling** - Changed from falsy check to None check
   to properly distinguish 'no user message' from 'empty string message'

2. **ReDoS prevention** - Changed 'first.*then' to 'first.*?then' (non-greedy)
   to prevent regex backtracking on pathological inputs

3. **Documentation sync** - Updated README.md to match actual config values:
   - Tier boundaries: 0.15/0.35/0.60 (not 0.25/0.50/0.75)
   - Dimension weights: tokenCount=0.10, codePresence=0.30, technicalTerms=0.25,
     simpleIndicators=0.05, multiStepPatterns=0.03, questionComplexity=0.02

4. **Missing UI component** - Added ComplexityRouterConfig.tsx with:
   - Tier-to-model dropdown selectors
   - Descriptions and examples for each tier
   - How classification works explanation

5. **Inline import comment** - Added explanation for why ComplexityRouter
   import is inline (matches AutoRouter pattern, avoids circular imports)
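
For orientation, the weights and boundaries listed in item 3 could combine roughly as follows. This is a sketch under stated assumptions, not the code in config.py: the `reasoningMarkers` weight is not given in this message and is assumed here, each signal is assumed to be pre-normalized to the range 0 to 1, and the strict `<` comparisons at the boundaries are a guess.

```python
from typing import Dict

# Weights from the commit message, except reasoningMarkers (assumed value).
WEIGHTS: Dict[str, float] = {
    "tokenCount": 0.10,
    "codePresence": 0.30,
    "reasoningMarkers": 0.25,  # assumption: not stated in the source
    "technicalTerms": 0.25,
    "simpleIndicators": 0.05,
    "multiStepPatterns": 0.03,
    "questionComplexity": 0.02,
}

# Tier boundaries from the commit message: 0.15 / 0.35 / 0.60.
BOUNDARIES = (0.15, 0.35, 0.60)


def classify(signals: Dict[str, float], reasoning_marker_count: int = 0) -> str:
    """Map per-dimension signals (each assumed in [0, 1]) to a tier name."""
    # Reasoning override: two or more reasoning markers force the top tier.
    if reasoning_marker_count >= 2:
        return "REASONING"

    score = 0.0
    for name, weight in WEIGHTS.items():
        value = signals.get(name, 0.0)
        if name == "simpleIndicators":
            score -= weight * value  # simple indicators pull the score down
        else:
            score += weight * value

    simple_max, medium_max, complex_max = BOUNDARIES
    if score < simple_max:
        return "SIMPLE"
    if score < medium_max:
        return "MEDIUM"
    if score < complex_max:
        return "COMPLEX"
    return "REASONING"
```

Under these assumptions a prompt with only a strong code signal scores 0.30 and lands in MEDIUM, while code plus technical terms scores 0.55 and lands in COMPLEX.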

* docs(auto_routing): fix dimension weights and tier boundaries to match config.py defaults

* fix(complexity_router): skip empty string content in async_pre_routing_hook

* fix(router): remove or {} masking None complexity_router_config

* fix(config): remove unused DEFAULT_MULTI_STEP_PATTERNS and DEFAULT_CREATIVE_KEYWORDS exports

* fix(complexity_router): use word boundary matching for all single-word keywords, avoid double-scanning reasoning keywords

* fix(router): clarify circular import comment for ComplexityRouter

* docs(README): fix token thresholds to match config.py defaults

* test(complexity_router): add false positive tests for error/class/merge keyword matching

* fix(complexity_router): align .get() fallbacks with config.py defaults, document system prompt scoring

* fix(config): deduplicate keywords across code and technical lists

---------

Co-authored-by: OpenClaw Assistant <assistant@openclaw.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
2026-02-21 13:23:37 -08:00
Ishaan Jaff d928588de0 docs: mark v1.81.12 as stable (#21809)
* docs: mark v1.81.12 as stable, point to stable docker image and pip

* docs: fix v1.81.12 docker image to point to stable
2026-02-21 12:45:41 -08:00
Ishaan Jaff 2acc5cc457 fix(security): fix CVE-2025-69873, CVE-2026-26996 in docs deps; allowlist nodejs_wheel CVEs in Grype scan (#21787)
* fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies

Use npm overrides to pin patched versions:
- ajv@6.12.6 → 6.14.0 (fixes ReDoS CVE-2025-69873)
- ajv@8.17.1 → 8.18.0 (fixes ReDoS CVE-2025-69873)
- minimatch@3.1.2 → 10.2.1 (fixes DoS CVE-2026-26996)

serve-handler only calls minimatch(path, pattern) so the 3.x→10.x
upgrade is safe.

* fix(ruff): add missing Set and Dict imports to fix F821 errors

* fix(security): scope ajv overrides to avoid top-level version conflict

Replacing global 'ajv: 8.18.0' override with scoped 'schema-utils@4'
override. The global override conflicted with the nested file-loader/
null-loader/url-loader overrides, causing npm to install ajv@6 at the
top level where ajv-keywords@5.x requires ajv@8 (ajv/dist/compile/codegen).

Now:
- schema-utils@3 + loaders → ajv@6.14.0 (safe minor bump)
- schema-utils@4 → ajv@8.18.0 (safe minor bump)
- top-level ajv unmodified (stays at 8.x for ajv-keywords@5)

* fix(security): allowlist minimatch and tar CVEs from nodejs_wheel, bump tar override to >=7.5.8
2026-02-21 11:18:52 -08:00
Ishaan Jaff 8a145da793 fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies (#21782)
* fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies

Use npm overrides to pin patched versions:
- ajv@6.12.6 → 6.14.0 (fixes ReDoS CVE-2025-69873)
- ajv@8.17.1 → 8.18.0 (fixes ReDoS CVE-2025-69873)
- minimatch@3.1.2 → 10.2.1 (fixes DoS CVE-2026-26996)

serve-handler only calls minimatch(path, pattern) so the 3.x→10.x
upgrade is safe.

* fix(ruff): add missing Set and Dict imports to fix F821 errors

* fix(security): scope ajv overrides to avoid top-level version conflict

Replacing global 'ajv: 8.18.0' override with scoped 'schema-utils@4'
override. The global override conflicted with the nested file-loader/
null-loader/url-loader overrides, causing npm to install ajv@6 at the
top level where ajv-keywords@5.x requires ajv@8 (ajv/dist/compile/codegen).

Now:
- schema-utils@3 + loaders → ajv@6.14.0 (safe minor bump)
- schema-utils@4 → ajv@8.18.0 (safe minor bump)
- top-level ajv unmodified (stays at 8.x for ajv-keywords@5)
2026-02-21 10:56:11 -08:00
Harshit Jain 456d8f5524 feat: add session_id to have better routing 2026-02-21 18:45:50 +05:30
Harshit Jain 80c3b236e2 Update docs/my-website/docs/troubleshoot/rollback.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-21 10:40:41 +05:30
Harshit Jain 5916cf15ad Update docs/my-website/docs/troubleshoot/rollback.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-21 10:40:31 +05:30
Harshit Jain ece5b8c565 doc: add rollback safety check 2026-02-21 10:33:17 +05:30
yuneng-jiang 65dc7556a8 [Fix] Fix web search model info regression, deprecated prompt caching model, undocumented env keys
- Revert test_anthropic_web_search_in_model_info to use claude-3-5-haiku-latest
  (model info test doesn't make API calls, so the -latest alias is fine here)
- Replace claude-3-7-sonnet-20250219 with claude-sonnet-4-5-20250929 in
  test_anthropic_prompt_caching.py (10 instances)
- Include pending doc updates for COMPETITOR_LLM_TEMPERATURE and
  MAX_COMPETITOR_NAMES env vars in config_settings.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-20 17:26:58 -08:00
Sameer Kankute 871c049f49 Add support for reasoning and tools via config 2026-02-20 16:22:19 +05:30
Ishaan Jaff 18f8a2cee3 docs: add latency overhead troubleshooting guide (#21603)
* add latency overhead troubleshooting doc

* add latency_overhead to troubleshooting sidebar

* docs: add x-litellm-overhead-duration-ms to latency troubleshooting guide
2026-02-19 12:42:33 -08:00
Ishaan Jaff 2c8fcf854a docs: add latency overhead troubleshooting guide (#21600)
* add latency overhead troubleshooting doc

* add latency_overhead to troubleshooting sidebar
2026-02-19 12:34:23 -08:00
Sameer Kankute 4d392cacb8 Fix release 2026-02-20 00:27:12 +05:30
Sameer Kankute c123dc5c24 Fix vercel build 2026-02-19 22:19:34 +05:30
Sameer Kankute 884c763fb1 Fix date in docs 2026-02-19 22:14:20 +05:30
Sameer Kankute a951d6c681 Update docs/my-website/blog/gemin_3.1/index.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-19 22:14:20 +05:30
Sameer Kankute e27725a8b5 Update docs/my-website/blog/gemin_3.1/index.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-19 22:14:20 +05:30
Sameer Kankute 468be6f5a8 Fix date in docs 2026-02-19 22:14:20 +05:30
Sameer Kankute 2133a97e97 Add gemini-3.1-pro-preview pricing data 2026-02-19 22:14:19 +05:30
Sameer Kankute 8305bbee21 Add mapping for medium thinking level for gemini-3.1-pro-preview 2026-02-19 22:14:19 +05:30
Sameer Kankute ca34e9a3f9 Merge pull request #21543 from BerriAI/litellm_passthrough_endpoint_method
Add method based routing for passthrough endpoints
2026-02-19 19:34:04 +05:30
Sameer Kankute f2393fc9cb Merge main into litellm_passthrough_endpoint_method
Resolved conflicts in pass_through_endpoints.py by:
- Accepting main's formatting and mypy fixes
- Preserving branch's method support feature
- Preserving branch's default_query_params feature

Combined changes include:
- Method filtering for passthrough endpoints
- Default query parameters support
- Updated route key format to include methods
- Code formatting improvements from main
- Fixed type annotations

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-19 19:22:41 +05:30
Sameer Kankute 647e5237a7 Merge pull request #21555 from BerriAI/litellm_server_side_compaction_trans
[Feat] Add server side compaction translation from openai to anthropic
2026-02-19 19:14:37 +05:30
Sameer Kankute 36e21830db Merge pull request #21550 from BerriAI/litellm_add_global_usage
[Feat] Add Default usage data configuration
2026-02-19 19:10:35 +05:30
Sameer Kankute 02e10c9a74 Merge branch 'main' into litellm_server_side_compaction_trans 2026-02-19 16:45:53 +05:30
Sameer Kankute a52fc738af Add server side compaction translation from openai to anthropic 2026-02-19 16:44:35 +05:30
Sameer Kankute 0eb2a0c014 Add Default usage data configuration 2026-02-19 14:04:07 +05:30
Sameer Kankute 4a50c55d84 Allow defining default query params for a pass-through endpoint 2026-02-19 12:48:03 +05:30
Sameer Kankute 5bd7bf1b3e Add documentation for adding method 2026-02-19 11:59:09 +05:30
Harshit Jain 66ce7513f6 Merge branch 'main' into litellm_project_management_apis 2026-02-19 08:40:12 +05:30
Krish Dholakia e00c181f0c Mcp user permissions (#21462)
* feat(schema.prisma): add object permissions for end users

allows controlling if end user can call specific mcp servers

* feat: cleanup for customer_endpoints support of object permission id

* fix: cleanup str

* feat(customers/): enforce end user can only call allowed mcps - if configured

* docs: document customer/end user object permission usage

* feat: enforce end user permissions on MCP tool calls

This commit implements end user permission enforcement for MCP servers:

1. Always add server prefixes to MCP tool names
   - Removed conditional logic that only added prefixes when multiple servers existed
   - Now always adds server prefix for consistent tool naming across all scenarios
   - Updated 5 locations in server.py (list_tools, get_prompts, get_resources,
     get_resource_templates, get_prompt)

2. Created MCP End User Permission Guardrail Hook
   - New guardrail hook: litellm/proxy/guardrails/guardrail_hooks/mcp_end_user_permission.py
   - Runs on post_call to validate tool calls in LLM responses
   - Extracts MCP server name from tool names (splits on first '-')
   - Checks if end_user_id has permissions for the MCP server
   - Raises GuardrailRaisedException if end user lacks permission
   - Supports both streaming and non-streaming responses

3. Added comprehensive tests
   - Test file: tests/test_litellm/proxy/guardrails/guardrail_hooks/test_mcp_end_user_permission.py
   - Tests cover: authorized/unauthorized tools, non-MCP tools, no end_user scenarios
   - Tests permission checking logic and exception raising

The hook integrates with the existing MCPRequestHandler._get_allowed_mcp_servers_for_end_user
to fetch end user permissions and enforce access control at the response level.
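
The prefix check described in point 2 might look roughly like this. Both function names are hypothetical, the real hook raises GuardrailRaisedException rather than returning a boolean, and treating an unprefixed tool name as a non-MCP tool that passes through is an assumption.

```python
from typing import Optional, Set


def server_from_tool_name(tool_name: str) -> Optional[str]:
    """Split on the first '-' and return the server prefix, if any."""
    prefix, separator, _ = tool_name.partition("-")
    if not separator or not prefix:
        return None  # no prefix: treated as a non-MCP tool (assumption)
    return prefix


def is_tool_allowed(tool_name: str, allowed_servers: Set[str]) -> bool:
    """Check a tool call against the end user's allowed MCP servers."""
    server = server_from_tool_name(tool_name)
    if server is None:
        return True  # assumption: non-MCP tools are not restricted here
    return server in allowed_servers
```

Because only the first `-` is used as the separator, a tool named `github-create-issue` still resolves to the `github` server.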

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: remove redundant add_prefix variable assignments

Simplified the code by removing intermediate `add_prefix` variable
assignments and passing `True` directly to function calls since
we now always add server prefixes.

Changes:
- Removed `add_prefix = True` variable assignments in 5 locations
- Changed `add_prefix=add_prefix` to `add_prefix=True` in function calls
- Added inline comments to clarify the behavior

This makes the code more concise and clearer in intent.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat(auth_utils.py): support safety_identifier as a valid way of passing the end user id for responses api

* feat(llms): ensure 'tools' is correctly updated for responses api

* fix: fix greptile feedback

* feat: transformation.py

proper responses api tool handling for guardrail translation layer

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 18:53:59 -08:00
Ishaan Jaff 323aed7211 fix: CI failures - missing env key doc + streaming test (#21510)
* docs: add DATABRICKS_API_KEY to environment settings reference

* fix: streaming test usage check on Pydantic model

* fix: mock litellm.proxy.proxy_server in test_skip_server_startup
2026-02-18 18:20:32 -08:00
Sameer Kankute 69975217d2 Merge pull request #21485 from BerriAI/litellm_fix_Note
Add version in claude-code-beta-headers-incident
2026-02-18 22:55:01 +05:30
Sameer Kankute fef26cfae2 Add version in claude-code-beta-headers-incident 2026-02-18 22:54:27 +05:30