litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-26 13:08:17 +00:00

Author	SHA1	Message	Date
Krrish Dholakia	a26f83fd3c	fix: update calendly on repo	2026-02-23 06:13:59 -08:00
Sameer Kankute	9b5bbee906	Merge pull request #21786 from BerriAI/litellm_oss_staging_02_21_2026 Litellm oss staging 02 21 2026	2026-02-23 18:51:55 +05:30
Sameer Kankute	8decf04d8a	Merge pull request #21877 from BerriAI/litellm_oss_staging_02_22_2026 Litellm oss staging 02 22 2026	2026-02-23 18:50:47 +05:30
Sameer Kankute	37d45139f2	Merge pull request #21917 from BerriAI/litellm_fix_model_cost_map_wildcard Fix: Anthropic model wildcard access issue	2026-02-23 18:45:49 +05:30
TomAlon	99184c48d9	Add Noma guardrails v2 based on custom guardrails (#21400)	2026-02-23 05:05:27 -08:00
Sameer Kankute	c7aafdf794	Merge pull request #21926 from BerriAI/main merge main in oss 21 02	2026-02-23 18:17:30 +05:30
Sameer Kankute	57af8e6a93	Merge pull request #21924 from BerriAI/main merge main in oss 22 02	2026-02-23 18:11:36 +05:30
Sameer Kankute	eaf3900200	Fix name of title	2026-02-23 17:18:31 +05:30
Sameer Kankute	9b27cd8c0e	Add incident report	2026-02-23 17:13:44 +05:30
Cesar Garcia	b8cef1a4e5	docs: add OpenClaw integration tutorial (#21605 ) * docs: add OpenClaw integration tutorial * docs: simplify OpenClaw proxy start command * docs: rewrite OpenClaw integration guide for clarity - Use gpt-5 as default model - Replace poetry run with standard litellm CLI - Add prerequisites section and verification step - Simplify onboarding instructions (table format) - Move manual config and troubleshooting to bottom - Add multi-model config (claude-sonnet, gemini-flash) * docs: fix model name in OpenClaw manual config example * docs: rewrite OpenClaw integration guide from scratch Rewrote the guide based on hands-on testing of every command. Key changes: - Replace non-existent `openclaw chat` with verified commands (dashboard, tui, agent --agent main) - Add 3 onboarding options: QuickStart, Manual, and non-interactive - Fix health check (requires Bearer token) - Remove misleading "Starting from scratch" section - Use gpt-4o instead of gpt-5 as the example model - Clarify that API keys can come from export, .env, or any method - Add config reference section showing openclaw.json structure - Add real troubleshooting based on issues found during testing	2026-02-21 20:16:27 -08:00
Krish Dholakia	52585eb2d7	Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876) This reverts commit `bce078a796`.	2026-02-21 20:12:01 -08:00
Edwin Isac	bce078a796	fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870 ) * server root path regression doc * fixing syntax * fix: replace Zapier webhook with Google Form for survey submission (#21621) * Replace Zapier webhook with Google Form for survey submission * Add back error logging for survey submission debugging --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth" This reverts commit `0e1db3f7e4`, reversing changes made to `7e2d6f2355`. * test_vertex_ai_gemini_2_5_pro_streaming * UI new build * fix rendering * ui new build * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * release note docs * docs * adding image * fix(vertex_ai): enable context-1m-2025-08-07 beta header The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai, causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`. This prevented using Claude's 1M context window feature via Vertex AI, resulting in `prompt is too long: 460500 tokens > 200000 maximum` errors. Fixes #21861 --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>	2026-02-21 20:11:13 -08:00
LeeJuOh	50f36d9ca6	fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754) * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo * fix(budget): update stale docstring on get_budget_reset_time	2026-02-21 19:35:06 -08:00
yuneng-jiang	5bb52d0202	adding image	2026-02-21 18:09:18 -08:00
yuneng-jiang	ea37f59de4	Merge remote-tracking branch 'origin' into litellm_yj_docs_feb21_release	2026-02-21 18:05:11 -08:00
Ishaan Jaffer	84b572d719	docs	2026-02-21 18:00:38 -08:00
yuneng-jiang	5e26891da2	release note docs	2026-02-21 17:58:36 -08:00
Ishaan Jaffer	19f7e881f3	docs fix	2026-02-21 17:53:51 -08:00
Ishaan Jaffer	356eb5a413	docs fix	2026-02-21 17:51:45 -08:00
Ishaan Jaffer	522954fe0d	docs fix	2026-02-21 17:47:44 -08:00
Ishaan Jaffer	45bef9ade8	docs fix	2026-02-21 17:46:01 -08:00
Ishaan Jaffer	5e71f6128b	docs fix	2026-02-21 17:40:39 -08:00
Ishaan Jaffer	e157f5a8f2	docs fix	2026-02-21 17:35:16 -08:00
Ishaan Jaffer	661c6faac6	docs fix	2026-02-21 17:28:04 -08:00
Ishaan Jaffer	efebd37183	docs fix	2026-02-21 17:28:04 -08:00
yuneng-jiang	823bb023df	Merge branch 'main' into litellm_yj_docs_feb21	2026-02-21 17:12:28 -08:00
Ishaan Jaffer	ab032c292c	docs fix	2026-02-21 16:36:22 -08:00
yuneng-jiang	aefc7c14f6	Merge remote-tracking branch 'origin' into doc_yj_feb21	2026-02-21 16:05:07 -08:00
yuneng-jiang	70fd2aa219	fixing syntax	2026-02-21 16:04:42 -08:00
yuneng-jiang	153bf1d856	server root path regression doc	2026-02-21 15:57:06 -08:00
Ishaan Jaffer	775fb79260	fix	2026-02-21 15:45:03 -08:00
Darien Kindlund	ca5c109a92	feat: add optional digest mode for Slack alert types (#21683 ) Adds per-alert-type digest mode that aggregates duplicate alerts within a configurable time window and emits a single summary message with count, start/end timestamps. Configuration via general_settings.alert_type_config: alert_type_config: llm_requests_hanging: digest: true digest_interval: 86400 Digest key: (alert_type, request_model, api_base) Default interval: 24 hours Window type: fixed interval Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-21 15:19:17 -08:00
Ishaan Jaff	6dc9823926	docs(release-notes): update v1.81.14 - split guardrail sections, add eval results, fix key highlights and section placement (#21847)	2026-02-21 15:18:46 -08:00
Ishaan Jaff	eac3ae8121	docs: update v1.81.14 release notes - guardrail model garden, complexity router placement (#21843 ) * docs(release-notes): update v1.81.14 key highlights and section placement * docs(release-notes): rewrite key highlights and add guardrail narrative section * docs(release-notes): rewrite guardrail narrative to match release notes style * docs(release-notes): add guardrail eval results section	2026-02-21 15:10:21 -08:00
Ishaan Jaff	8c7f667df2	docs: v1.81.14-stable release notes (#21839 ) * docs(release-notes): add v1.81.14-stable release notes * fix(docs): fix MDX compilation errors in auto_routing.md * docs(release-notes): polish v1.81.14 - narrative paragraph, consolidated guardrail templates, merged competitor bullet	2026-02-21 14:50:50 -08:00
shin-bot-litellm	1be30f5129	feat(router): Add complexity-based auto routing strategy (#21789 ) * feat(router): Add complexity-based auto routing strategy Adds a rule-based routing strategy that classifies requests by complexity and routes them to appropriate models - with zero API calls and sub-millisecond latency. ## Features - Zero external API calls - all scoring is local - Sub-millisecond latency - typically <1ms per classification - Weighted multi-dimensional scoring across 7 dimensions: - Token count (short=simple, long=complex) - Code presence (code keywords → complex) - Reasoning markers ("step by step" → reasoning tier) - Technical terms (domain complexity) - Simple indicators ("what is" → simple, negative weight) - Multi-step patterns (numbered steps) - Question complexity (multiple questions) - Configurable tier boundaries and model mappings - Reasoning override - 2+ reasoning markers force REASONING tier ## Usage ```yaml model_list: - model_name: smart-router litellm_params: model: auto_router/complexity_router complexity_router_config: tiers: SIMPLE: gpt-4o-mini MEDIUM: gpt-4o COMPLEX: claude-sonnet-4 REASONING: o1-preview ``` Inspired by ClawRouter: https://github.com/BlockRunAI/ClawRouter ## Files Added - litellm/router_strategy/complexity_router/complexity_router.py - Main router class - litellm/router_strategy/complexity_router/config.py - Configuration and defaults - litellm/router_strategy/complexity_router/__init__.py - Package exports - litellm/router_strategy/complexity_router/README.md - Documentation - tests/test_litellm/router_strategy/test_complexity_router.py - Test suite (37 tests) ## Files Modified - litellm/router.py - Integration with pre_routing_hook - litellm/types/router.py - New config params * feat(router): Add complexity-based auto routing strategy Adds a new rule-based routing strategy that classifies requests by complexity and routes them to appropriate models - without any external API calls. 
## Features - Weighted scoring across 7 dimensions: token count, code presence, reasoning markers, technical terms, simple indicators, multi-step patterns, questions - Maps to 4 tiers: SIMPLE, MEDIUM, COMPLEX, REASONING - Each tier configurable to a different model - Zero API calls, <1ms latency - Inspired by ClawRouter ## Configuration ```yaml model_list: - model_name: smart_router litellm_params: model: auto_router/complexity_router complexity_router_config: tiers: SIMPLE: gemini-2.0-flash MEDIUM: gpt-4o-mini COMPLEX: claude-sonnet-4 REASONING: claude-opus-4 ``` ## Use Cases - Cost optimization: route simple queries to cheaper models - Quality optimization: route complex queries to capable models - Zero configuration: works out of the box with sensible defaults * feat(router): Add complexity-based auto routing strategy Adds a new rule-based routing strategy that classifies requests by complexity and routes them to appropriate models - without any external API calls. - Weighted scoring across 7 dimensions: token count, code presence, reasoning markers, technical terms, simple indicators, multi-step patterns, questions - Maps to 4 tiers: SIMPLE, MEDIUM, COMPLEX, REASONING - Each tier configurable to a different model - Zero API calls, <1ms latency - Inspired by ClawRouter ```yaml model_list: - model_name: smart_router litellm_params: model: auto_router/complexity_router complexity_router_config: tiers: SIMPLE: gemini-2.0-flash MEDIUM: gpt-4o-mini COMPLEX: claude-sonnet-4 REASONING: claude-opus-4 ``` - Cost optimization: route simple queries to cheaper models - Quality optimization: route complex queries to capable models - Zero configuration: works out of the box with sensible defaults * feat: add enterprise presets for complexity router Adds preset configurations for different cloud providers: - bedrock: AWS Bedrock (Claude models) - vertex: Google Vertex AI (Gemini models) - azure: Azure OpenAI (GPT + o1) - standard: Direct API (OpenAI + Anthropic) - 
cost_optimized: Maximum savings (Gemini Flash + cheaper models) Usage: ```yaml complexity_router_config: preset: bedrock # or vertex, azure, standard, cost_optimized ``` * feat(ui): update auto router submit handler for complexity router - Handle complexity_router model type in submit handler - Generate correct litellm_params for complexity router: - model: auto_router/complexity_router - complexity_router_config: { tiers: { SIMPLE, MEDIUM, COMPLEX, REASONING } } - Keep existing semantic router handling intact - Add success notification with router type name * docs: update PR description with UI changes * chore: remove preset feature, keep simple tier config * fix: exclude complexity_router from auto_router check The _is_auto_router_deployment() was matching all auto_router/* models, causing complexity_router to fail initialization. Now it explicitly excludes auto_router/complexity_router which has its own handler. * fix(complexity_router): Address Greptile review feedback Fixes 5 issues flagged in code review: 1. Mutable singleton mutation bug - Now always creates a new ComplexityRouterConfig instance instead of reusing DEFAULT_COMPLEXITY_CONFIG singleton, preventing cross-instance config pollution. 2. Substring matching false positives - Added word boundaries (spaces) to short keywords like 'ok', 'try', 'api', 'git', 'node', 'java', 'vue' to prevent matching within longer words (e.g., 'capital' matching 'api'). 3. Redundant message extraction - Simplified to single reverse loop that extracts both last user message and last system prompt efficiently. 4. Unused imports - Removed unused DEFAULT_CREATIVE_KEYWORDS and DEFAULT_MULTI_STEP_PATTERNS imports. 5. 
Missing async_pre_routing_hook tests - Added comprehensive tests for: - Multi-turn conversations - List-type content handling - No user message case - Empty string content - Message preservation - Singleton mutation prevention * fix(complexity_router): Address Greptile review feedback - Use word boundary matching for short keywords (<5 chars) to avoid false positives (e.g., 'api' matching 'capital', 'git' matching 'digital') - Remove 'ok' from simple keywords (too many false positives) - Add tests for keyword false positive prevention - Fix test expectations for edge cases (empty string content, list content) Addresses: 2/5 Greptile score feedback on PR #21789 * docs(auto_routing): Add complexity router documentation - Add Complexity Router section to auto_routing.md - Include comparison table with semantic auto router - Add Python SDK and Proxy Server configuration examples - Document all configuration options (tier boundaries, token thresholds, dimension weights) - Explain how complexity scoring works * feat(complexity_router): Add eval suite + tune scoring parameters Added comprehensive evaluation suite with 29 test cases covering: - SIMPLE tier: greetings, definitions, factual questions - MEDIUM tier: technical explanations, comparisons, debugging - COMPLEX tier: architecture design, complex coding - REASONING tier: explicit reasoning requests - Regression tests: substring false positive prevention Tuned scoring parameters based on eval results: - Lowered tier boundaries (0.15/0.35/0.60) for better tier distribution - Increased code/technical weights (0.30/0.25) for complex prompts - Reduced simple indicator weight (0.05) to avoid over-penalizing - Fixed 'hey'/'hi' keywords to require leading space Eval results: 29/29 passed (100%) * fix(complexity_router): Address Greptile review round 2 1. Empty user message handling - Changed from falsy check to None check to properly distinguish 'no user message' from 'empty string message' 2. 
ReDoS prevention - Changed 'first.then' to 'first.?then' (non-greedy) to prevent regex backtracking on pathological inputs 3. Documentation sync - Updated README.md to match actual config values: - Tier boundaries: 0.15/0.35/0.60 (not 0.25/0.50/0.75) - Dimension weights: tokenCount=0.10, codePresence=0.30, technicalTerms=0.25, simpleIndicators=0.05, multiStepPatterns=0.03, questionComplexity=0.02 4. Missing UI component - Added ComplexityRouterConfig.tsx with: - Tier-to-model dropdown selectors - Descriptions and examples for each tier - How classification works explanation 5. Inline import comment - Added explanation for why ComplexityRouter import is inline (matches AutoRouter pattern, avoids circular imports) * docs(auto_routing): fix dimension weights and tier boundaries to match config.py defaults * fix(complexity_router): skip empty string content in async_pre_routing_hook * fix(router): remove or {} masking None complexity_router_config * fix(config): remove unused DEFAULT_MULTI_STEP_PATTERNS and DEFAULT_CREATIVE_KEYWORDS exports * fix(complexity_router): use word boundary matching for all single-word keywords, avoid double-scanning reasoning keywords * fix(router): clarify circular import comment for ComplexityRouter * docs(README): fix token thresholds to match config.py defaults * test(complexity_router): add false positive tests for error/class/merge keyword matching * fix(complexity_router): align .get() fallbacks with config.py defaults, document system prompt scoring * fix(config): deduplicate keywords across code and technical lists --------- Co-authored-by: OpenClaw Assistant <assistant@openclaw.ai> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>	2026-02-21 13:23:37 -08:00
Ishaan Jaff	d928588de0	docs: mark v1.81.12 as stable (#21809) * docs: mark v1.81.12 as stable, point to stable docker image and pip * docs: fix v1.81.12 docker image to point to stable	2026-02-21 12:45:41 -08:00
Ishaan Jaff	2acc5cc457	fix(security): fix CVE-2025-69873, CVE-2026-26996 in docs deps; allowlist nodejs_wheel CVEs in Grype scan (#21787 ) * fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies Use npm overrides to pin patched versions: - ajv@6.12.6 → 6.14.0 (fixes ReDoS CVE-2025-69873) - ajv@8.17.1 → 8.18.0 (fixes ReDoS CVE-2025-69873) - minimatch@3.1.2 → 10.2.1 (fixes DoS CVE-2026-26996) serve-handler only calls minimatch(path, pattern) so the 3.x→10.x upgrade is safe. * fix(ruff): add missing Set and Dict imports to fix F821 errors * fix(security): scope ajv overrides to avoid top-level version conflict Replacing global 'ajv: 8.18.0' override with scoped 'schema-utils@4' override. The global override conflicted with the nested file-loader/ null-loader/url-loader overrides, causing npm to install ajv@6 at the top level where ajv-keywords@5.x requires ajv@8 (ajv/dist/compile/codegen). Now: - schema-utils@3 + loaders → ajv@6.14.0 (safe minor bump) - schema-utils@4 → ajv@8.18.0 (safe minor bump) - top-level ajv unmodified (stays at 8.x for ajv-keywords@5) * fix(security): allowlist minimatch and tar CVEs from nodejs_wheel, bump tar override to >=7.5.8	2026-02-21 11:18:52 -08:00
Ishaan Jaff	8a145da793	fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies (#21782 ) * fix(security): fix CVE-2025-69873 and CVE-2026-26996 in docs dependencies Use npm overrides to pin patched versions: - ajv@6.12.6 → 6.14.0 (fixes ReDoS CVE-2025-69873) - ajv@8.17.1 → 8.18.0 (fixes ReDoS CVE-2025-69873) - minimatch@3.1.2 → 10.2.1 (fixes DoS CVE-2026-26996) serve-handler only calls minimatch(path, pattern) so the 3.x→10.x upgrade is safe. * fix(ruff): add missing Set and Dict imports to fix F821 errors * fix(security): scope ajv overrides to avoid top-level version conflict Replacing global 'ajv: 8.18.0' override with scoped 'schema-utils@4' override. The global override conflicted with the nested file-loader/ null-loader/url-loader overrides, causing npm to install ajv@6 at the top level where ajv-keywords@5.x requires ajv@8 (ajv/dist/compile/codegen). Now: - schema-utils@3 + loaders → ajv@6.14.0 (safe minor bump) - schema-utils@4 → ajv@8.18.0 (safe minor bump) - top-level ajv unmodified (stays at 8.x for ajv-keywords@5)	2026-02-21 10:56:11 -08:00
Harshit Jain	456d8f5524	feat: add session_id to have better routing	2026-02-21 18:45:50 +05:30
Zhenting Huang	e0aaedc9d1	feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649 ) Add vector_size parameter to QdrantSemanticCache and expose it through the Cache facade as qdrant_semantic_cache_vector_size. This allows users to use embedding models with dimensions other than the default 1536, enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d), voyage, cohere, etc. The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for backward compatibility. When creating new collections, the configured vector_size is used instead of the hardcoded constant. Closes #9377	2026-02-21 00:51:15 -08:00
Harshit Jain	80c3b236e2	Update docs/my-website/docs/troubleshoot/rollback.md Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-21 10:40:41 +05:30
Harshit Jain	5916cf15ad	Update docs/my-website/docs/troubleshoot/rollback.md Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-21 10:40:31 +05:30
Harshit Jain	ece5b8c565	doc: add rollback safety check	2026-02-21 10:33:17 +05:30
yuneng-jiang	65dc7556a8	[Fix] Fix web search model info regression, deprecated prompt caching model, undocumented env keys - Revert test_anthropic_web_search_in_model_info to use claude-3-5-haiku-latest (model info test doesn't make API calls, so the -latest alias is fine here) - Replace claude-3-7-sonnet-20250219 with claude-sonnet-4-5-20250929 in test_anthropic_prompt_caching.py (10 instances) - Include pending doc updates for COMPETITOR_LLM_TEMPERATURE and MAX_COMPETITOR_NAMES env vars in config_settings.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-20 17:26:58 -08:00
Sameer Kankute	871c049f49	Add support for reasoning and tools via config	2026-02-20 16:22:19 +05:30
Ishaan Jaff	18f8a2cee3	docs: add latency overhead troubleshooting guide (#21603) * add latency overhead troubleshooting doc * add latency_overhead to troubleshooting sidebar * docs: add x-litellm-overhead-duration-ms to latency troubleshooting guide	2026-02-19 12:42:33 -08:00
Ishaan Jaff	2c8fcf854a	docs: add latency overhead troubleshooting guide (#21600) * add latency overhead troubleshooting doc * add latency_overhead to troubleshooting sidebar	2026-02-19 12:34:23 -08:00
Sameer Kankute	4d392cacb8	Fix release	2026-02-20 00:27:12 +05:30
Sameer Kankute	c123dc5c24	Fix vercel build	2026-02-19 22:19:34 +05:30
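
Commit 1be30f5129 describes two mechanics of the complexity router: word-boundary matching for keywords (so that 'api' no longer matches inside 'capital', or 'git' inside 'digital') and tier boundaries of 0.15/0.35/0.60. The sketch below illustrates both ideas in isolation. It is not the code from `complexity_router.py`: the names `contains_keyword` and `score_to_tier` are hypothetical, and treating scores at or above 0.60 as REASONING is an assumption made for illustration.

```python
import re

# Tier boundaries quoted in the commit message (0.15/0.35/0.60).
# How the real router applies them is an assumption here.
TIER_BOUNDARIES = (0.15, 0.35, 0.60)


def contains_keyword(text: str, keyword: str) -> bool:
    """Return True only if `keyword` appears as a whole word in `text`."""
    # \b anchors the match at word boundaries, so 'api' does not
    # match inside 'capital' and 'git' does not match inside 'digital'.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def score_to_tier(score: float) -> str:
    """Map a complexity score to one of the four tier names (hypothetical mapping)."""
    simple_max, medium_max, complex_max = TIER_BOUNDARIES
    if score < simple_max:
        return "SIMPLE"
    if score < medium_max:
        return "MEDIUM"
    if score < complex_max:
        return "COMPLEX"
    return "REASONING"
```

A plain substring test (`"api" in text`) would classify "What is the capital of France?" as containing a code keyword; the word-boundary pattern avoids that false positive at the cost of one regex search per keyword.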


5631 Commits
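
Commit ca5c109a92 describes a digest mode for Slack alerts: duplicate alerts are grouped by the key (alert_type, request_model, api_base) within a fixed time window (default 24 hours), and a single summary with a count and start/end timestamps is emitted. The following is a minimal sketch of that aggregation idea only. The class `AlertDigest` and its methods are hypothetical names, not LiteLLM's implementation, and the caller is assumed to pass in timestamps as seconds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Digest key as listed in the commit message.
DigestKey = Tuple[str, str, str]  # (alert_type, request_model, api_base)


@dataclass
class DigestEntry:
    count: int
    first_seen: float
    last_seen: float


class AlertDigest:
    """Hypothetical fixed-window aggregator for duplicate alerts."""

    def __init__(self, digest_interval: float = 86400.0) -> None:
        self.digest_interval = digest_interval  # seconds; 24 hours by default
        self._window_start: Optional[float] = None
        self._entries: Dict[DigestKey, DigestEntry] = {}

    def add(self, alert_type: str, request_model: str, api_base: str, now: float) -> None:
        """Record one alert; the first alert opens the window."""
        if self._window_start is None:
            self._window_start = now
        key = (alert_type, request_model, api_base)
        entry = self._entries.get(key)
        if entry is None:
            self._entries[key] = DigestEntry(count=1, first_seen=now, last_seen=now)
        else:
            entry.count += 1
            entry.last_seen = now

    def flush_if_due(self, now: float) -> List[str]:
        """Return one summary per key once the window has elapsed, else nothing."""
        if self._window_start is None or now - self._window_start < self.digest_interval:
            return []
        messages = [
            f"{key[0]} model={key[1]} api_base={key[2]}: "
            f"{entry.count} alerts between {entry.first_seen} and {entry.last_seen}"
            for key, entry in self._entries.items()
        ]
        self._entries.clear()
        self._window_start = None
        return messages
```

Because the window is fixed rather than sliding, a burst of alerts does not extend the deadline: the summary goes out `digest_interval` seconds after the first alert in the window, regardless of how many follow.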