litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-17 20:48:32 +00:00

Author	SHA1	Message	Date
ishaan-berri	d03c301c79	Merge pull request #25936 from BerriAI/litellm_health-check-reasoning-tokens fix(proxy): prioritize reasoning health-check max token precedence	2026-04-18 11:35:04 -07:00
Yuneng Jiang	e004876950	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/wonderful-bouman # Conflicts: # tests/test_litellm/proxy/ui_crud_endpoints/test_proxy_setting_endpoints.py	2026-04-17 21:32:09 -07:00
ishaan-berri	1c128a86b8	Merge pull request #25256 from BerriAI/litellm_ishaan_april6 Litellm ishaan april6	2026-04-17 16:26:45 -07:00
Yuneng Jiang	1e25a00e5d	[Docs] BYOK tutorial: document the UI-only configuration path	2026-04-17 13:32:17 -07:00
Krrish Dholakia	dd76cc5d9d	docs: add "Copy Page as Markdown" + llms.txt to docs site (#25975) * docs: add copy-page-as-markdown button + llms.txt generation Adds the signalwire llms-txt Docusaurus plugin + theme so every docs page gets: - A "Copy Page" dropdown in the breadcrumbs (Copy, View Markdown, Ask ChatGPT, Ask Claude) — defaults from the theme hook, no extra config required. - A raw `.md` companion at `<page>.md` for LLM consumption. - Site-wide `/llms.txt` index and `/llms-full.txt` corpus. The signalwire plugin README documents a `copyPageButton` option that the v1.2.2 Joi schema actually rejects; the theme's defaults cover the same feature set, so only `content.enableMarkdownFiles` and `enableLlmsFullTxt` are set. Theme is pinned to `1.0.0-alpha.9` because the floating version resolves to a broken canary whose `main` points at a missing file. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com> * docs: pin exact versions for signalwire llms-txt deps Drop the caret ranges on the two packages added in the prior commit so the docs site pulls byte-identical npm tarballs on every install. Matches the existing convention in this package.json (everything else is already exact) and protects against supply-chain substitution if a malicious patch version is published under the same minor. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com> * docs: upgrade signalwire llms-txt plugin to v2 alpha + enable copy button The stable v1.2.2 plugin we first pinned does not call setGlobalData during contentLoaded, so the theme's CopyPageContent component always returned null (its `!siteConfig` bailout). The theme v1.0.0-alpha.9 is built against the v2-alpha plugin API, which is the version that actually wires the copy-content JSON and plugin config into the theme via setGlobalData. Pins plugin to 2.0.0-alpha.7 (exact, no caret) and switches the config to the v2 schema: - top-level `markdown` + `llmsTxt` replace the v1 `content` block - new `ui.copyPageContent` (off by default in v2) enables the button with view-markdown + ChatGPT + Claude actions. Verified end-to-end: production build serves the dropdown with "Copy Raw Markdown", "View Markdown", "Reference in ChatGPT", and "Reference in Claude" on /docs/routing (button mounts at ~x=960 in the breadcrumbs row). Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com> --------- Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>	2026-04-17 13:03:12 -07:00
Ishaan Jaffer	f31d4faa87	Merge origin/main into litellm_ishaan_april6	2026-04-17 12:36:51 -07:00
Sameer Kankute	d86c6a5b2f	fix(proxy): prioritize reasoning health check token defaults Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability. Made-with: Cursor	2026-04-17 12:36:58 +05:30
Stefano Romanò	f69b9d6564	Add capability to override default GitHub Copilot authentication endp… (#25915) * Add capability to override default GitHub Copilot authentication endpoints This feature adds support for GitHub Enterprise subscriptions with custom domain/data ownership (which use a different URL compared to standard accounts) * Update documentation with new parameters * Move access token URL and Client ID retrieval outside for loop Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Fix spurious comment from Greptile review * Align api_base retrieval behavior across chat and embedding transformations * Add missing GitHub Copilot client ID parameter in docs * Update website documentation with newer options for GitHub Enterprise Copilot * Fix default value for Copilot client ID in docs Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-04-16 21:04:38 -07:00
Krrish Dholakia	13108f39cb	Add docs announcement bar for Trivy compromise resolution (#25870) * Add announcement bar for Trivy compromise resolution notice Add a Docusaurus announcement bar to the top of the docs site informing users that the Trivy supply-chain compromise has been mitigated and resolved. The banner: - States all affected packages have been deleted and releases are safe - Links to the Security Townhall blog post for details - Links to the CI/CD v2 blog post for improvements made - Uses a green background with closeable dismiss button Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com> * Use :::note admonition instead of announcement bar Replace the Docusaurus announcementBar with a :::note admonition on the docs index page. The note appears below the hero image with the title 'Security Update' and links to the Security Townhall and CI/CD v2 blog posts. Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com> * Update security notice wording to 'contained' Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com> * Move note above hero image and add to root page - Move the security notice above the product screenshot on /docs - Add the same notice to the root page (src/pages/index.md) Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com> * Update security notice wording Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>	2026-04-16 15:15:52 -07:00
Sameer Kankute	13522ff33a	Fix version in docs	2026-04-16 22:41:32 +05:30
ishaan-berri	44c992416c	Merge pull request #25867 from BerriAI/litellm_day_0_opus_4.7_support Litellm day 0 opus 4.7 support	2026-04-16 09:42:11 -07:00
Sameer Kankute	07d863b8e7	Remove max support for opus 4.7	2026-04-16 21:58:03 +05:30
Sameer Kankute	f94c8dda82	Fix model names	2026-04-16 21:47:58 +05:30
Sameer Kankute	b3d5ff5774	Fix tests + add docs	2026-04-16 21:45:31 +05:30
Sameer Kankute	4b5c86b8a1	Fix code qa	2026-04-16 19:29:08 +05:30
ryan-crabbe-berri	2dd060b4e4	Merge pull request #25838 from BerriAI/litellm_fix-virtual-key-projected-spend-alert fix(proxy): fix virtual key projected-spend soft budget alerts	2026-04-15 22:22:33 -07:00
Ryan Crabbe	f639769ca9	fix(proxy): use flat soft_budget field for virtual key projected-spend alerts The projected-spend alert in _update_key_cache read from existing_spend_obj.litellm_budget_table["soft_budget"], but the nested dict is never populated for virtual keys (the combined_view SQL maps budget fields to flat top-level attributes instead). This made the check dead code — it silently short-circuited on every request, and when unblocked, crashed update_cache with a Pydantic ValidationError because _get_projected_spend_over_limit returns a date object but CallInfo.projected_exceeded_date expects str. Fixes: read from the flat existing_spend_obj.soft_budget field that IS populated, and stringify projected_exceeded_date. Also marks team soft budget email alerts as enterprise in docs. Closes #20324	2026-04-15 21:38:18 -07:00
Ishaan Jaffer	def9c4ec47	chore: merge litellm_internal_staging, resolve uv.lock conflict	2026-04-15 18:51:19 -07:00
ishaan-berri	ae2aba0e15	Merge pull request #25622 from Sameerlite/litellm_docs_cost_discrepancy_guide docs(troubleshoot): cost discrepancy debugging guide	2026-04-15 18:43:15 -07:00
Ishaan Jaffer	9977e63e3c	Merge remote-tracking branch 'origin/main' into worktree-foamy-jumping-coral	2026-04-15 18:29:55 -07:00
Sameer Kankute	3fdd67ff23	Delete docs/my-website/blog/debug_cost_discrepancy/index.md	2026-04-15 21:35:05 +05:30
Yuneng Jiang	6426bc41f5	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr14	2026-04-14 22:40:04 -07:00
yuneng-jiang	50786007fc	Merge pull request #25736 from BerriAI/docs_visual_guide_for_guardrail_fallbacks docs update	2026-04-14 19:51:54 -07:00
shivam	fd110cd5cf	docs update	2026-04-14 18:33:42 -07:00
yuneng-jiang	5c1f7d99bf	Merge pull request #25731 from BerriAI/docs_guardrail fallbacks image	2026-04-14 18:13:12 -07:00
shivam	65ce89dc67	update	2026-04-14 18:02:41 -07:00
shivam	19629004f5	fallbacks image	2026-04-14 17:58:11 -07:00
Yuneng Jiang	05ad48236f	[Docs] Regenerate v1.83.3-stable release notes from v1.82.3-stable baseline The previous v1.83.3 changelog was generated against v1.83.0-nightly and missed ~3 weeks of work. This regenerates it against the previous stable release and restructures the LLM API Endpoints section to group by API type (Responses, Batch, Count Tokens, Video Generation, Pass-Through, etc.) matching the convention used in v1.82.3, v1.82.0, and v1.81.14. Adds ~25 previously uncited PRs, cross-section duplications for cross-cutting changes, and a verified first-time-contributors list.	2026-04-14 17:19:42 -07:00
Ryan Crabbe	3aae15f5d8	[Docs] Use GitHub avatar for Ryan Crabbe in release notes Replace the expiring LinkedIn CDN image URL with a stable GitHub avatar URL for v1.83.3 and v1.83.7.rc.1 release notes.	2026-04-14 16:22:07 -07:00
Yuneng Jiang	966be2982a	[Docs] Add missed content PRs to v1.83.7.rc.1 and update runbook - Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution) - Add @kedarthakkar to new contributors (first-ever PR via #23769) - Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor	2026-04-14 16:13:09 -07:00
Yuneng Jiang	4a1da629fa	[Fix] Correct pip install versions for v1.83.3-stable and v1.83.7.rc.1 docs PyPI publishes 1.83.3 and 1.83.7 (no .post1 / rc1 suffixes) — align the pip install commands with the actual published versions.	2026-04-14 16:00:27 -07:00
Yuneng Jiang	8eec2c69b7	[Docs] Add release notes for v1.83.3-stable and v1.83.7.rc.1 - Retitle existing v1.83.3 preview file to v1.83.3-stable (same commit) - Add new v1.83.7.rc.1 preview release notes - Update RELEASE_NOTES_GENERATION_INSTRUCTIONS runbook with guidance on resolving staging PRs to their underlying commits	2026-04-14 15:58:13 -07:00
ishaan-berri	0e43050a01	Merge pull request #25650 from BerriAI/litellm_dev_04_13_2026_p1 feat: add litellm.compress() — BM25-based prompt compression with ret…	2026-04-14 12:24:47 -07:00
Sameer Kankute	1a9a31e4a2	Merge pull request #25665 from BerriAI/litellm_oss_staging_04_13_2026_p1 litellm oss staging 04/13/2026	2026-04-14 23:50:08 +05:30
Jonas Neubert	e724e5e07d	add NO_OPENAPI env var to disable /openapi.json endpoint (#25547)	2026-04-14 23:37:49 +05:30
Ashton Sidhu	6343148c95	Hiddenlayer Integration: Add V2 Integration (#22708) * Serialize error message to a string; only scan last message * Update litellm/proxy/guardrails/guardrail_hooks/hiddenlayer/hiddenlayer.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add v2 of hiddenlayer guardrail implementation * Update litellm/proxy/guardrails/guardrail_hooks/hiddenlayer/hiddenlayer.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Fix potential header issue * linting * Add image support --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-04-14 23:37:49 +05:30
ishaan-berri	4a71583951	Merge pull request #25348 from BerriAI/litellm_gemini-veo-video-resolution-pricing2 feat(gemini): Veo Lite pricing, video resolution usage and tiered cost	2026-04-14 10:23:22 -07:00
ishaan-berri	9810a1b3b7	Merge pull request #25344 from BerriAI/litellm_Sameerlite/healthcheck-max-tokens feat(health-check): add BACKGROUND_HEALTH_CHECK_MAX_TOKENS env var	2026-04-14 10:04:50 -07:00
yuneng-jiang	b0a40fde6d	Merge pull request #25559 from shreyescodes/fix/cors-and-db-safety-bugs fix: harden CORS credentials, create_views exception handling, and spend log cleanup loop	2026-04-13 21:17:12 -07:00
yuneng-jiang	8427534f13	Merge pull request #25647 from BerriAI/litellm_yj_apr_11 [Infra] Merge dev branch with main	2026-04-13 17:28:38 -07:00
yuneng-jiang	a306092d47	Merge pull request #25463 from BerriAI/litellm_oss_staging_04_09_2026 Litellm oss staging 04 09 2026	2026-04-13 17:25:53 -07:00
ishaan-berri	548225ef31	Merge pull request #25586 from BerriAI/litellm_ishaan_april11 Litellm ishaan april11	2026-04-13 14:55:50 -07:00
Krrish Dholakia	26c7412339	feat: add litellm.compress() — BM25-based prompt compression with retrieval tool (#25637) * feat: add litellm.compress() for BM25-based context compression Adds a compress() utility that reduces context size for LLM calls using BM25 relevance scoring (with optional semantic embeddings via litellm.embedding()). Messages below a token threshold pass through unchanged; messages above are scored, ranked, and the lowest-relevance ones replaced with stubs. Originals are cached and a retrieval tool is injected so the model can recover dropped content on demand. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(compress): truncate high-scoring messages instead of fully stubbing them When a relevant message was too large to fit in the token budget it was replaced with a stub, leaving the LLM with no real content to work with. Now the highest-scoring overflow message is truncated (first 70% + last 30% of words) to fill the remaining budget, so the LLM always receives actual content rather than just a retrieval pointer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bm25): add prefix expansion so query terms match inflected doc tokens "cook" now matches "cooking", "auth" matches "authentication", etc. Without this, short query terms scored 0 against longer inflected forms in documents, causing the wrong message to be kept. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add routing correctness test and eval harness for litellm.compress() - test_simple_compression: parametrized test verifying BM25 routes the right message based on query ("How to cook?" keeps cooking, "Fix auth" keeps auth content) - eval_compression.py: end-to-end eval harness comparing baseline vs compressed model performance on HumanEval-style coding problems Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(eval): add SWE-bench Lite compression eval harness Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of BM25-retrieved repo context per problem — large enough to meaningfully stress litellm.compress() without Docker or GitHub API calls. Proxy eval metrics (no test runner needed): - has_diff: model produced a valid unified diff - file_overlap: fraction of gold-patch files in generated patch - exact_file_match: generated patch touches exactly the right files Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(eval): robust dataset loading + sys.path fix for worktree imports - Add HuggingFace API fallback so the SWE-bench loader doesn't need the `datasets` library (avoids pyarrow/numpy binary compat issues) - Insert repo root into sys.path so compression module resolves from worktrees - Use direct import of litellm_compress to avoid __getattr__ issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * improve compression quality: line-based truncation, multi-message budget, 70% default target - Switch truncate_message from word-based to line-based splitting to preserve code structure (function boundaries, indentation) - Allow multiple messages to be truncated instead of burning entire budget on one overflow message - Raise default compression target from 50% to 70% of trigger for better quality/cost tradeoff - Add --compression-target CLI arg to SWE-bench eval harness - Move tests to canonical locations (tests/test_litellm/, scripts/) - Add docs page and sidebar entries for compress() Eval results (5 problems, Opus, trigger=10k): Hunk overlap delta improved from -0.417 to -0.221 Content similarity now matches baseline (+0.006) Cost savings: 72% Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add SWE-bench performance results to compress() docs Include benchmark table from Opus eval (5 problems, trigger=10k) showing 72% cost savings with file-level quality fully preserved. Add metric explanations and eval runner examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(eval): use tolerance-based hunk overlap metric The exact line-number matching was too brittle — LLM-generated patches often target the right code region but with slightly offset line numbers. Switch to hunk-level overlap with a 10-line tolerance window so nearby edits count as matches. This better reflects actual patch quality. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add compression_interception callback for LiteLLM Proxy Add a proxy callback that automatically compresses incoming /v1/messages payloads above a configurable token threshold, runs the retrieval tool loop server-side, and returns the final response. This brings compress() support to proxy deployments (e.g. Claude Code via /v1/messages). - New callback: litellm/integrations/compression_interception/ - Proxy config: compression_interception_params in litellm_settings - Support for input_type param in compress() (openai vs anthropic) - Docs: proxy setup instructions with YAML config example - Tests: 139-line unit test suite for the interception handler Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "feat: add compression_interception callback for LiteLLM Proxy" This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 12:23:54 -07:00
Krrish Dholakia	d319cd8cc6	fix: blog dark mode - text invisible on dark background (#25620) The blog CSS selectors for dark mode used descendant selectors like [data-theme='dark'] .blog-wrapper which never matched because both data-theme and .blog-wrapper are applied to the same <html> element by Docusaurus. Fixed by using compound selectors (no space): [data-theme='dark'].blog-wrapper. Also added missing dark-mode overrides for: - pre/code blocks in blog posts - link colors in blog posts - marquee items, separators, and labels on blog list page - pagination links on blog list page - meta text and author separators on blog list page Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>	2026-04-13 09:08:57 -07:00
Sameer Kankute	639135e365	Update docs/my-website/blog/debug_cost_discrepancy/index.md Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-04-13 11:33:24 +05:30
Sameer Kankute	5e830e0d55	docs(troubleshoot): add cost discrepancy debugging guide - New troubleshoot page and blog post with step-by-step comparison workflow - Screenshots under static/img/cost-discrepancy-debug - Link from spend tracking; sidebar entry under Troubleshooting - Flowchart SVG: Path B connectors below box; clarify LiteLLM schedules customer calls when stuck Made-with: Cursor	2026-04-13 11:27:16 +05:30
Sameer Kankute	fa605d85c0	Merge pull request #25616 from BerriAI/main merge main	2026-04-13 08:43:43 +05:30
Yuneng Jiang	41849a540d	document new env var and fix type hint - Add LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS to the environment variables reference so the documentation test passes. - Annotate the values variable in _reject_os_environ_references so it accepts both dict.values() and list iterables.	2026-04-11 22:17:32 -07:00
Yuneng Jiang	6baee0dfcb	address review feedback - Log a warning when dropping callback params that carry os.environ/ references so operators notice the misconfiguration. - Require absolute paths in oidc/file/ and correct the documented example to use the leading-slash form. - Drop the unused return value from _reject_os_environ_references.	2026-04-11 21:52:39 -07:00
Yuneng Jiang	06a0d4498a	fix: tighten handling of environment references in request parameters - Reject os.environ/ references supplied via /health/test_connection request params instead of resolving them; config-sourced values are already resolved before reaching the endpoint. - Skip os.environ/ references in dynamic callback params loaded from per-request metadata. - Constrain oidc/file/ to an allowed credential directory allowlist (defaults to /var/run/secrets and /run/secrets, overridable via LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS).	2026-04-11 21:41:41 -07:00
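
The directory allowlist described in commit 06a0d4498a, together with the absolute-path requirement from 6baee0dfcb, could be sketched roughly as below. This is not the code from the repository. The function names, the comma-separated format assumed for LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS, and the choice of lexical path normalization (no symlink resolution) are all assumptions made for illustration; only the two default directories and the environment variable name come from the commit messages.

```python
import os

# Defaults named in the commit message.
DEFAULT_ALLOWED_DIRS = ("/var/run/secrets", "/run/secrets")
ENV_VAR = "LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS"


def allowed_dirs(env=None):
    """Return the allowlisted directories (hypothetical helper).

    Assumption: the override is a comma-separated list of directories.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "")
    dirs = tuple(d.strip() for d in raw.split(",") if d.strip())
    return dirs or DEFAULT_ALLOWED_DIRS


def is_allowed_credential_path(path, env=None):
    """Check that `path` is absolute and sits inside an allowlisted directory.

    Normalization here is purely lexical (".." segments are collapsed);
    a real implementation may also want to resolve symlinks.
    """
    if not os.path.isabs(path):
        return False  # relative paths are rejected outright
    normalized = os.path.normpath(path)
    for base in allowed_dirs(env):
        base_norm = os.path.normpath(base)
        # commonpath equals the base only when `normalized` is inside it
        if os.path.commonpath([normalized, base_norm]) == base_norm:
            return True
    return False
```

With POSIX paths, a traversal such as `/var/run/secrets/../../etc/passwd` is collapsed before the comparison, so it falls outside the allowlist even though it begins with an allowed prefix.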

6143 Commits