Commit Graph

6127 Commits

Author SHA1 Message Date
Sameer Kankute 4b5c86b8a1 Fix code qa 2026-04-16 19:29:08 +05:30
ryan-crabbe-berri 2dd060b4e4 Merge pull request #25838 from BerriAI/litellm_fix-virtual-key-projected-spend-alert
fix(proxy): fix virtual key projected-spend soft budget alerts
2026-04-15 22:22:33 -07:00
Ryan Crabbe f639769ca9 fix(proxy): use flat soft_budget field for virtual key projected-spend alerts
The projected-spend alert in _update_key_cache read from
existing_spend_obj.litellm_budget_table["soft_budget"], but the nested
dict is never populated for virtual keys (the combined_view SQL maps
budget fields to flat top-level attributes instead). This made the
check dead code — it silently short-circuited on every request, and
when unblocked, crashed update_cache with a Pydantic ValidationError
because _get_projected_spend_over_limit returns a date object but
CallInfo.projected_exceeded_date expects str.

Fixes: read from the flat existing_spend_obj.soft_budget field that IS
populated, and stringify projected_exceeded_date.

Also marks team soft budget email alerts as enterprise in docs.

Closes #20324
2026-04-15 21:38:18 -07:00
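The two fixes described in this commit can be illustrated with a minimal sketch. It is not the LiteLLM code: `KeySpendView`, `soft_budget_old`, `soft_budget_new`, and `projected_exceeded_date_str` are hypothetical names, and a plain dataclass stands in for the real Pydantic objects. Only the field names `soft_budget`, `litellm_budget_table`, and `projected_exceeded_date` come from the commit message.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class KeySpendView:
    """Hypothetical stand-in for a virtual-key object built from the combined view."""
    spend: float
    soft_budget: Optional[float] = None          # flat field: populated for virtual keys
    litellm_budget_table: Optional[dict] = None  # nested dict: not populated for virtual keys


def soft_budget_old(obj: KeySpendView) -> Optional[float]:
    # Pre-fix style lookup: reads the nested dict, so it finds nothing for virtual keys.
    table = obj.litellm_budget_table or {}
    return table.get("soft_budget")


def soft_budget_new(obj: KeySpendView) -> Optional[float]:
    # Post-fix style lookup: reads the flat attribute that is actually populated.
    return obj.soft_budget


def projected_exceeded_date_str(d: Optional[date]) -> Optional[str]:
    # Convert the date to a string before assigning it to a field typed as str.
    # (Assumption: ISO format; the commit only says the value is stringified.)
    return d.isoformat() if d is not None else None
```

With a key that has `soft_budget=10.0` and no nested table, the old lookup returns `None` (so the alert check is skipped), while the new lookup returns `10.0`.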
Ishaan Jaffer def9c4ec47 chore: merge litellm_internal_staging, resolve uv.lock conflict 2026-04-15 18:51:19 -07:00
ishaan-berri ae2aba0e15 Merge pull request #25622 from Sameerlite/litellm_docs_cost_discrepancy_guide
docs(troubleshoot): cost discrepancy debugging guide
2026-04-15 18:43:15 -07:00
Ishaan Jaffer 9977e63e3c Merge remote-tracking branch 'origin/main' into worktree-foamy-jumping-coral 2026-04-15 18:29:55 -07:00
Sameer Kankute 3fdd67ff23 Delete docs/my-website/blog/debug_cost_discrepancy/index.md 2026-04-15 21:35:05 +05:30
Yuneng Jiang 6426bc41f5 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr14 2026-04-14 22:40:04 -07:00
yuneng-jiang 50786007fc Merge pull request #25736 from BerriAI/docs_visual_guide_for_guardrail_fallbacks
docs update
2026-04-14 19:51:54 -07:00
shivam fd110cd5cf docs update 2026-04-14 18:33:42 -07:00
yuneng-jiang 5c1f7d99bf Merge pull request #25731 from BerriAI/docs_guardrail
fallbacks image
2026-04-14 18:13:12 -07:00
shivam 65ce89dc67 update 2026-04-14 18:02:41 -07:00
shivam 19629004f5 fallbacks image 2026-04-14 17:58:11 -07:00
Yuneng Jiang 05ad48236f [Docs] Regenerate v1.83.3-stable release notes from v1.82.3-stable baseline
The previous v1.83.3 changelog was generated against v1.83.0-nightly and
missed ~3 weeks of work. This regenerates it against the previous stable
release and restructures the LLM API Endpoints section to group by API
type (Responses, Batch, Count Tokens, Video Generation, Pass-Through,
etc.) matching the convention used in v1.82.3, v1.82.0, and v1.81.14.
Adds ~25 previously uncited PRs, cross-section duplications for
cross-cutting changes, and a verified first-time-contributors list.
2026-04-14 17:19:42 -07:00
Ryan Crabbe 3aae15f5d8 [Docs] Use GitHub avatar for Ryan Crabbe in release notes
Replace the expiring LinkedIn CDN image URL with a stable GitHub
avatar URL for v1.83.3 and v1.83.7.rc.1 release notes.
2026-04-14 16:22:07 -07:00
Yuneng Jiang 966be2982a [Docs] Add missed content PRs to v1.83.7.rc.1 and update runbook
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
2026-04-14 16:13:09 -07:00
Yuneng Jiang 4a1da629fa [Fix] Correct pip install versions for v1.83.3-stable and v1.83.7.rc.1 docs
PyPI publishes 1.83.3 and 1.83.7 (no .post1 / rc1 suffixes) — align the pip install commands with the actual published versions.
2026-04-14 16:00:27 -07:00
Yuneng Jiang 8eec2c69b7 [Docs] Add release notes for v1.83.3-stable and v1.83.7.rc.1
- Retitle existing v1.83.3 preview file to v1.83.3-stable (same commit)
- Add new v1.83.7.rc.1 preview release notes
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS runbook with guidance on resolving staging PRs to their underlying commits
2026-04-14 15:58:13 -07:00
ishaan-berri 0e43050a01 Merge pull request #25650 from BerriAI/litellm_dev_04_13_2026_p1
feat: add litellm.compress() — BM25-based prompt compression with ret…
2026-04-14 12:24:47 -07:00
Sameer Kankute 1a9a31e4a2 Merge pull request #25665 from BerriAI/litellm_oss_staging_04_13_2026_p1
litellm oss staging 04/13/2026
2026-04-14 23:50:08 +05:30
Jonas Neubert e724e5e07d add NO_OPENAPI env var to disable /openapi.json endpoint (#25547) 2026-04-14 23:37:49 +05:30
Ashton Sidhu 6343148c95 Hiddenlayer Integration: Add V2 Integration (#22708)
* Serialize error message to a string; only scan last message

* Update litellm/proxy/guardrails/guardrail_hooks/hiddenlayer/hiddenlayer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add v2 of hiddenlayer guardrail implementation

* Update litellm/proxy/guardrails/guardrail_hooks/hiddenlayer/hiddenlayer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix potential header issue

* linting

* Add image support

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-14 23:37:49 +05:30
ishaan-berri 4a71583951 Merge pull request #25348 from BerriAI/litellm_gemini-veo-video-resolution-pricing2
feat(gemini): Veo Lite pricing, video resolution usage and tiered cost
2026-04-14 10:23:22 -07:00
ishaan-berri 9810a1b3b7 Merge pull request #25344 from BerriAI/litellm_Sameerlite/healthcheck-max-tokens
feat(health-check): add BACKGROUND_HEALTH_CHECK_MAX_TOKENS env var
2026-04-14 10:04:50 -07:00
yuneng-jiang b0a40fde6d Merge pull request #25559 from shreyescodes/fix/cors-and-db-safety-bugs
fix: harden CORS credentials, create_views exception handling, and spend log cleanup loop
2026-04-13 21:17:12 -07:00
yuneng-jiang 8427534f13 Merge pull request #25647 from BerriAI/litellm_yj_apr_11
[Infra] Merge dev branch with main
2026-04-13 17:28:38 -07:00
yuneng-jiang a306092d47 Merge pull request #25463 from BerriAI/litellm_oss_staging_04_09_2026
Litellm oss staging 04 09 2026
2026-04-13 17:25:53 -07:00
ishaan-berri 548225ef31 Merge pull request #25586 from BerriAI/litellm_ishaan_april11
Litellm ishaan april11
2026-04-13 14:55:50 -07:00
Krrish Dholakia 26c7412339 feat: add litellm.compress() — BM25-based prompt compression with retrieval tool (#25637)
* feat: add litellm.compress() for BM25-based context compression

Adds a compress() utility that reduces context size for LLM calls using
BM25 relevance scoring (with optional semantic embeddings via
litellm.embedding()). Messages below a token threshold pass through
unchanged; messages above are scored, ranked, and the lowest-relevance
ones replaced with stubs. Originals are cached and a retrieval tool is
injected so the model can recover dropped content on demand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(compress): truncate high-scoring messages instead of fully stubbing them

When a relevant message was too large to fit in the token budget it was
replaced with a stub, leaving the LLM with no real content to work with.
Now the highest-scoring overflow message is truncated (first 70% + last 30%
of words) to fill the remaining budget, so the LLM always receives actual
content rather than just a retrieval pointer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bm25): add prefix expansion so query terms match inflected doc tokens

"cook" now matches "cooking", "auth" matches "authentication", etc.
Without this, short query terms scored 0 against longer inflected forms
in documents, causing the wrong message to be kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add routing correctness test and eval harness for litellm.compress()

- test_simple_compression: parametrized test verifying BM25 routes the
  right message based on query ("How to cook?" keeps cooking, "Fix auth"
  keeps auth content)
- eval_compression.py: end-to-end eval harness comparing baseline vs
  compressed model performance on HumanEval-style coding problems

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(eval): add SWE-bench Lite compression eval harness

Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of
BM25-retrieved repo context per problem — large enough to meaningfully
stress litellm.compress() without Docker or GitHub API calls.

Proxy eval metrics (no test runner needed):
  - has_diff: model produced a valid unified diff
  - file_overlap: fraction of gold-patch files in generated patch
  - exact_file_match: generated patch touches exactly the right files

Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(eval): robust dataset loading + sys.path fix for worktree imports

- Add HuggingFace API fallback so the SWE-bench loader doesn't need
  the `datasets` library (avoids pyarrow/numpy binary compat issues)
- Insert repo root into sys.path so compression module resolves
  from worktrees
- Use direct import of litellm_compress to avoid __getattr__ issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* improve compression quality: line-based truncation, multi-message budget, 70% default target

- Switch truncate_message from word-based to line-based splitting to
  preserve code structure (function boundaries, indentation)
- Allow multiple messages to be truncated instead of burning entire
  budget on one overflow message
- Raise default compression target from 50% to 70% of trigger for
  better quality/cost tradeoff
- Add --compression-target CLI arg to SWE-bench eval harness
- Move tests to canonical locations (tests/test_litellm/, scripts/)
- Add docs page and sidebar entries for compress()

Eval results (5 problems, Opus, trigger=10k):
  Hunk overlap delta improved from -0.417 to -0.221
  Content similarity now matches baseline (+0.006)
  Cost savings: 72%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add SWE-bench performance results to compress() docs

Include benchmark table from Opus eval (5 problems, trigger=10k)
showing 72% cost savings with file-level quality fully preserved.
Add metric explanations and eval runner examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(eval): use tolerance-based hunk overlap metric

The exact line-number matching was too brittle — LLM-generated patches
often target the right code region but with slightly offset line numbers.
Switch to hunk-level overlap with a 10-line tolerance window so nearby
edits count as matches. This better reflects actual patch quality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add compression_interception callback for LiteLLM Proxy

Add a proxy callback that automatically compresses incoming /v1/messages
payloads above a configurable token threshold, runs the retrieval tool
loop server-side, and returns the final response. This brings compress()
support to proxy deployments (e.g. Claude Code via /v1/messages).

- New callback: litellm/integrations/compression_interception/
- Proxy config: compression_interception_params in litellm_settings
- Support for input_type param in compress() (openai vs anthropic)
- Docs: proxy setup instructions with YAML config example
- Tests: 139-line unit test suite for the interception handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "feat: add compression_interception callback for LiteLLM Proxy"

This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 12:23:54 -07:00
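Three ideas from the squashed commit above lend themselves to a short sketch: prefix expansion for query terms, line-based head/tail truncation, and tolerance-based hunk overlap. The code below is an illustration under stated assumptions, not the litellm.compress() implementation. All function names and signatures are hypothetical. The 70/30 head/tail split is borrowed from the earlier word-based version and may not be what the line-based version uses. Scoring overlap as the fraction of gold hunks matched is also an assumption.

```python
from typing import List, Tuple

Hunk = Tuple[int, int]  # (start_line, end_line), inclusive


def prefix_term_count(query_term: str, doc_tokens: List[str]) -> int:
    """Count document tokens that start with the query term (prefix expansion)."""
    q = query_term.lower()
    return sum(1 for tok in doc_tokens if tok.lower().startswith(q))


def truncate_lines(text: str, max_lines: int, head_ratio: float = 0.7) -> str:
    """Keep the first head_ratio share of the line budget and fill the rest from the end."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = int(max_lines * head_ratio)
    tail = max_lines - head
    kept = lines[:head] + (lines[-tail:] if tail > 0 else [])
    return "\n".join(kept)


def hunks_overlap(a: Hunk, b: Hunk, tolerance: int = 10) -> bool:
    """True if the two line ranges overlap or lie within `tolerance` lines of each other."""
    return a[0] <= b[1] + tolerance and b[0] <= a[1] + tolerance


def hunk_overlap_score(gold: List[Hunk], generated: List[Hunk], tolerance: int = 10) -> float:
    """Fraction of gold hunks that have a nearby generated hunk (assumed definition)."""
    if not gold:
        return 0.0
    matched = sum(
        1 for g in gold if any(hunks_overlap(g, p, tolerance) for p in generated)
    )
    return matched / len(gold)
```

With prefix expansion, `prefix_term_count("cook", ["cooking", "pasta"])` counts one match where exact-token matching would count none. With the tolerance window, a generated hunk at lines 118-125 still counts against a gold hunk at lines 100-110.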
Krrish Dholakia d319cd8cc6 fix: blog dark mode - text invisible on dark background (#25620)
The blog CSS selectors for dark mode used descendant selectors like
[data-theme='dark'] .blog-wrapper which never matched because both
data-theme and .blog-wrapper are applied to the same <html> element
by Docusaurus. Fixed by using compound selectors (no space):
[data-theme='dark'].blog-wrapper.

Also added missing dark-mode overrides for:
- pre/code blocks in blog posts
- link colors in blog posts
- marquee items, separators, and labels on blog list page
- pagination links on blog list page
- meta text and author separators on blog list page

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-13 09:08:57 -07:00
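The selector bug described in this commit can be modelled in a few lines. The sketch below is a toy model of CSS matching rules, not browser or Docusaurus code. The `Element` class and both matcher functions are hypothetical. It shows why a descendant selector cannot match when the attribute and the class sit on the same element.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Element:
    """Hypothetical minimal DOM node: tag, attributes, classes, and a parent link."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    classes: Set[str] = field(default_factory=set)
    parent: Optional["Element"] = None


def is_dark(el: Element) -> bool:
    return el.attrs.get("data-theme") == "dark"


def matches_descendant(el: Element) -> bool:
    # Models [data-theme='dark'] .blog-wrapper (with a space):
    # el must carry the class AND a strict ancestor must carry the attribute.
    if "blog-wrapper" not in el.classes:
        return False
    ancestor = el.parent
    while ancestor is not None:
        if is_dark(ancestor):
            return True
        ancestor = ancestor.parent
    return False


def matches_compound(el: Element) -> bool:
    # Models [data-theme='dark'].blog-wrapper (no space):
    # both conditions must hold on the same element.
    return is_dark(el) and "blog-wrapper" in el.classes


# The situation from the commit: attribute and class both live on <html>.
html = Element("html", attrs={"data-theme": "dark"}, classes={"blog-wrapper"})
```

Here `matches_descendant(html)` is false because `<html>` has no ancestor, while `matches_compound(html)` is true.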
Sameer Kankute 639135e365 Update docs/my-website/blog/debug_cost_discrepancy/index.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-13 11:33:24 +05:30
Sameer Kankute 5e830e0d55 docs(troubleshoot): add cost discrepancy debugging guide
- New troubleshoot page and blog post with step-by-step comparison workflow
- Screenshots under static/img/cost-discrepancy-debug
- Link from spend tracking; sidebar entry under Troubleshooting
- Flowchart SVG: Path B connectors below box; clarify LiteLLM schedules customer calls when stuck

Made-with: Cursor
2026-04-13 11:27:16 +05:30
Sameer Kankute fa605d85c0 Merge pull request #25616 from BerriAI/main
merge main
2026-04-13 08:43:43 +05:30
Yuneng Jiang 41849a540d document new env var and fix type hint
- Add LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS to the environment variables
  reference so the documentation test passes.
- Annotate the values variable in _reject_os_environ_references so it
  accepts both dict.values() and list iterables.
2026-04-11 22:17:32 -07:00
Yuneng Jiang 6baee0dfcb address review feedback
- Log a warning when dropping callback params that carry os.environ/
  references so operators notice the misconfiguration.
- Require absolute paths in oidc/file/ and correct the documented
  example to use the leading-slash form.
- Drop the unused return value from _reject_os_environ_references.
2026-04-11 21:52:39 -07:00
Yuneng Jiang 06a0d4498a fix: tighten handling of environment references in request parameters
- Reject os.environ/ references supplied via /health/test_connection
  request params instead of resolving them; config-sourced values are
  already resolved before reaching the endpoint.
- Skip os.environ/ references in dynamic callback params loaded from
  per-request metadata.
- Constrain oidc/file/ to an allowed credential directory allowlist
  (defaults to /var/run/secrets and /run/secrets, overridable via
  LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS).
2026-04-11 21:41:41 -07:00
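The checks described in this commit and its two follow-ups can be sketched roughly as below. This is an illustration, not the LiteLLM implementation. The function names are hypothetical, and the comma-separated format for LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS is an assumption (the commit only names the variable and its defaults). The sketch normalizes paths lexically so that it runs without touching the filesystem. A real check would likely also need to resolve symlinks.

```python
import os
import posixpath
from pathlib import PurePosixPath
from typing import Any, Iterable, List, Mapping, Optional

# Defaults named in the commit message.
DEFAULT_ALLOWED_DIRS = ["/var/run/secrets", "/run/secrets"]


def allowed_credential_dirs(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Read the allowlist override; the comma-separated format is an assumption."""
    env = os.environ if env is None else env
    raw = env.get("LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS", "")
    dirs = [d.strip() for d in raw.split(",") if d.strip()]
    return dirs or list(DEFAULT_ALLOWED_DIRS)


def is_allowed_credential_path(path: str, allowed: Iterable[str]) -> bool:
    """Accept only absolute paths that sit inside one of the allowed directories."""
    # Lexical normalization collapses ".." segments; it does not resolve symlinks.
    p = PurePosixPath(posixpath.normpath(path))
    if not p.is_absolute():
        return False
    return any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents for d in allowed
    )


def reject_os_environ_references(values: Iterable[Any]) -> None:
    """Raise if any request-supplied value is an os.environ/ reference."""
    for value in values:
        if isinstance(value, str) and value.startswith("os.environ/"):
            raise ValueError(
                "os.environ/ references are not accepted in request parameters"
            )
```

Accepting any iterable in `reject_os_environ_references` mirrors the follow-up note about handling both `dict.values()` and lists.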
ishaan-berri fdd7500904 blog: add back arrow to blog post pages (#25587)
* blog: add back arrow to post pages

* blog: style back arrow — fixed top-left below navbar
2026-04-11 19:15:45 -07:00
ishaan-berri 1edf41c26f Merge pull request #25585 from BerriAI/litellm_dev_04_11_2026_p1
Litellm dev 04 11 2026 p1
2026-04-11 18:46:57 -07:00
ishaan-berri 329a526b9d Merge pull request #25579 from BerriAI/feat/anthropic-advisor-tool
feat(advisor): advisor tool orchestration loop for non-Anthropic providers
2026-04-11 18:32:44 -07:00
Ishaan Jaffer dd87f3be5b docs(advisor): move supported providers to top, focus how it works on litellm native loop 2026-04-11 18:27:18 -07:00
Ishaan Jaffer a8bc7bfcd4 docs(advisor): add how it works section with mermaid diagram + non-native provider table 2026-04-11 18:23:33 -07:00
Ishaan Jaffer 35f4b47ff8 apply content guidelines: scale/resilience narrative, FAQ, Key Takeaways, Conclusion CTA 2026-04-11 18:12:32 -07:00
Ishaan Jaffer 14eed24471 add redis circuit breaker blog post with React diagrams 2026-04-11 18:02:59 -07:00
Ishaan Jaffer 8e616ecdf4 add BlogPostPage swizzle: hide sidebar, add hiring CTA on every post 2026-04-11 18:02:56 -07:00
Ishaan Jaffer dac44fb443 blog list styles: clean typography, marquee animation, hero layout 2026-04-11 18:02:52 -07:00
Ishaan Jaffer 85cb7db8b9 blog list page: Ramp-style flat list with hero, provider marquee, hiring CTA 2026-04-11 18:02:48 -07:00
Ishaan Jaffer 05d516482f restyle blog list page to match engineering blog aesthetic 2026-04-11 18:02:44 -07:00
Krrish Dholakia e08e3bf748 docs: clarify how to get benchmarking script 2026-04-11 17:31:03 -07:00
Krrish Dholakia 12bca649fc docs: refactor benchmarking docs to be clearer 2026-04-11 17:30:09 -07:00
Yuneng Jiang 909247785e Merge remote-tracking branch 'origin' into litellm_internal_staging_04_11_2026 2026-04-11 15:41:03 -07:00