* docs: add copy-page-as-markdown button + llms.txt generation
Adds the signalwire llms-txt Docusaurus plugin + theme so every
docs page gets:
- A "Copy Page" dropdown in the breadcrumbs (Copy, View Markdown,
Ask ChatGPT, Ask Claude) — defaults from the theme hook, no
extra config required.
- A raw `.md` companion at `<page>.md` for LLM consumption.
- Site-wide `/llms.txt` index and `/llms-full.txt` corpus.
The signalwire plugin README documents a `copyPageButton` option
that the v1.2.2 Joi schema actually rejects; the theme's defaults
cover the same feature set, so only `content.enableMarkdownFiles`
and `enableLlmsFullTxt` are set. Theme is pinned to `1.0.0-alpha.9`
because the floating version resolves to a broken canary whose
`main` points at a missing file.
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
* docs: pin exact versions for signalwire llms-txt deps
Drop the caret ranges on the two packages added in the prior
commit so the docs site pulls byte-identical npm tarballs on
every install. Matches the existing convention in this
package.json (everything else is already exact) and protects
against supply-chain substitution if a malicious patch version
is published under the same minor.
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
* docs: upgrade signalwire llms-txt plugin to v2 alpha + enable copy button
The stable v1.2.2 plugin we first pinned does not call setGlobalData
during contentLoaded, so the theme's CopyPageContent component always
returned null (its `!siteConfig` bailout). The theme v1.0.0-alpha.9
is built against the v2-alpha plugin API, which is the version that
actually wires the copy-content JSON and plugin config into the theme
via setGlobalData.
Pins plugin to 2.0.0-alpha.7 (exact, no caret) and switches the
config to the v2 schema:
- top-level `markdown` + `llmsTxt` replace the v1 `content` block
- new `ui.copyPageContent` (off by default in v2) enables the button
with view-markdown + ChatGPT + Claude actions.
Verified end-to-end: production build serves the dropdown with
"Copy Raw Markdown", "View Markdown", "Reference in ChatGPT", and
"Reference in Claude" on /docs/routing (button mounts at ~x=960 in
the breadcrumbs row).
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability.
Made-with: Cursor
* Add capability to override default GitHub Copilot authentication endpoints
This feature adds support for GitHub Enterprise subscriptions with custom domain/data ownership (which use a different URL than standard accounts).
* Update documentation with new parameters
* Move access token URL and Client ID retrieval outside for loop
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix spurious comment from Greptile review
* Align api_base retrieval behavior across chat and embedding transformations
* Add missing GitHub Copilot client ID parameter in docs
* Update website documentation with newer options for GitHub Enterprise Copilot
* Fix default value for Copilot client ID in docs
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add announcement bar for Trivy compromise resolution notice
Add a Docusaurus announcement bar to the top of the docs site informing
users that the Trivy supply-chain compromise has been mitigated and
resolved. The banner:
- States all affected packages have been deleted and releases are safe
- Links to the Security Townhall blog post for details
- Links to the CI/CD v2 blog post for improvements made
- Uses a green background and can be dismissed with a close button
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Use :::note admonition instead of announcement bar
Replace the Docusaurus announcementBar with a :::note admonition on the
docs index page. The note appears below the hero image with the title
'Security Update' and links to the Security Townhall and CI/CD v2 blog
posts.
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Update security notice wording to 'contained'
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Move note above hero image and add to root page
- Move the security notice above the product screenshot on /docs
- Add the same notice to the root page (src/pages/index.md)
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Update security notice wording
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
The projected-spend alert in _update_key_cache read from
existing_spend_obj.litellm_budget_table["soft_budget"], but the nested
dict is never populated for virtual keys (the combined_view SQL maps
budget fields to flat top-level attributes instead). This made the
check dead code — it silently short-circuited on every request, and
when unblocked, crashed update_cache with a Pydantic ValidationError
because _get_projected_spend_over_limit returns a date object but
CallInfo.projected_exceeded_date expects str.
Fixes: read from the flat existing_spend_obj.soft_budget field that IS
populated, and stringify projected_exceeded_date.
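A minimal sketch of the two fixes, assuming a spend object that exposes `soft_budget` as a flat attribute. The helper names `read_soft_budget` and `stringify_exceeded_date` are hypothetical and are not the functions touched by this commit:

```python
from datetime import date
from types import SimpleNamespace
from typing import Optional


def read_soft_budget(spend_obj) -> Optional[float]:
    # Hypothetical helper: read the flat attribute rather than the nested
    # litellm_budget_table dict, which is not populated for virtual keys.
    return getattr(spend_obj, "soft_budget", None)


def stringify_exceeded_date(value: Optional[date]) -> Optional[str]:
    # Hypothetical helper: a field typed as str rejects a date object,
    # so convert it before building the alert payload.
    return None if value is None else str(value)


# Stand-in for the object returned by the combined view (illustrative only).
key = SimpleNamespace(soft_budget=10.0, litellm_budget_table=None)
```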
Also marks team soft budget email alerts as enterprise in docs.
Closes #20324
The previous v1.83.3 changelog was generated against v1.83.0-nightly and
missed ~3 weeks of work. This regenerates it against the previous stable
release and restructures the LLM API Endpoints section to group by API
type (Responses, Batch, Count Tokens, Video Generation, Pass-Through,
etc.) matching the convention used in v1.82.3, v1.82.0, and v1.81.14.
Adds ~25 previously uncited PRs, cross-section duplications for
cross-cutting changes, and a verified first-time-contributors list.
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
* feat: add litellm.compress() for BM25-based context compression
Adds a compress() utility that reduces context size for LLM calls using
BM25 relevance scoring (with optional semantic embeddings via
litellm.embedding()). Messages below a token threshold pass through
unchanged; messages above are scored, ranked, and the lowest-relevance
ones replaced with stubs. Originals are cached and a retrieval tool is
injected so the model can recover dropped content on demand.
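The score-rank-stub flow described above can be sketched roughly as follows. This is an illustration only, not the code added by this commit: it uses a textbook BM25 formula with whitespace tokenization, a hypothetical `keep` count in place of the token threshold, and omits the cache and the retrieval tool.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with standard BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def compress_sketch(messages, query, keep):
    """Keep the `keep` highest-scoring messages; replace the rest with stubs."""
    scores = bm25_scores(query, messages)
    ranked = sorted(range(len(messages)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:keep])
    return [
        m if i in kept else f"[stub: message {i} dropped]"
        for i, m in enumerate(messages)
    ]
```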
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(compress): truncate high-scoring messages instead of fully stubbing them
When a relevant message was too large to fit in the token budget it was
replaced with a stub, leaving the LLM with no real content to work with.
Now the highest-scoring overflow message is truncated (first 70% + last 30%
of words) to fill the remaining budget, so the LLM always receives actual
content rather than just a retrieval pointer.
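A rough sketch of the head/tail truncation, assuming the budget is counted in whitespace-separated words. The function name, the `[...]` marker, and the `head_percent` parameter are illustrative and not taken from the commit:

```python
def truncate_words(text, budget_words, head_percent=70):
    """Keep the first ~70% and last ~30% of a word budget, dropping the middle."""
    words = text.split()
    if len(words) <= budget_words:
        return text  # already fits, nothing to truncate
    # Integer arithmetic avoids floating-point rounding surprises.
    head_n = budget_words * head_percent // 100
    tail_n = budget_words - head_n
    head = words[:head_n]
    tail = words[-tail_n:] if tail_n > 0 else []
    return " ".join(head + ["[...]"] + tail)
```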
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(bm25): add prefix expansion so query terms match inflected doc tokens
"cook" now matches "cooking", "auth" matches "authentication", etc.
Without this, short query terms scored 0 against longer inflected forms
in documents, causing the wrong message to be kept.
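One way to picture prefix expansion is shown below. It is a sketch under assumptions, not the committed implementation: the function name and the `min_prefix_len` guard are hypothetical.

```python
def expand_query_terms(query_terms, doc_vocab, min_prefix_len=3):
    """Map each query term to the document tokens it matches exactly or as a prefix."""
    expanded = {}
    for term in query_terms:
        matches = {term} if term in doc_vocab else set()
        # Assumption: very short terms are not expanded, to limit false matches.
        if len(term) >= min_prefix_len:
            matches |= {tok for tok in doc_vocab if tok.startswith(term)}
        expanded[term] = matches
    return expanded
```

With a vocabulary of `{"cooking", "authentication", "pasta"}`, the query terms `cook` and `auth` expand to `cooking` and `authentication` respectively, so they no longer score 0.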
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add routing correctness test and eval harness for litellm.compress()
- test_simple_compression: parametrized test verifying BM25 routes the
right message based on query ("How to cook?" keeps cooking, "Fix auth"
keeps auth content)
- eval_compression.py: end-to-end eval harness comparing baseline vs
compressed model performance on HumanEval-style coding problems
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(eval): add SWE-bench Lite compression eval harness
Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of
BM25-retrieved repo context per problem — large enough to meaningfully
stress litellm.compress() without Docker or GitHub API calls.
Proxy eval metrics (no test runner needed):
- has_diff: model produced a valid unified diff
- file_overlap: fraction of gold-patch files in generated patch
- exact_file_match: generated patch touches exactly the right files
Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(eval): robust dataset loading + sys.path fix for worktree imports
- Add HuggingFace API fallback so the SWE-bench loader doesn't need
the `datasets` library (avoids pyarrow/numpy binary compat issues)
- Insert repo root into sys.path so compression module resolves
from worktrees
- Use direct import of litellm_compress to avoid __getattr__ issues
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* improve compression quality: line-based truncation, multi-message budget, 70% default target
- Switch truncate_message from word-based to line-based splitting to
preserve code structure (function boundaries, indentation)
- Allow multiple messages to be truncated instead of burning entire
budget on one overflow message
- Raise default compression target from 50% to 70% of trigger for
better quality/cost tradeoff
- Add --compression-target CLI arg to SWE-bench eval harness
- Move tests to canonical locations (tests/test_litellm/, scripts/)
- Add docs page and sidebar entries for compress()
Eval results (5 problems, Opus, trigger=10k):
Hunk overlap delta improved from -0.417 to -0.221
Content similarity now matches baseline (+0.006)
Cost savings: 72%
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add SWE-bench performance results to compress() docs
Include benchmark table from Opus eval (5 problems, trigger=10k)
showing 72% cost savings with file-level quality fully preserved.
Add metric explanations and eval runner examples.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(eval): use tolerance-based hunk overlap metric
The exact line-number matching was too brittle — LLM-generated patches
often target the right code region but with slightly offset line numbers.
Switch to hunk-level overlap with a 10-line tolerance window so nearby
edits count as matches. This better reflects actual patch quality.
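A sketch of what a tolerance-based hunk overlap might look like, assuming each hunk is represented as a `(file, start_line, end_line)` tuple. That representation and the function name are assumptions for illustration; the eval harness may model hunks differently.

```python
def hunk_overlap(gold_hunks, generated_hunks, tolerance=10):
    """Fraction of gold hunks with a generated hunk in the same file within `tolerance` lines."""
    if not gold_hunks:
        return 0.0
    matched = 0
    for g_file, g_start, g_end in gold_hunks:
        for p_file, p_start, p_end in generated_hunks:
            same_file = p_file == g_file
            # The ranges overlap once the gold range is widened by the tolerance.
            near = p_start <= g_end + tolerance and p_end >= g_start - tolerance
            if same_file and near:
                matched += 1
                break
    return matched / len(gold_hunks)
```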
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add compression_interception callback for LiteLLM Proxy
Add a proxy callback that automatically compresses incoming /v1/messages
payloads above a configurable token threshold, runs the retrieval tool
loop server-side, and returns the final response. This brings compress()
support to proxy deployments (e.g. Claude Code via /v1/messages).
- New callback: litellm/integrations/compression_interception/
- Proxy config: compression_interception_params in litellm_settings
- Support for input_type param in compress() (openai vs anthropic)
- Docs: proxy setup instructions with YAML config example
- Tests: 139-line unit test suite for the interception handler
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "feat: add compression_interception callback for LiteLLM Proxy"
This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The blog CSS selectors for dark mode used descendant selectors like
[data-theme='dark'] .blog-wrapper which never matched because both
data-theme and .blog-wrapper are applied to the same <html> element
by Docusaurus. Fixed by using compound selectors (no space):
[data-theme='dark'].blog-wrapper.
Also added missing dark-mode overrides for:
- pre/code blocks in blog posts
- link colors in blog posts
- marquee items, separators, and labels on blog list page
- pagination links on blog list page
- meta text and author separators on blog list page
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
- New troubleshoot page and blog post with step-by-step comparison workflow
- Screenshots under static/img/cost-discrepancy-debug
- Link from spend tracking; sidebar entry under Troubleshooting
- Flowchart SVG: move Path B connectors below the box; clarify that LiteLLM schedules customer calls when stuck
Made-with: Cursor
- Add LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS to the environment variables
reference so the documentation test passes.
- Annotate the values variable in _reject_os_environ_references so it
accepts both dict.values() and list iterables.
- Log a warning when dropping callback params that carry os.environ/
references so operators notice the misconfiguration.
- Require absolute paths in oidc/file/ and correct the documented
example to use the leading-slash form.
- Drop the unused return value from _reject_os_environ_references.
- Reject os.environ/ references supplied via /health/test_connection
request params instead of resolving them; config-sourced values are
already resolved before reaching the endpoint.
- Skip os.environ/ references in dynamic callback params loaded from
per-request metadata.
- Constrain oidc/file/ to an allowed credential directory allowlist
(defaults to /var/run/secrets and /run/secrets, overridable via
LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS).
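The absolute-path and allowlist checks for oidc/file/ could look roughly like the sketch below. The function names are hypothetical, and the comma-separated format for `LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS` is an assumption; only the default directories and the variable name come from the change description.

```python
import os

DEFAULT_ALLOWED_DIRS = ("/var/run/secrets", "/run/secrets")


def allowed_dirs(env=None):
    """Return the allowlist, falling back to the defaults when the override is unset."""
    env = os.environ if env is None else env
    raw = env.get("LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS", "")
    # Assumption: the override is a comma-separated list of directories.
    dirs = tuple(d.strip() for d in raw.split(",") if d.strip())
    return dirs or DEFAULT_ALLOWED_DIRS


def is_allowed_credential_path(path, dirs):
    """Accept only absolute paths that resolve inside one of the allowed directories."""
    if not os.path.isabs(path):
        return False
    # Resolve symlinks and ".." segments before comparing.
    resolved = os.path.realpath(path)
    for d in dirs:
        base = os.path.realpath(d)
        if os.path.commonpath([resolved, base]) == base:
            return True
    return False
```

Resolving the path before comparing means a value such as `/var/run/secrets/../../../etc/passwd` is rejected even though it starts with an allowed prefix.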