* fix(hosted_vllm): normalize custom tools for chat completions
Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot.
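A minimal sketch of the normalization described above, not the actual hosted_vllm code. The custom-tool shape (`{"type": "custom", "custom": {...}}`), the helper name `normalize_tools`, and the single-string `input` parameter schema used as a fallback are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def normalize_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of `tools` where custom tools become function tools.

    Hypothetical helper: tools that are not of type "custom" pass through
    unchanged; custom tools are rewritten into the OpenAI function-tool shape.
    """
    normalized: List[Dict[str, Any]] = []
    for tool in tools:
        if tool.get("type") != "custom":
            normalized.append(tool)
            continue
        custom = tool.get("custom") or {}
        normalized.append(
            {
                "type": "function",
                "function": {
                    "name": custom.get("name", ""),
                    "description": custom.get("description", ""),
                    # Assumed fallback schema: one free-form string argument.
                    "parameters": {
                        "type": "object",
                        "properties": {"input": {"type": "string"}},
                        "required": ["input"],
                    },
                },
            }
        )
    return normalized
```

Function tools that are already in the expected shape are left alone, so the conversion is safe to apply to every request.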
Made-with: Cursor
* Fix black issue
* Fix hosted vllm custom tool schema fallback
* fix black
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
- Streaming example referenced Llama-3.1 instead of Llama-3.3
- Add supports_vision: true for gemma-3-12b-it in both JSON files,
matching other providers (bedrock, novita)
The previous example set CRUSOE_API_BASE via env var and also passed
api_base= in the same call, making it look like both were required.
They are independent alternatives.
Trailing slashes on custom API base examples cause double-slash in
get_complete_url. Also fixes inconsistent list indentation in
test_crusoe_models_configuration.
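The double-slash problem can be reproduced with a small stand-in. This is only a sketch: `naive_join` and `safe_join` are hypothetical helpers, not the body of get_complete_url, and the base URL is a placeholder.

```python
def naive_join(api_base: str, path: str) -> str:
    """Concatenate without normalizing; a trailing slash leaks through."""
    return api_base + path


def safe_join(api_base: str, path: str) -> str:
    """Strip the slashes at the seam so exactly one separator remains."""
    return api_base.rstrip("/") + "/" + path.lstrip("/")


base_with_slash = "https://example.invalid/v1/"
doubled = naive_join(base_with_slash, "/chat/completions")
clean = safe_join(base_with_slash, "/chat/completions")
```

Here `doubled` contains `/v1//chat/completions`, while `clean` contains `/v1/chat/completions`. Dropping the trailing slash from the documented examples avoids the issue without relying on normalization.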
- Remove trailing slash from docs Base URL to match providers.json
- Wrap model_cost mutations in try/finally to prevent test state leakage
- Add missing __init__.py to crusoe test package
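The try/finally pattern from the list above might look roughly like this. `MODEL_COST` is a local stand-in for the shared cost map, and `run_with_patched_cost` is a hypothetical helper; the real tests may structure this differently.

```python
import copy
from typing import Any, Callable, Dict

# Local stand-in for a module-level cost map shared across tests.
MODEL_COST: Dict[str, Dict[str, Any]] = {
    "example-model": {"input_cost_per_token": 1e-6},
}


def run_with_patched_cost(
    model: str, overrides: Dict[str, Any], check: Callable[[], Any]
) -> Any:
    """Apply `overrides` to MODEL_COST, run `check`, then always restore."""
    original = copy.deepcopy(MODEL_COST)
    try:
        MODEL_COST.setdefault(model, {}).update(overrides)
        return check()
    finally:
        # Runs even if `check` raises, so later tests see the original state.
        MODEL_COST.clear()
        MODEL_COST.update(original)
```

Because the restore sits in `finally`, a failing assertion inside `check` cannot leave the mutated entry behind for the next test.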
The documentation source has moved to a separate repository,
BerriAI/litellm-docs, served at docs.litellm.ai. This PR removes
docs/my-website/ from this repo and updates README.md, AGENTS.md,
and CLAUDE.md to direct doc contributions to the new repo.
Also fixes a broken relative link in
litellm/integrations/levo/README.md.
The existing CI symlink in .github/workflows/test-code-quality.yml
(which clones litellm-docs and symlinks docs/my-website to it for
tests/documentation_tests/*) continues to work without change.
- Mark last_updated_at (AdaptiveRouterState) and last_activity_at
(AdaptiveRouterSession) with @updatedAt so Prisma refreshes the
timestamps on every write. Without this the fields stayed frozen at
INSERT time and the last_activity_at index was misleading for any
future TTL/eviction logic. Applied to all three schema.prisma copies;
no migration SQL change needed (Prisma @updatedAt is a client-side
annotation that doesn't touch DDL).
- get_state_snapshot: report cell.total_samples instead of alpha+beta
for the 'samples' field. The previous value inflated every cell by
the COLD_START_MASS prior (e.g. showed 10.0 before any real traffic
arrived), which confused operators reading /adaptive_router/.../state.
Updated docs + the snapshot test to match.
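To illustrate why alpha+beta overstated the sample count, here is a sketch under stated assumptions: the prior mass is 10.0 (taken from the example above), it is split evenly between alpha and beta, and each observation adds 1.0 to one of them. The `Cell` class and both helper functions are hypothetical, not the router's actual types.

```python
from dataclasses import dataclass

COLD_START_MASS = 10.0  # assumed value, matching the 10.0 example above


@dataclass
class Cell:
    # Assumption: the prior mass is split evenly across alpha and beta.
    alpha: float = COLD_START_MASS / 2
    beta: float = COLD_START_MASS / 2
    total_samples: int = 0

    def record(self, success: bool) -> None:
        """Fold one real observation into the posterior and the counter."""
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        self.total_samples += 1


def samples_old(cell: Cell) -> float:
    """Previous behavior: includes the prior mass."""
    return cell.alpha + cell.beta


def samples_new(cell: Cell) -> int:
    """New behavior: counts only real observations."""
    return cell.total_samples
```

A fresh cell reports 10.0 under the old calculation and 0 under the new one, which is the discrepancy operators were seeing.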
Also fixes two pre-existing merge-break syntax errors in router.py
(missing ')' on the AdaptiveRouter TYPE_CHECKING import; truncated
async_pre_routing_hook dispatch call for the adaptive router branch)
that were masking the rest of the file from the interpreter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add supported providers to prompt caching doc
* Move Z.ai / GLM to cache_control marker list
* Mark xAI models as supporting prompt caching
* Narrow xAI prompt caching flag to models with documented cache pricing
* Add prompt caching flag to grok-4, grok-4-0709, grok-4-latest
---------
Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local>
- Use 'auto_router/adaptive_router' prefix in example yaml, docs, and
README — the old 'adaptive_router/...' and 'openai/gpt-4o-mini' values
silently skipped adaptive-router init because detection requires the
'auto_router/adaptive_router' prefix.
- Read x-litellm-min-quality-tier from request headers (and the
'min_quality_tier' metadata key as fallback) in async_pre_routing_hook.
Previously the documented header was defined but never extracted, so
the quality-floor feature was inert.
- Evict expired entries from _session_states. The cache grew without
bound — added a parallel expiry map (same TTL as _owner_cache) and an
opportunistic bulk sweep when the cache crosses a size threshold.
- Align adaptive-router migration SQL with Prisma schema: all count
columns and the 'clean_credit_awarded' / 'last_processed_turn' fields
are NOT NULL in the data model, so the migration now declares them
NOT NULL. Fixes test_aaaasschema_migration_check.
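The header-then-metadata lookup for min_quality_tier could be sketched as follows. The function names are hypothetical, and two details are assumptions rather than confirmed behavior: that the tier is a non-negative integer, and that an invalid value is skipped so the next source is tried.

```python
from typing import Any, Mapping, Optional

HEADER_NAME = "x-litellm-min-quality-tier"
METADATA_KEY = "min_quality_tier"


def _parse_tier(raw: Any) -> Optional[int]:
    """Return a non-negative int tier, or None if `raw` is missing or invalid."""
    try:
        tier = int(raw)
    except (TypeError, ValueError):
        return None
    return tier if tier >= 0 else None


def resolve_min_quality_tier(
    headers: Mapping[str, Any], metadata: Mapping[str, Any]
) -> Optional[int]:
    """Prefer the request header; fall back to the metadata key."""
    # Header names are case-insensitive, so compare on lowercased keys.
    lowered = {str(k).lower(): v for k, v in headers.items()}
    for raw in (lowered.get(HEADER_NAME), metadata.get(METADATA_KEY)):
        tier = _parse_tier(raw)
        if tier is not None:
            return tier
    return None
```

Returning None when neither source yields a valid tier leaves routing unconstrained, which matches the behavior before the header was wired up.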
Tests: 8 new covering header/metadata/precedence/invalid-value paths for
min_quality_tier and TTL-based eviction of _session_states.
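The eviction scheme for _session_states (a parallel expiry map plus an opportunistic sweep past a size threshold) can be sketched like this. `SessionStateCache`, its defaults, and the injectable clock are illustrative assumptions, not the router's actual implementation.

```python
import time
from typing import Any, Callable, Dict, Optional


class SessionStateCache:
    """Dict-backed cache with a parallel expiry map and threshold sweep."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        sweep_threshold: int = 1000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._states: Dict[str, Any] = {}
        self._expiry: Dict[str, float] = {}  # key -> absolute deadline
        self._ttl = ttl_seconds
        self._sweep_threshold = sweep_threshold
        self._clock = clock

    def set(self, key: str, value: Any) -> None:
        now = self._clock()
        self._states[key] = value
        self._expiry[key] = now + self._ttl
        # Opportunistic bulk sweep once the cache grows past the threshold.
        if len(self._states) > self._sweep_threshold:
            self._sweep(now)

    def get(self, key: str) -> Optional[Any]:
        deadline = self._expiry.get(key)
        if deadline is None:
            return None
        if deadline <= self._clock():
            self._evict(key)
            return None
        return self._states.get(key)

    def _sweep(self, now: float) -> None:
        expired = [k for k, deadline in self._expiry.items() if deadline <= now]
        for key in expired:
            self._evict(key)

    def _evict(self, key: str) -> None:
        self._states.pop(key, None)
        self._expiry.pop(key, None)

    def __len__(self) -> int:
        return len(self._states)
```

Sweeping only on writes that cross the threshold keeps the common path cheap; expired entries that are read before a sweep are still dropped lazily in `get`.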
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: new agentic loop event hook
Simplifies writing logic for tool-based, multi-LLM-call flows.
* fix: compress - make it work on anthropic input as well
* fix(compress.py): working prompt compression for claude code
Ensures Claude Code messages can pass through the proxy.
* docs: add agentic loop hook guide
* docs: add agentic_loop_hook to sidebar
* fix: fix multiple arguments error
* fix: fix tool call loop for compression on streaming /v1/messages
* fix: fix linting errors
* fix: fix ci/cd errors
* feat(litellm_pre_call_utils.py): use claude code session for litellm session id
Allows Claude Code logs to be stitched together, so it is clear they belong to the same conversation.
* fix: suppress incorrect mypy warning re: module
* revert: drop PR's changes to litellm/proxy/_experimental/out/
Restores the 34 HTML files under _experimental/out/ to their pre-PR
paths (X/index.html -> X.html). All renames are R100 (content
unchanged); no other files are touched.
* fix: address greptile review comments on PR #25729
- Skip ``kwargs["tools"] = []`` injection when compression is a no-op —
Anthropic Messages rejects empty tool arrays on requests that did not
originally declare tools.
- Move agentic-loop safety guards (fingerprint cycle / max depth) out of
the per-callback try/except so they propagate instead of being swallowed
by the generic exception handler. Extracted _check_agentic_loop_safety.
- Gate generic ``x-<vendor>-session-id`` capture behind the
LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to
preserve backwards compatibility; explicit x-litellm-* headers are
unaffected.
- Fix monkeypatch target in pre-call-hook test to patch the actual
module-level binding
(litellm.integrations.compression_interception.handler.compress).
- Add regression tests for empty-tools skip and opt-in session capture.
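The safety-guard change in the list above can be illustrated with a small sketch. All names here (`AgenticLoopSafetyError`, `check_loop_safety`, `run_callbacks`, the `max_depth` default) are hypothetical stand-ins; only the structure, with guards outside the per-callback try/except, reflects the fix described.

```python
from typing import Any, Callable, List, Optional, Set


class AgenticLoopSafetyError(RuntimeError):
    """Raised when the loop exceeds max depth or repeats a fingerprint."""


def check_loop_safety(
    depth: int, fingerprint: str, seen: Set[str], max_depth: int = 5
) -> None:
    if depth >= max_depth:
        raise AgenticLoopSafetyError(f"max depth {max_depth} reached")
    if fingerprint in seen:
        raise AgenticLoopSafetyError("fingerprint cycle detected")
    seen.add(fingerprint)


def run_callbacks(
    callbacks: List[Callable[[], Any]],
    depth: int,
    fingerprint: str,
    seen: Set[str],
) -> List[Optional[Any]]:
    # Guards run outside the try/except, so their errors propagate.
    check_loop_safety(depth, fingerprint, seen)
    results: List[Optional[Any]] = []
    for callback in callbacks:
        try:
            results.append(callback())
        except Exception:
            # Generic handler: one failing callback does not stop the others.
            results.append(None)
    return results
```

If the guard were called inside the `try`, the generic handler would swallow the safety error and the loop could continue past its limits.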
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag
Generic x-<vendor>-session-id header capture is a new feature and only
runs *after* the explicit x-litellm-trace-id / x-litellm-session-id
checks, so it does not change behavior for any existing caller that was
already using the LiteLLM headers — no backwards-incompatibility to gate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(compress): replace input_type with CallTypes call_type
Drop the bespoke ``CompressionInputType`` literal and use the existing
``litellm.types.utils.CallTypes`` enum instead. ``litellm.compress()``
now takes ``call_type: Union[CallTypes, str]`` (default
``CallTypes.completion``) — no new concept to learn, and the enum is
already the way the rest of the codebase talks about request shapes.
Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions
shape) and ``anthropic_messages`` (Anthropic structured content blocks).
Updated: compress(), the compression_interception handler, tests, docs,
and the two eval scripts.
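A sketch of how a `call_type: Union[CallTypes, str]` parameter can be resolved. The `CallTypes` enum below is a local stand-in with only the three values named above, not an import of `litellm.types.utils.CallTypes`, and `resolve_message_shape` is a hypothetical helper rather than part of `litellm.compress()`.

```python
from enum import Enum
from typing import Union


class CallTypes(str, Enum):
    """Local stand-in listing only the values this commit mentions."""

    completion = "completion"
    acompletion = "acompletion"
    anthropic_messages = "anthropic_messages"


_OPENAI_SHAPE = {CallTypes.completion, CallTypes.acompletion}


def resolve_message_shape(
    call_type: Union[CallTypes, str] = CallTypes.completion,
) -> str:
    """Map a call type (enum member or its string value) to a message shape."""
    # Enum lookup accepts both a member and its value; unknown strings
    # raise ValueError instead of being silently treated as a default.
    resolved = CallTypes(call_type)
    if resolved in _OPENAI_SHAPE:
        return "openai_chat"
    return "anthropic_blocks"
```

Accepting either form keeps string-based callers working while letting typed callers pass the enum directly.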
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add copy-page-as-markdown button + llms.txt generation
Adds the signalwire llms-txt Docusaurus plugin + theme so every
docs page gets:
- A "Copy Page" dropdown in the breadcrumbs (Copy, View Markdown,
Ask ChatGPT, Ask Claude) — defaults from the theme hook, no
extra config required.
- A raw `.md` companion at `<page>.md` for LLM consumption.
- Site-wide `/llms.txt` index and `/llms-full.txt` corpus.
The signalwire plugin README documents a `copyPageButton` option
that the v1.2.2 Joi schema actually rejects; the theme's defaults
cover the same feature set, so only `content.enableMarkdownFiles`
and `enableLlmsFullTxt` are set. Theme is pinned to `1.0.0-alpha.9`
because the floating version resolves to a broken canary whose
`main` points at a missing file.
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
* docs: pin exact versions for signalwire llms-txt deps
Drop the caret ranges on the two packages added in the prior
commit so the docs site pulls byte-identical npm tarballs on
every install. Matches the existing convention in this
package.json (everything else is already exact) and protects
against supply-chain substitution if a malicious patch version
is published under the same minor.
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
* docs: upgrade signalwire llms-txt plugin to v2 alpha + enable copy button
The stable v1.2.2 plugin we first pinned does not call setGlobalData
during contentLoaded, so the theme's CopyPageContent component always
returned null (its `!siteConfig` bailout). The theme v1.0.0-alpha.9
is built against the v2-alpha plugin API, which is the version that
actually wires the copy-content JSON and plugin config into the theme
via setGlobalData.
Pins plugin to 2.0.0-alpha.7 (exact, no caret) and switches the
config to the v2 schema:
- top-level `markdown` + `llmsTxt` replace the v1 `content` block
- new `ui.copyPageContent` (off by default in v2) enables the button
with view-markdown + ChatGPT + Claude actions.
Verified end-to-end: production build serves the dropdown with
"Copy Raw Markdown", "View Markdown", "Reference in ChatGPT", and
"Reference in Claude" on /docs/routing (button mounts at ~x=960 in
the breadcrumbs row).
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability.
Made-with: Cursor
- Set --ifm-pre-padding to 1.25rem for consistent code block inset
- Restore horizontal padding for line-numbered Docusaurus blocks
- Scope pre/code resets via article .markdown so blog chip styles
no longer strip CodeBlock inner padding on Prism fences
Made-with: Cursor
* Add capability to override default GitHub Copilot authentication endpoints
This feature adds support for GitHub Enterprise subscriptions with custom domain/data ownership (which use a different URL compared to standard accounts).
* Update documentation with new parameters
* Move access token URL and Client ID retrieval outside for loop
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix spurious comment from Greptile review
* Align api_base retrieval behavior across chat and embedding transformations
* Add missing GitHub Copilot client ID parameter in docs
* Update website documentation with newer options for GitHub Enterprise Copilot
* Fix default value for Copilot client ID in docs
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add announcement bar for Trivy compromise resolution notice
Add a Docusaurus announcement bar to the top of the docs site informing
users that the Trivy supply-chain compromise has been mitigated and
resolved. The banner:
- States all affected packages have been deleted and releases are safe
- Links to the Security Townhall blog post for details
- Links to the CI/CD v2 blog post for improvements made
- Uses a green background with a dismiss button
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Use :::note admonition instead of announcement bar
Replace the Docusaurus announcementBar with a :::note admonition on the
docs index page. The note appears below the hero image with the title
'Security Update' and links to the Security Townhall and CI/CD v2 blog
posts.
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Update security notice wording to 'contained'
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Move note above hero image and add to root page
- Move the security notice above the product screenshot on /docs
- Add the same notice to the root page (src/pages/index.md)
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
* Update security notice wording
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
- Release v1.82.3: note removal of injected default when reasoning_effort omitted
- Blog gemini_3: correct defaults and reasoning_effort mapping guidance
- Provider gemini.md: align tip and mapping table with implementation
Made-with: Cursor