Commit Graph

6193 Commits

Author SHA1 Message Date
Sameer Kankute fd7ff0f269 fix(hosted_vllm): normalize custom tools for chat completions (#25763)
* fix(hosted_vllm): normalize custom tools for chat completions

Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot.

Made-with: Cursor

* Fix black issue

* Fix hosted vllm custom tool schema fallback

* fix black

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-05 17:27:02 -07:00
Emmanuel Acheampong f8ba2d750b fix(crusoe): fix streaming doc model typo and add supports_vision for Gemma 3
- Streaming example referenced Llama-3.1 instead of Llama-3.3
- Add supports_vision: true for gemma-3-12b-it in both JSON files,
  matching other providers (bedrock, novita)
2026-05-01 17:27:52 +05:30
Emmanuel Acheampong e08b8ef7b6 fix(crusoe): split Custom API Base docs into two independent examples
The previous example set CRUSOE_API_BASE via env var and also passed
api_base= in the same call, making it look like both were required.
They are independent alternatives.
2026-05-01 17:27:52 +05:30
Emmanuel Acheampong 6e1e6244cf fix(crusoe): remove trailing slashes from API base URLs and fix list indentation
Trailing slashes on custom API base examples cause double-slash in
get_complete_url. Also fixes inconsistent list indentation in
test_crusoe_models_configuration.
2026-05-01 17:27:52 +05:30
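
The double-slash problem described in the commit above can be illustrated with a minimal sketch. The helper names `naive_join` and `join_url` and the example URL are hypothetical; this is not the code of `get_complete_url`, only an illustration of why a trailing slash on the base matters when a path is appended with a fixed separator.

```python
def naive_join(api_base: str, path: str) -> str:
    """Append a path with a fixed '/' separator (breaks if the base ends in '/')."""
    return api_base + "/" + path


def join_url(api_base: str, path: str) -> str:
    """Join base and path with exactly one slash between them."""
    return api_base.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    base_with_slash = "https://example.invalid/v1/"  # hypothetical base URL
    print(naive_join(base_with_slash, "chat/completions"))
    print(join_url(base_with_slash, "chat/completions"))
```

Removing the trailing slash from the documented base URL avoids the issue at the source; a normalizing join like the second helper is the defensive alternative.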
Emmanuel Acheampong 9039eb1898 fix(crusoe): fix docs trailing slash, test state pollution, missing __init__.py
- Remove trailing slash from docs Base URL to match providers.json
- Wrap model_cost mutations in try/finally to prevent test state leakage
- Add missing __init__.py to crusoe test package
2026-05-01 17:27:52 +05:30
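
The try/finally pattern mentioned in the commit above might look roughly like the following. `MODEL_COST` is a stand-in for a shared, module-level registry and `run_with_temp_model` is a hypothetical helper; the actual test code is not shown in the commit message.

```python
import copy

# Stand-in for a shared, module-level cost registry (assumed shape).
MODEL_COST = {"existing-model": {"input_cost_per_token": 1e-6}}


def run_with_temp_model(name, entry, check):
    """Temporarily register a model, run `check`, and always restore the registry."""
    snapshot = copy.deepcopy(MODEL_COST)
    try:
        MODEL_COST[name] = entry
        return check()
    finally:
        # Runs even when `check` raises, so later tests see the original state.
        MODEL_COST.clear()
        MODEL_COST.update(snapshot)
```

Without the `finally` branch, a failing assertion inside `check` would leave the temporary entry in place and could change the outcome of unrelated tests that run afterwards.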
Emmanuel Acheampong caa0db3843 adding crusoe to litellm 2026-05-01 17:27:34 +05:30
clyang 3f5e28fcdc Adding Cycraft XecGuard integration (#26011) 2026-04-27 08:58:38 +05:30
Yuneng Jiang c35f3a50ae docs: remove docs/my-website, point contributors to litellm-docs
The documentation source has moved to a separate repository,
BerriAI/litellm-docs, served at docs.litellm.ai. This PR removes
docs/my-website/ from this repo and updates README.md, AGENTS.md,
and CLAUDE.md to direct doc contributions to the new repo.

Also fixes a broken relative link in
litellm/integrations/levo/README.md.

The existing CI symlink in .github/workflows/test-code-quality.yml
(which clones litellm-docs and symlinks docs/my-website to it for
tests/documentation_tests/*) continues to work without change.
2026-04-24 14:17:46 -07:00
shin-berri ca443a957c Merge pull request #24374 from BerriAI/litellm_staging_03_22_2026
Litellm staging 03 22 2026
2026-04-24 12:38:47 -07:00
yuneng-jiang 9dd7e37530 Merge pull request #25359 from BerriAI/litellm_Sameerlite/openai-chat-to-responses
feat(openai): add route_all_chat_openai_to_responses global flag
2026-04-24 12:06:19 -07:00
Sameer Kankute a0c52cda6e docs(proxy): clarify x-litellm-model-group vs provider model id (#25497)
Made-with: Cursor
2026-04-24 16:59:03 +00:00
yuneng-jiang 8dda834cf9 Merge pull request #25842 from BerriAI/litellm_docs-gemini3-thinking-defaults
docs(gemini): Gemini 3 thinking_level defaults and release note
2026-04-24 09:45:24 -07:00
Yuneng Jiang 4d5c3476a4 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_docs-gemini3-thinking-defaults 2026-04-24 09:40:04 -07:00
Yuneng Jiang b2afc70080 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_docs-code-block-padding-parity 2026-04-24 09:39:06 -07:00
Sameer Kankute e1466be825 feat(pricing): gemini-embedding-2 GA cost map, blog, and test (#26391)
* feat(pricing): gemini-embedding-2 GA cost map, blog, and test

- Add model_prices entries for gemini-embedding-2 (Gemini + Vertex paths)
- Add docs blog gemini_embedding_2_ga with LiteLLM proxy curl examples
- Add test_gemini_embedding_2_ga_in_cost_map in test_utils

Made-with: Cursor

* Fix greptile reviews
2026-04-24 09:28:18 -07:00
Cesar Garcia 8bd58fb82d Merge branch 'litellm_internal_staging' into litellm_staging_03_22_2026 2026-04-24 13:12:19 -03:00
Sameer Kankute 1720903bda Merge pull request #25346 from BerriAI/litellm_Sameerlite/responses-bridge-optin
feat(responses): add use_chat_completions_api flag for openai/ models with custom api_base
2026-04-24 20:55:22 +05:30
Sameer Kankute d5449f5b1a Merge pull request #26300 from BerriAI/litellm_oss_staging_04_22_2026
Litellm oss staging 04 22 2026
2026-04-23 18:53:58 +05:30
Sameer Kankute e3440baa0c Merge pull request #25767 from vinhphamhuu-ct/main
feat: Expand VideoMetadata support to all Gemini Models.
2026-04-23 17:20:01 +05:30
Sameer Kankute 94288d76a9 Merge pull request #26303 from BerriAI/litellm_internal_staging
merge main
2026-04-23 08:30:54 +05:30
Sameer Kankute f3b80726a7 Merge pull request #26301 from BerriAI/litellm_internal_staging
merge main
2026-04-23 08:30:10 +05:30
Cesar Garcia 25c0aa8bfd Merge pull request #26283 from BerriAI/litellm_internal_staging
Sync litellm_staging_03_22_2026 with litellm_internal_staging
2026-04-22 19:55:27 -03:00
Krrish Dholakia ecd9a83e61 fix(adaptive_router): P2 review items — @updatedAt + snapshot samples
- Mark last_updated_at (AdaptiveRouterState) and last_activity_at
  (AdaptiveRouterSession) with @updatedAt so Prisma refreshes the
  timestamps on every write. Without this the fields stayed frozen at
  INSERT time and the last_activity_at index was misleading for any
  future TTL/eviction logic. Applied to all three schema.prisma copies;
  no migration SQL change needed (Prisma @updatedAt is a client-side
  annotation that doesn't touch DDL).

- get_state_snapshot: report cell.total_samples instead of alpha+beta
  for the 'samples' field. The previous value inflated every cell by
  the COLD_START_MASS prior (e.g. showed 10.0 before any real traffic
  arrived), which confused operators reading /adaptive_router/.../state.
  Updated docs + the snapshot test to match.

Also fixes two pre-existing merge-break syntax errors in router.py
(missing ')' on the AdaptiveRouter TYPE_CHECKING import; truncated
async_pre_routing_hook dispatch call for the adaptive router branch)
that were masking the rest of the file from the interpreter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:27:01 -07:00
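
The difference between alpha+beta and total_samples in the snapshot fix above can be sketched as follows. The `Cell` class, the even split of the prior between `alpha` and `beta`, and the value of `COLD_START_MASS` are assumptions made for illustration (the 10.0 figure is taken from the example in the commit message); the real adaptive-router data structures may differ.

```python
from dataclasses import dataclass

COLD_START_MASS = 10.0  # assumed prior mass, per the "10.0" example in the commit message


@dataclass
class Cell:
    """Beta-distributed quality estimate seeded with a cold-start prior (illustrative)."""
    alpha: float = COLD_START_MASS / 2  # assumed even split of the prior
    beta: float = COLD_START_MASS / 2
    total_samples: int = 0

    def record(self, success: bool) -> None:
        """Update the posterior and count one real observation."""
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        self.total_samples += 1


if __name__ == "__main__":
    cell = Cell()
    print(cell.alpha + cell.beta, cell.total_samples)  # prior mass vs. real traffic
```

Under these assumptions a fresh cell reports 10.0 when alpha+beta is used but 0 when total_samples is used, which is the operator-facing confusion the commit describes.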
Krrish Dholakia b6fc75b3ce Merge branch 'litellm_internal_staging' into litellm_adaptive_routing 2026-04-20 15:28:08 -07:00
Michael-RZ-Berri 4f823cedac Add supported providers to prompt caching doc (#26124)
* Add supported providers to prompt caching doc

* Move Z.ai / GLM to cache_control marker list

* Mark xAI models as supporting prompt caching

* Narrow xAI prompt caching flag to models with documented cache pricing

* Add prompt caching flag to grok-4, grok-4-0709, grok-4-latest

---------

Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local>
2026-04-20 15:25:21 -07:00
Krrish Dholakia fba736ca3c fix(adaptive_router): 3 P1 review defects
- Use 'auto_router/adaptive_router' prefix in example yaml, docs, and
  README — the old 'adaptive_router/...' and 'openai/gpt-4o-mini' values
  silently skipped adaptive-router init because detection requires the
  'auto_router/adaptive_router' prefix.

- Read x-litellm-min-quality-tier from request headers (and the
  'min_quality_tier' metadata key as fallback) in async_pre_routing_hook.
  Previously the documented header was defined but never extracted, so
  the quality-floor feature was inert.

- Evict expired entries from _session_states. The cache grew without
  bound — added a parallel expiry map (same TTL as _owner_cache) and an
  opportunistic bulk sweep when the cache crosses a size threshold.

- Align adaptive-router migration SQL with Prisma schema: all count
  columns and the 'clean_credit_awarded' / 'last_processed_turn' fields
  are NOT NULL in the data model, so the migration now declares them
  NOT NULL. Fixes test_aaaasschema_migration_check.

Tests: 8 new covering header/metadata/precedence/invalid-value paths for
min_quality_tier and TTL-based eviction of _session_states.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:22:18 -07:00
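
The eviction approach described in the third bullet of the commit above (a parallel expiry map plus an opportunistic bulk sweep past a size threshold) could be sketched like this. `TTLSessionCache`, its parameters, and the injectable `clock` are hypothetical names introduced for illustration; they are not the router's actual `_session_states` implementation.

```python
import time
from typing import Any, Callable, Dict, Optional


class TTLSessionCache:
    """Dict-backed cache with a parallel expiry map and a size-triggered sweep."""

    def __init__(
        self,
        ttl_seconds: float,
        sweep_threshold: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._sweep_threshold = sweep_threshold
        self._clock = clock
        self._states: Dict[str, Any] = {}
        self._expires_at: Dict[str, float] = {}  # parallel map: key -> expiry time

    def set(self, key: str, value: Any) -> None:
        now = self._clock()
        self._states[key] = value
        self._expires_at[key] = now + self._ttl
        # Opportunistic bulk sweep once the cache grows past the threshold.
        if len(self._states) > self._sweep_threshold:
            self._sweep(now)

    def get(self, key: str) -> Optional[Any]:
        expiry = self._expires_at.get(key)
        if expiry is None:
            return None
        if expiry <= self._clock():
            self._evict(key)  # lazily drop an expired entry on read
            return None
        return self._states[key]

    def _evict(self, key: str) -> None:
        self._states.pop(key, None)
        self._expires_at.pop(key, None)

    def _sweep(self, now: float) -> None:
        # Collect keys first so the dict is not mutated while iterating.
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            self._evict(key)

    def __len__(self) -> int:
        return len(self._states)
```

Sweeping only when the threshold is crossed keeps the common write path cheap while still bounding growth from sessions that are never read again.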
Krrish Dholakia 386f334fee Prompt Compression - add it to the proxy (#25729)
* refactor: new agentic loop event hook

simplifies how to create logic for tool based multi llm calls

* fix: compress - make it work on anthropic input as well

* fix(compress.py): working prompt compression for claude code

ensures claude code messages can run through proxy easily

* docs: add agentic loop hook guide

* docs: add agentic_loop_hook to sidebar

* fix: fix multiple arguments error

* fix: fix tool call loop for compression on streaming /v1/messages

* fix: fix linting errors

* fix: fix ci/cd errors

* feat(litellm_pre_call_utils.py): use claude code session for litellm session id

allows claude code logs to be stitched together, making it easy to know they were all part of the same conversation

* fix: suppress incorrect mypy warning re: module

* revert: drop PR's changes to litellm/proxy/_experimental/out/

Restores the 34 HTML files under _experimental/out/ to their pre-PR
paths (X/index.html -> X.html). All renames are R100 (content
unchanged); no other files are touched.

* fix: address greptile review comments on PR #25729

- Skip ``kwargs["tools"] = []`` injection when compression is a no-op —
  Anthropic Messages rejects empty tool arrays on requests that did not
  originally declare tools.
- Move agentic-loop safety guards (fingerprint cycle / max depth) out of
  the per-callback try/except so they propagate instead of being swallowed
  by the generic exception handler. Extracted _check_agentic_loop_safety.
- Gate generic ``x-<vendor>-session-id`` capture behind the
  LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to
  preserve backwards compatibility; explicit x-litellm-* headers are
  unaffected.
- Fix monkeypatch target in pre-call-hook test to patch the actual
  module-level binding
  (litellm.integrations.compression_interception.handler.compress).
- Add regression tests for empty-tools skip and opt-in session capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag

Generic x-<vendor>-session-id header capture is a new feature and only
runs *after* the explicit x-litellm-trace-id / x-litellm-session-id
checks, so it does not change behavior for any existing caller that was
already using the LiteLLM headers — no backwards-incompatibility to gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(compress): replace input_type with CallTypes call_type

Drop the bespoke ``CompressionInputType`` literal and use the existing
``litellm.types.utils.CallTypes`` enum instead.  ``litellm.compress()``
now takes ``call_type: Union[CallTypes, str]`` (default
``CallTypes.completion``) — no new concept to learn, and the enum is
already the way the rest of the codebase talks about request shapes.

Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions
shape) and ``anthropic_messages`` (Anthropic structured content blocks).

Updated: compress(), the compression_interception handler, tests, docs,
and the two eval scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 15:08:00 -07:00
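
The review fix above that moves the agentic-loop safety guards out of the per-callback try/except can be illustrated with a small sketch. All names here (`AgenticLoopSafetyError`, `fingerprint`, `check_loop_safety`, `run_callbacks`) and the default `max_depth` are hypothetical; the commit only names an extracted `_check_agentic_loop_safety` helper and does not show its body.

```python
import hashlib
import json
from typing import Any, Callable, List, Set


class AgenticLoopSafetyError(RuntimeError):
    """Raised when a tool-call loop repeats itself or runs too deep."""


def fingerprint(tool_calls: Any) -> str:
    """Stable hash of the tool calls, used to detect a repeated step."""
    payload = json.dumps(tool_calls, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_loop_safety(tool_calls: Any, seen: Set[str], depth: int, max_depth: int = 5) -> None:
    """Raise if the loop exceeded max_depth or revisited an identical step."""
    if depth >= max_depth:
        raise AgenticLoopSafetyError(f"max depth {max_depth} reached")
    fp = fingerprint(tool_calls)
    if fp in seen:
        raise AgenticLoopSafetyError("fingerprint cycle detected")
    seen.add(fp)


def run_callbacks(
    callbacks: List[Callable[[Any], Any]], tool_calls: Any, seen: Set[str], depth: int
) -> List[Any]:
    # Safety check sits outside the per-callback try/except, so it propagates.
    check_loop_safety(tool_calls, seen, depth)
    results: List[Any] = []
    for callback in callbacks:
        try:
            results.append(callback(tool_calls))
        except Exception as exc:  # individual callback failures are tolerated
            results.append(exc)
    return results
```

If the check were placed inside the `try` block, the generic handler would swallow the safety error and the loop would keep running, which is the defect the review comment points at.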
nhyy244 a19bff4ca6 Feature/add audio support for scaleway (#26110)
* feat(scaleway): add SCALEWAY to LlmProviders enum

* feat(scaleway): add audio transcription config and dispatch wiring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(scaleway): add behavior tests for audio transcription config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(scaleway): advertise audio_transcriptions in endpoint-support JSON

* docs(scaleway): document audio transcription support

* fix(scaleway): address PR review — plain-text response_format + missing-key fail-fast

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(scaleway): cover new response paths, drop gettysburg.wav coupling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 14:49:41 -07:00
Sameer Kankute 57eae8d01c Merge branch 'litellm_internal_staging' into litellm_staging_03_22_2026 2026-04-20 19:56:00 +05:30
Krrish Dholakia 70caf5aec0 docs: update docs 2026-04-18 21:31:53 -07:00
Krrish Dholakia 924fa6a3bc feat: commit new adaptive routing 2026-04-18 21:29:39 -07:00
ishaan-berri d03c301c79 Merge pull request #25936 from BerriAI/litellm_health-check-reasoning-tokens
fix(proxy): prioritize reasoning health-check max token precedence
2026-04-18 11:35:04 -07:00
Yuneng Jiang e004876950 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/wonderful-bouman
# Conflicts:
#	tests/test_litellm/proxy/ui_crud_endpoints/test_proxy_setting_endpoints.py
2026-04-17 21:32:09 -07:00
ishaan-berri 1c128a86b8 Merge pull request #25256 from BerriAI/litellm_ishaan_april6
Litellm ishaan april6
2026-04-17 16:26:45 -07:00
Yuneng Jiang 1e25a00e5d [Docs] BYOK tutorial: document the UI-only configuration path 2026-04-17 13:32:17 -07:00
Krrish Dholakia dd76cc5d9d docs: add "Copy Page as Markdown" + llms.txt to docs site (#25975)
* docs: add copy-page-as-markdown button + llms.txt generation

Adds the signalwire llms-txt Docusaurus plugin + theme so every
docs page gets:
- A "Copy Page" dropdown in the breadcrumbs (Copy, View Markdown,
  Ask ChatGPT, Ask Claude) — defaults from the theme hook, no
  extra config required.
- A raw `.md` companion at `<page>.md` for LLM consumption.
- Site-wide `/llms.txt` index and `/llms-full.txt` corpus.

The signalwire plugin README documents a `copyPageButton` option
that the v1.2.2 Joi schema actually rejects; the theme's defaults
cover the same feature set, so only `content.enableMarkdownFiles`
and `enableLlmsFullTxt` are set. Theme is pinned to `1.0.0-alpha.9`
because the floating version resolves to a broken canary whose
`main` points at a missing file.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* docs: pin exact versions for signalwire llms-txt deps

Drop the caret ranges on the two packages added in the prior
commit so the docs site pulls byte-identical npm tarballs on
every install. Matches the existing convention in this
package.json (everything else is already exact) and protects
against supply-chain substitution if a malicious patch version
is published under the same minor.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* docs: upgrade signalwire llms-txt plugin to v2 alpha + enable copy button

The stable v1.2.2 plugin we first pinned does not call setGlobalData
during contentLoaded, so the theme's CopyPageContent component always
returned null (its `!siteConfig` bailout). The theme v1.0.0-alpha.9
is built against the v2-alpha plugin API, which is the version that
actually wires the copy-content JSON and plugin config into the theme
via setGlobalData.

Pins plugin to 2.0.0-alpha.7 (exact, no caret) and switches the
config to the v2 schema:
- top-level `markdown` + `llmsTxt` replace the v1 `content` block
- new `ui.copyPageContent` (off by default in v2) enables the button
  with view-markdown + ChatGPT + Claude actions.

Verified end-to-end: production build serves the dropdown with
"Copy Raw Markdown", "View Markdown", "Reference in ChatGPT", and
"Reference in Claude" on /docs/routing (button mounts at ~x=960 in
the breadcrumbs row).

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-04-17 13:03:12 -07:00
Ishaan Jaffer f31d4faa87 Merge origin/main into litellm_ishaan_april6 2026-04-17 12:36:51 -07:00
Sameer Kankute 27877b4b06 Merge pull request #25945 from BerriAI/litellm_internal_staging
merge litellm_internal_staging
2026-04-17 18:48:03 +05:30
Sameer Kankute 96882e04e7 Merge pull request #25942 from BerriAI/litellm_internal_staging
merge litellm_internal_staging
2026-04-17 18:18:12 +05:30
Sameer Kankute d86c6a5b2f fix(proxy): prioritize reasoning health check token defaults
Apply reasoning-first precedence for background health-check max tokens, parse reasoning env as optional, and raise non-wildcard fallback max_tokens from 1 to 5 for better reliability.

Made-with: Cursor
2026-04-17 12:36:58 +05:30
Sameer Kankute 52fde57df7 feat(docs): align fenced code padding on blog and doc pages
- Set --ifm-pre-padding to 1.25rem for consistent code block inset
- Restore horizontal padding for line-numbered Docusaurus blocks
- Scope pre/code resets via article .markdown so blog chip styles
  no longer strip CodeBlock inner padding on Prism fences

Made-with: Cursor
2026-04-17 10:04:03 +05:30
Stefano Romanò f69b9d6564 Add capability to override default GitHub Copilot authentication endp… (#25915)
* Add capability to override default GitHub Copilot authentication endpoints

This feature adds support for GitHub Enterprise subscriptions with custom domain/data ownership, which use a different URL than standard accounts.

* Update documentation with new parameters

* Move access token URL and Client ID retrieval outside for loop

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix spurious comment from Greptile review

* Align api_base retrieval behavior across chat and embedding transformations

* Add missing GitHub Copilot client ID parameter in docs

* Update website documentation with newer options for GitHub Enterprise Copilot

* Fix default value for Copilot client ID in docs

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-16 21:04:38 -07:00
Krrish Dholakia 13108f39cb Add docs announcement bar for Trivy compromise resolution (#25870)
* Add announcement bar for Trivy compromise resolution notice

Add a Docusaurus announcement bar to the top of the docs site informing
users that the Trivy supply-chain compromise has been mitigated and
resolved. The banner:
- States all affected packages have been deleted and releases are safe
- Links to the Security Townhall blog post for details
- Links to the CI/CD v2 blog post for improvements made
- Uses a green background with closeable dismiss button

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>

* Use :::note admonition instead of announcement bar

Replace the Docusaurus announcementBar with a :::note admonition on the
docs index page. The note appears below the hero image with the title
'Security Update' and links to the Security Townhall and CI/CD v2 blog
posts.

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>

* Update security notice wording to 'contained'

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>

* Move note above hero image and add to root page

- Move the security notice above the product screenshot on /docs
- Add the same notice to the root page (src/pages/index.md)

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>

* Update security notice wording

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-16 15:15:52 -07:00
Sameer Kankute 13522ff33a Fix version in docs 2026-04-16 22:41:32 +05:30
ishaan-berri 44c992416c Merge pull request #25867 from BerriAI/litellm_day_0_opus_4.7_support
Litellm day 0 opus 4.7 support
2026-04-16 09:42:11 -07:00
Sameer Kankute 07d863b8e7 Remove max support for opus 4.7 2026-04-16 21:58:03 +05:30
Sameer Kankute f94c8dda82 Fix model names 2026-04-16 21:47:58 +05:30
Sameer Kankute b3d5ff5774 Fix tests + add docs 2026-04-16 21:45:31 +05:30
Sameer Kankute 4b5c86b8a1 Fix code qa 2026-04-16 19:29:08 +05:30
Sameer Kankute c98002ce74 docs(gemini): document Gemini 3 thinking_level API defaults
- Release v1.82.3: note removal of injected default when reasoning_effort omitted
- Blog gemini_3: correct defaults and reasoning_effort mapping guidance
- Provider gemini.md: align tip and mapping table with implementation

Made-with: Cursor
2026-04-16 11:24:35 +05:30