The existing documentation for the Responses API bridge only showed
examples with models that have `mode: responses` (like o3-deep-research),
which work automatically. This update clarifies that models with
`mode: chat` (like gpt-4o, gpt-5) require the `openai/responses/` prefix
to use built-in tools like web_search_preview.
Changes:
- Explain the `mode` property from model_prices_and_context_window.json
- List models with mode: responses vs mode: chat
- Add example showing the common error and how to fix it
- Add SDK example using the prefix with gpt-4o
- Update proxy example with both automatic and prefix-based configs
- Fix invalid trailing comma in original JSON example
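
The prefix rule above can be sketched as a small helper. This is only an illustration: `responses_model_name` is a hypothetical function, not part of LiteLLM, and it assumes the caller has already looked up the model's `mode` value.

```python
# Hypothetical helper, not a LiteLLM API: picks the model string to use
# when a request needs built-in tools such as web_search_preview.
RESPONSES_PREFIX = "openai/responses/"


def responses_model_name(model: str, mode: str) -> str:
    """Return `model` unchanged for mode 'responses', prefixed for mode 'chat'."""
    if mode == "responses" or model.startswith(RESPONSES_PREFIX):
        # Models with mode: responses are routed automatically.
        return model
    if mode == "chat":
        # Models with mode: chat need the explicit bridge prefix.
        return RESPONSES_PREFIX + model
    raise ValueError(f"unsupported mode: {mode!r}")


print(responses_model_name("gpt-4o", "chat"))
print(responses_model_name("o3-deep-research", "responses"))
```

The returned string would then be passed as the `model` argument of the SDK call, together with the built-in tool definition.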
Add Claude Opus 4.6, Sonnet 4.6, Opus 4.5, Sonnet 4.5, and Haiku 4.5
to the web fetch supported models documentation. These models were
missing from the list despite supporting the web_fetch tool.
Add usage example with concrete model entry, explanation of load-time
expansion, and cross-reference to model_alias_map to clarify the
difference between the two features.
When a provider's finish_reason is mapped to a different OpenAI-compatible
value (e.g. "MALFORMED_FUNCTION_CALL" → "stop"), the original value is now
preserved in choices[].provider_specific_fields["native_finish_reason"].
This allows agent loops to distinguish between different stop conditions
without breaking the unified OpenAI-compatible finish_reason mapping.
Also returns a defensive copy from get_finish_reason_mapping() to prevent
accidental mutation of the global _FINISH_REASON_MAP.
* docs: add pip/venv upgrade workflow guide
- Add comprehensive guide for upgrading LiteLLM proxy via pip
- Covers Prisma client regeneration and DB migration steps
- Includes verification commands and troubleshooting tips
- Links to existing Prisma migration troubleshooting doc
* docs: clarify Python version in prisma generate command
- Update example to show multiple Python versions (3.11, 3.12, 3.13)
- Make it clear LiteLLM supports multiple Python versions, not just 3.11
* docs: emphasize venv activation before running commands
- Add info box at top reminding users to activate venv
- Include venv activation step before starting proxy (both options)
- Add Windows activation command for cross-platform clarity
- Make it clear all commands assume activated venv
* docs: add pip_venv_upgrade to sidebar navigation
- Add new page to Troubleshooting section in sidebars.js
- Positioned after Performance/Latency category and before rollback
- Makes the upgrade guide discoverable through docs navigation
* docs: show explicit --schema flag in prisma migrate deploy
- Add explicit --schema path to Option B migration command
- Remove ambiguous instruction about running from litellm_proxy_extras
- Include path variable guidance for clarity
- Makes the command immediately runnable without directory navigation
* Update docs/my-website/docs/troubleshoot/pip_venv_upgrade.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: close code block and add missing section in pip_venv_upgrade.md
* docs: define schema-path placeholder in verification section
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Resolve conflict in perplexity/responses/transformation.py by keeping
the simplified ~50 line version (PR's goal) instead of main's ~410 line
version. Carry over supports_native_websocket() -> False from main.
* fix(mcp): add AWS SigV4 auth for Bedrock AgentCore MCP servers
Add aws_sigv4 auth type to MCP client via httpx.Auth subclass that
signs each request with SigV4 using botocore. Enables mcp_servers
config to connect to AgentCore-hosted MCP servers.
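
For context, the change itself delegates signing to botocore. The stdlib-only sketch below is not that implementation; it only illustrates two SigV4 building blocks, the signing-key derivation and the payload hash. The credential, date, region and service strings are placeholders (assumptions), not real values.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """SigV4 key derivation: chained HMACs over date, region, service, terminator."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def payload_hash(body: bytes) -> str:
    """Hex SHA-256 of the request body, which is part of the signed request."""
    return hashlib.sha256(body).hexdigest()


# Placeholder inputs for illustration only.
key = derive_signing_key("EXAMPLE_SECRET", "20250101", "us-east-1", "example-service")
print(len(key))
print(payload_hash(b"") != payload_hash(b'{"jsonrpc": "2.0"}'))
```

Because the body hash is part of what gets signed, a signer that sees an empty body instead of the real one produces a signature for a different request.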
* docs(mcp): add AWS SigV4 auth documentation for Bedrock AgentCore
Add dedicated docs page for configuring MCP servers with AWS SigV4
authentication, update MCP overview with aws_sigv4 auth type and
config example, and link from Bedrock AgentCore provider docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(mcp): address Greptile review — requires_request_body, full header signing, health check
- Add requires_request_body = True to MCPSigV4Auth so httpx buffers the
request body before calling auth_flow (prevents empty body hash for
streaming requests)
- Pass all request headers to AWSRequest for canonical SigV4 signing
instead of only Content-Type
- Exclude aws_sigv4 from health check skip logic since it has its own
credential fields (not authentication_token)
- Fix docs: mark aws_access_key_id/aws_secret_access_key as optional
(falls back to boto3 credential chain)
- Add test for requires_request_body flag
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Add tip boxes explaining that gpt-5.4 does not support reasoning_effort
with function tools in /v1/chat/completions, and that the responses
bridge (openai/responses/gpt-5.4) should be used instead.
* feat(proxy): add Prisma DB pool and engine health metrics to Prometheus
Add a PrismaMetricsCollector that periodically queries pg_stat_activity
and the Prisma engine process to expose connection pool and engine health
as Prometheus gauges/counters. Auto-enabled when prometheus_system is in
service_callback.
New metrics:
- litellm_db_pool_active_connections (Gauge)
- litellm_db_pool_idle_connections (Gauge)
- litellm_db_pool_total_connections (Gauge)
- litellm_db_pool_waiting_connections (Gauge)
- litellm_db_engine_up (Gauge)
- litellm_db_engine_restarts_total (Counter)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address Greptile review feedback
- Only increment engine_restarts counter on heavy reconnects (engine
actually dead), not lightweight network-blip reconnects
- Fix potential KeyError in _get_or_create_gauge/counter fallback path
when REGISTRY._names_to_collectors is absent
- Rename litellm_db_pool_waiting_connections to
litellm_db_pool_lock_waiting_connections to clarify it measures lock
contention, not pool slot queuing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: warn when prometheus_system enabled but watchdog disabled
Log a warning when users have prometheus_system in service_callback
but PRISMA_HEALTH_WATCHDOG_ENABLED=false, since DB pool and engine
metrics won't be collected in that configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* ci: retrigger CI checks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use labeled gauge for DB pool connection metrics
Replace 3 separate pool gauges (active, idle, total) with a single
`litellm_db_pool_connections` gauge using a `state` label. This is more
Prometheus-idiomatic and exposes all pg_stat_activity states (active,
idle, idle in transaction, etc.) without ambiguity about what "total"
includes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address Greptile review — stale labels and fallback re-registration
- Zero out known pg_stat_activity states that are absent from the current
query result, preventing stale gauge values from persisting.
- Simplify _get_or_create_gauge/counter by removing the fallback loop
that could re-register an already-registered metric (ValueError).
- Add test for stale label clearing across collection cycles.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: include "unknown" in _PG_STATES for stale label clearing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: collect immediately on start and consolidate into single query
- Move sleep to end of loop so metrics appear on /metrics immediately
after startup instead of after a 30s delay.
- Combine pool state and lock waiting queries into a single SQL query
using conditional aggregation, halving per-cycle DB overhead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
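
Conditional aggregation can be demonstrated with an in-memory SQLite table standing in for pg_stat_activity. This is a sketch under assumptions: the `activity` table, its rows, and the exact `wait_event_type = 'Lock'` predicate are illustrative and not taken from the actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for pg_stat_activity with only the columns the sketch needs.
conn.execute("CREATE TABLE activity (state TEXT, wait_event_type TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [
        ("active", None),
        ("active", "Lock"),
        ("idle", None),
        ("idle in transaction", "Lock"),
    ],
)

# One pass yields both the per-state counts and the lock-waiting count.
QUERY = """
SELECT state,
       COUNT(*) AS connections,
       SUM(CASE WHEN wait_event_type = 'Lock' THEN 1 ELSE 0 END) AS lock_waiting
FROM activity
GROUP BY state
ORDER BY state
"""

rows = conn.execute(QUERY).fetchall()
pool_by_state = {state: count for state, count, _ in rows}
lock_waiting = sum(waiting for _, _, waiting in rows)
print(pool_by_state, lock_waiting)
```

A single round trip returns both figures, which is where the reduction in per-cycle database overhead comes from.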
* fix: prevent tight spin loop on collection error
Move asyncio.sleep outside the try/except so it always executes even
when _collect_engine_health() or _collect_pool_metrics() raises.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
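
A minimal sketch of the loop shape described in the last two fixes: collect first, and sleep outside the `try`/`except` so a failing collector cannot spin. `collection_loop` and its `max_cycles` parameter are hypothetical, added only so the example terminates.

```python
import asyncio
from typing import Awaitable, Callable


async def collection_loop(
    collect: Callable[[], Awaitable[None]],
    interval_seconds: float,
    max_cycles: int,
) -> int:
    """Run `collect` immediately, then once per interval; return the failure count."""
    failures = 0
    for _ in range(max_cycles):
        try:
            await collect()  # first collection happens before any sleep
        except Exception:
            failures += 1    # a real collector would log the error here
        # Outside the try block: always runs, even after an exception.
        await asyncio.sleep(interval_seconds)
    return failures


async def always_fails() -> None:
    raise RuntimeError("database unavailable")


print(asyncio.run(collection_loop(always_fails, 0.0, 3)))
```

With the sleep inside the `try` block, an exception raised by the collector would skip it and the loop would retry immediately.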
* fix: add multiprocess_mode to _get_or_create_gauge initialization
- Include `multiprocess_mode` parameter to properly support multiprocessing in Gauge creation.
- Ensure consistent behavior for labeled and unlabeled Gauges.
* fix: handle invalid env var and document watchdog prerequisite
- Add try/except ValueError for PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS
to prevent proxy startup crash on non-numeric values (e.g. "30s")
- Document that DB metrics require both prometheus_system callback and
PRISMA_HEALTH_WATCHDOG_ENABLED=true
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
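
The fallback for a non-numeric interval can be sketched as follows. The 30-second default and the use of `float` are assumptions, and `read_collection_interval` is a hypothetical name; only the environment variable name comes from this change.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS"
DEFAULT_INTERVAL_SECONDS = 30.0  # assumed default


def read_collection_interval(env: Optional[Mapping[str, str]] = None) -> float:
    """Parse the interval, falling back to the default on missing or invalid input."""
    source = os.environ if env is None else env
    raw = source.get(ENV_VAR)
    if raw is None:
        return DEFAULT_INTERVAL_SECONDS
    try:
        return float(raw)
    except ValueError:
        # Values such as "30s" are not numeric; use the default instead of crashing.
        return DEFAULT_INTERVAL_SECONDS


print(read_collection_interval({ENV_VAR: "30s"}))
print(read_collection_interval({ENV_VAR: "5"}))
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test.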
* fix: use defensive null coalescing for query_raw row values
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add invalid env var fallback test and fix mock signature
- Add test for non-numeric PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS
- Add **kwargs to mock _patched_get_or_create_gauge for forward compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>