litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-17 16:48:54 +00:00

Author	SHA1	Message	Date
ryan-crabbe-berri	770fff7058	test(proxy): stop running real-DB tests in GitHub Actions unit jobs (#29700 ) * test(proxy): stop running real-DB tests in GitHub Actions unit jobs GitHub Actions unit jobs were spinning up a Postgres service container, but the only active tests that touched it either used the DB incidentally (a cargo-culted prisma_client.connect()) or were genuine integration tests mislabeled as unit. Mock the incidental ones so the proxy-db job needs no container, and move the tests that genuinely need a database (proxy management behavior, master-key-not-persisted, schema-migration sync) to CircleCI, which is already the real-infrastructure lane. * test(proxy): restore no-unexpected-startup-writes canary in master-key test Greptile noted the hash-match assertion no longer catches other unexpected startup writes (a default key, a rotation artifact). The CircleCI job gives each run a fresh DB, so a clean startup must leave the table empty; add that canary back alongside the precise master-key assertion.	2026-06-04 14:56:02 -07:00
Sameer Kankute	c7ab9adde5	Litellm oss staging 030626 (#29578 ) * Fix incorrect agent API request example payload structure (#29556) * fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs (#29427) * fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs On /v1/messages and other LITELLM_METADATA_ROUTES, the parent OTel span is stored in litellm_params['litellm_metadata'] instead of litellm_params['metadata']. When the request body contains a native 'metadata' field (e.g. Anthropic's {"user_id": "..."}), litellm_params['metadata'] gets overwritten and the parent span is lost, producing orphan root spans with a different trace_id. Add fallback checks to litellm_metadata in: - _get_span_context(): so child spans find the correct parent - _end_proxy_span_from_kwargs(): so the proxy span gets closed Fixes: https://github.com/BerriAI/litellm/issues/27934 * test(otel): tighten assertions per Greptile review - test_span_context_metadata_takes_priority: assert litellm_metadata span is never accessed, proving metadata takes priority - test_span_context_no_parent_when_neither_has_span: assert both ctx and detected_span are None --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix: remove premature end-user budget check from get_end_user_object (#29420) * fix(proxy): remove premature end-user budget check from get_end_user_object Problem: - `_check_end_user_budget()` was called inside `get_end_user_object()` - This caused budget checks to run BEFORE `skip_budget_checks` could be evaluated - Zero-cost models (e.g., local vLLM) were incorrectly blocked when end-users exceeded their budget, even though they should bypass budget checks Solution: - Remove `_check_end_user_budget()` calls from `get_end_user_object()` - Budget enforcement now happens exclusively in `common_checks()` where `skip_budget_checks` context is available - `get_end_user_object()` keeps `route` as optional in function parameter for backwards compatibility and future implementation. * refactor(tests): update budget enforcement tests to reflect changes in get_end_user_object - test_get_end_user_object() verifies data fetching - test_check_end_user_budget() verifies enforcement - test_budget_enforcement_blocks_over_budget_users() integrates _check_end_user_budget() - test_resolve_end_user_reraises_budget_exceeded() is now test_resolve_end_user since no budget exceeded is thrown in get_end_user_object() * Gemini /images/generate and /images/edits billing fixes + add support for size and aspect ratio params (#29534) * Fix Gemini image config mapping * Address Gemini image config review * Format Gemini image generation transform * Fix Gemini image token usage logging * Share Gemini image request helpers * Fix Gemini Imagen model routing * Fixes as per self code review * Fixes per internal code review * Stop gating Imagen imageSize forwarding * Document Gemini image size mapping source * chore: retrigger lint * Clarify Gemini candidate count precedence * Add Inception provider (#29522) * add inception as provider (chat, fim) * linting * seperate test suite for chat and fim * fix test coverage * fix: model hub custom pricing model info (#29293) * Opik user auth key metadata extractors (#28397) * fix: enhance Opik metadata extraction to include user API key auth context fixed after refactoring to extractor logic * test: add unit tests for OPik metadata extraction logic * fix: enhance extract_opik_metadata function to prioritize metadata sources for improved accuracy * fix(ci): clarified comments and edited unit tests * test: add unit tests for OPik metadata extraction with auth and requester overrides * fix(ui): replace fixed favicon.ico with current api get /get_favicon (#29532) Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> * fix(vertex/gemini): keep tool_call reference when a text-only assistant message follows (#29561) `_gemini_convert_messages_with_history` tracks `last_message_with_tool_calls` so a following tool result can be matched back to its tool call. The assignment was inside a branch guarded by `assistant_msg.get("tool_calls", []) is not None`, which is also True for a text-only assistant message (an empty list is not None). As a result, an assistant message with no tool calls that appears between a tool call and its tool result overwrote the reference, and conversion failed with: Exception: Missing corresponding tool call for tool response message. This shape is common: a model emits a short narration/assistant message after a tool call before the tool result is appended. Only update `last_message_with_tool_calls` when the assistant message actually carries tool_calls (or a function_call). Adds a regression test. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models (#28572) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models The 1-hour prompt-cache write tier (`cache_creation_input_token_cost_above_1hr`) was added to the us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but the eu./au./jp. cross-region inference profiles were left without it. AWS Bedrock pricing applies the same +10% regional premium across all geo profiles, so eu./au./jp. should carry the same 1-hour rates as us. (1.6x the 5-minute regional rate). Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL prompt caching falls back to the 5-minute write rate and undercounts spend by ~60% for European, Australian, and Japanese tenants. Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where AWS publishes one) to 14 regional Bedrock entries in both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - eu./au. Opus 4.6 ($11.00 / MTok) - eu./au. Opus 4.7 ($11.00 / MTok) - eu./au./jp. Sonnet 4.6 ($6.60 / MTok) - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC) - eu./au./jp. Haiku 4.5 ($2.20 / MTok) Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py` with a `REGIONAL_EXPECTED` parametrized block covering all 13 new entries plus the existing 1.6x ratio invariant. Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06), which would break the 1.6x ratio check. It is intentionally left out of this PR so the scope stays "1-hour cache tier addition" — a separate follow-up should correct the EU 5m rates for Opus 4.5. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Add 1-hour cache write pricing tier for Vertex AI Anthropic models (#28569) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing tier for Vertex AI Anthropic models GCP Vertex AI publishes a separate 1-hour cache write column for the Claude family (1.6x the 5-minute write rate, matching the documented Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the 5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}` on Vertex AI Claude is undercounted in cost tracking by ~60%. The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig` extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and `_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`. Only the price registry was missing data. Adds the field to 19 vertex_ai/claude-* entries across both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - Haiku 4.5 ($1.25 -> $2.00 / MTok) - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok) - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok) - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok) Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py` mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model and asserts the 1.6x ratio across the family. Fixes #27781. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Fix Gemini multimodal function responses (#29325) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * address greptile review: add _transform_image_usage method and model-map supports_image_size flag - Add _transform_image_usage instance method to GoogleImageGenConfig that delegates to transform_gemini_image_usage, fixing the regression test - Replace hardcoded "2.5-flash" string check in supports_gemini_image_size with a get_model_info lookup on supports_image_size (default true) - Add supports_image_size: false to all gemini-2.5-flash model entries in model_prices_and_context_window.json so capability is controlled via the model map rather than embedded in code * fix test failures: schema validation, mypy type, model info plumbing, pricing test - Add supports_image_size to ModelInfoBase TypedDict so get_model_info surfaces it - Pass supports_image_size through _get_model_info_helper constructor call - Fix supports_gemini_image_size to use value is not False (None means unset, defaults to True) - Add supports_image_size to JSON schema in test_aaamodel_prices_and_context_window_json_is_valid - Correct gemini-3.1-flash-lite pricing assertions in test to match JSON values * Add Azure AI Kimi K2.6 metadata (#27052) * Add Azure AI Kimi K2.6 metadata * Scope Kimi metadata test cost map setup * fall back to substring check for models not in model_prices_and_context_window.json Models like gemini-2.5-flash-image-preview are not in the pricing JSON, so get_model_info raises. Fall back to "2.5-flash" not in model when the JSON has no explicit supports_image_size entry for the model. * fix(inception): don't forward global litellm.api_key to Inception FIM Match the Inception chat config: resolve only an Inception-specific key (param, litellm.inception_key, or INCEPTION_API_KEY) for the text-completion FIM path. The global litellm.api_key (often an OpenAI key) was both leaking to api.inceptionlabs.ai and taking precedence over the configured Inception key when set. * fix(auth): enforce end-user budget on custom-auth path that skips common_checks get_end_user_object() no longer raises BudgetExceededError, so custom-auth deployments with custom_auth_run_common_checks unset (which skip the centralized common_checks gate) stopped enforcing the end-user budget, letting an over-budget end user keep making requests. Re-enforce the budget in _run_post_custom_auth_checks on that path. --------- Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: aneeshsangvikar <aneeshsangvikar@fiddler.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Suleiman Elkhoury <108065141+suleimanelkhoury@users.noreply.github.com> Co-authored-by: Dmitriy Alergant <93501479+DmitriyAlergant@users.noreply.github.com> Co-authored-by: Yanis Miraoui <yanis.miraoui19@imperial.ac.uk> Co-authored-by: Lovro Seder <vrovro@gmail.com> Co-authored-by: Thomas Mildner <12685945+Thomas-Mildner@users.noreply.github.com> Co-authored-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Lai Quang Huy <64073540+1qh@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: ZHONG Ziwen <67355585+zzw-math@users.noreply.github.com> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-06-03 11:01:51 -07:00
Sameer Kankute	38709ba9bb	feat(proxy): skip disable_background_health_check models on GET /health when flag set (#27716 ) * feat(proxy): skip disable_background_health_check models on GET /health when flag set Co-authored-by: Cursor <cursoragent@cursor.com> * fix comment * fix greptile comments * Fix health check fallback kwargs * Format health endpoint * Harden direct health check kwargs compatibility for monkeypatched perform_health_check Replace substring-based TypeError detection with unexpected-keyword checks and a short retry chain (full kwargs, instrumentation only, filter only, minimal) so partial stubs work regardless of which optional kwarg fails first. Add proxy unit tests for legacy three-arg stubs and single-kwarg variants. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix black --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>	2026-05-13 09:49:05 -07:00
Yuneng Jiang	650821b538	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix-config-update-targeted-upserts # Conflicts: # tests/test_litellm/proxy/test_proxy_server.py	2026-05-01 10:38:34 -07:00
user	842eea0131	chore(proxy): harden request control fields	2026-04-29 22:35:17 -07:00
Yuneng Jiang	abbe5d7f85	fix(proxy): /config/update writes only sent sections, drop store_model_in_db gate The endpoint loaded the full merged YAML+DB config and re-saved every top-level section to LiteLLM_Config rows via save_config(), so a UI toggle of one field persisted unrelated YAML state to DB as a side effect. It also rejected every request when store_model_in_db was False — including the request that would flip the flag to True (chicken-and-egg). Replace save_config with targeted per-section upserts: read the existing litellm_config row, merge in the request, upsert just that row. Sections the caller did not send are not touched. Drop the blanket store_model_in_db guard — the endpoint already requires prisma_client, and the startup-side override at proxy_server.py:6491 picks up general_settings.store_model_in_db=True from the DB on next restart.	2026-04-27 14:59:33 -07:00
Ishaan Jaffer	e8461b5b97	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
ishaan-berri	51876292a0	Litellm ishaan april4 2 (#25150 ) * feat(router): integrate allowed_fails_policy into health check failures (#24988) * feat(router): integrate allowed_fails_policy into health check failures Health check failures now increment the same per-deployment failure counters used by allowed_fails_policy, so users can control how many health check failures of each error type are required before a deployment enters cooldown. - ahealth_check() preserves the original exception in its return dict - run_with_timeout() returns a litellm.Timeout on health check timeout - _perform_health_check() propagates exceptions to unhealthy endpoints - _write_health_state_to_router_cache() calls _set_cooldown_deployments for each unhealthy endpoint that has an exception - When allowed_fails_policy is set, the binary health check filter is bypassed so cooldown is the sole routing exclusion mechanism - Safety net: if all deployments are in cooldown with enable_health_check_routing=True, the cooldown filter is bypassed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(router): add health_check_ignore_transient_errors flag When enabled, health check failures with 429 (rate limit) or 408 (timeout) status codes are skipped from the cooldown pipeline. These are transient load issues, not broken deployments. Auth errors (401), 404, and 5xx errors still increment counters and trigger cooldown as before. Config (general_settings): health_check_ignore_transient_errors: true Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set The previous fix only skipped cooldown counter increments. The health state cache was still marking 429/408 endpoints as is_healthy=False, causing the binary health check filter to exclude them from routing. Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are also excluded from the unhealthy list passed to build_deployment_health_states(), so the binary filter treats them as unaffected (not unhealthy). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(router): add health check driven routing guide New standalone page covering the full health check routing feature: allowed_fails_policy integration, health_check_ignore_transient_errors, architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics). Replaces the inline section in health.md with a link to the new page. Added to the Routing & Load Balancing sidebar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(health-check-routing): fix three CI failures - Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the exception object is stripped before the health endpoint serializes results to JSON (fixes TypeError: 'URL' object is not iterable) - Add allowed_fails_policy = None to FakeRouter stubs in test_router_health_check_routing.py (fixes AttributeError) - Add health_check_ignore_transient_errors to config_settings.md router settings reference table (fixes documentation test) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix litellm/tests/proxy_unit_tests/test_proxy_server.py * fix(router): address greptile review comments - Narrow cooldown safety-net bypass: only fires when allowed_fails_policy is set (cooldown is health-check driven). Without a policy, cooldowns are from real request failures and must not be bypassed. - Restore cooldown deployments DEBUG log that was accidentally removed. - Fix test_health TypeError: move exception extraction to a separate exceptions_by_model_id dict returned alongside endpoints, so exception objects never appear in the endpoint dicts that get JSON-serialized by the /health response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(health-check-routing): properly isolate exceptions from health response Return exceptions_by_model_id as a separate third value from _perform_health_check / perform_health_check so exception objects (which contain non-JSON-serializable httpx URL types) never appear in the endpoint dicts that get serialized by the /health response. Callers updated: _health_endpoints.py, shared_health_check_manager.py, proxy_server.py background loop. All use the exceptions dict only for cooldown integration, not for display. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(shared-health-check): fix remaining 2-value return sites and update type annotation * fix(health-check-routing): fix P0 cooldown integration never firing The cooldown loop was reading endpoint.get("exception") which is always None because exceptions are now returned via exceptions_by_model_id, not stored in endpoint dicts. Fixed to use _exceptions.get(model_id). Also fixes the transient-error filter to use _exceptions instead of endpoint.get("exception"), and fixes all remaining 2-value return sites in shared_health_check_manager.py. Tests updated to pass exceptions via exceptions_by_model_id parameter instead of endpoint dicts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(health-check-routing): fix P1 transient-error filter broken on cache hits When SharedHealthCheckManager returns cached results, exceptions_by_model_id is always {} so the transient-error filter defaulted to status 500 for all endpoints, incorrectly marking 429/408 endpoints as unhealthy. Fix: store integer exception_status on each unhealthy endpoint dict in _perform_health_check. _get_endpoint_exception_status() uses the live exception object when available (direct path) and falls back to the stored integer (cache-hit path). The integer is JSON-serializable and survives the shared cache round-trip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(health-check-routing): gate cooldown loop behind allowed_fails_policy Without the policy, cooldown is not the routing exclusion mechanism. Firing _set_cooldown_deployments for all enable_health_check_routing users was a backwards-incompatible change — 401s would immediately cooldown deployments that the binary filter would have recovered on the next cycle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: undo allowed_fails_policy gate on cooldown loop Cooldown integration via health checks is intentional for all enable_health_check_routing users, not just those with allowed_fails_policy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage - Move health_check_ignore_transient_errors from router_settings to general_settings in config_settings.md (code reads it from general_settings) - Remove duplicate enable_health_check_routing / health_check_staleness_threshold entries that were incorrectly listed under router_settings - Replace TestHealthCheckEndpointExceptionPropagation tests with ones that exercise the real _perform_health_check code path via mocked ahealth_check, verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(tests+docs): fix tuple unpacking and docs test failures - Update test mocks that return (healthy, unhealthy) to return (healthy, unhealthy, {}) to match the new 3-value signature - Update test unpackings of perform_shared_health_check to use healthy, unhealthy, _ = ... - Add health_check_ignore_transient_errors to router_settings section in config_settings.md (it is a Router constructor param, so the doc test requires it there; it also lives in general_settings for proxy use) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix CodeQL errors * fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py * fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3 * fix team routing --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add distributed lock for key rotation job (#23364) * fix: add distributed lock for key rotation job * fix: address Greptile review feedback on key rotation lock (#23834) * fix: address Greptile review feedback on key rotation lock * fix req changes greptile * feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831) * guardrails fallback * docs * docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference * fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error * fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Shivam Rawat <shivam@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>	2026-04-04 23:09:42 +00:00
jayden	9ca1560501	chore: fix test	2026-03-30 19:14:01 -07:00
Krrish Dholakia	bc829d51f2	test: test	2026-03-28 19:17:38 -07:00
Ishaan Jaff	29e3fd5d79	[Release Fix] (#22411 ) * fix(lint): suppress PLR0915 for 3 complex methods that exceed 50-statement limit - streaming_iterator.py: _process_event (84 statements) - transformation.py: translate_messages_to_responses_input (51 statements) - transformation.py: transform_realtime_response (54 statements) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mypy): resolve type errors in public_endpoints, user_api_key_auth, common_utils, transformation - public_endpoints.py: fix _cached_endpoints type annotation - user_api_key_auth.py: accept Optional[str] for end_user_id parameter - common_utils.py: add NewProjectRequest/UpdateProjectRequest to Union type - transformation.py: add ChatCompletionRedactedThinkingBlock and list[Any] to content type Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(proxy-extras): bump version to 0.4.50 and sync schema - Bump litellm-proxy-extras from 0.4.49 to 0.4.50 - Sync schema.prisma with main proxy schema - Includes new LiteLLM_ClaudeCodePluginTable model - Includes new @@index([startTime, request_id]) on SpendLogs - Update version references in requirements.txt and pyproject.toml Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(router): use string id in test_add_deployment and add defensive str() in register_model - Change test to use string '100' instead of int 100 for model_info.id - Add str() conversion in register_model to prevent AttributeError on non-string keys Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(security): update minimatch to 10.2.4 to fix CVE-2026-27903 and CVE-2026-27904 - Run npm audit fix in docs/my-website - Updates minimatch from 10.2.1 to 10.2.4 (fixes HIGH severity ReDoS vulnerabilities) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): update realtime guardrail test assertions to match actual guardrail behavior - test_text_message_blocked_by_guardrail_no_ai_response: allow guardrail's own block message text in response.done (previously expected empty content) - test_voice_transcript_blocked_by_guardrail: allow guardrail to send response.cancel + block message + response.create flow (previously expected no response.create) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: revert proxy-extras version in requirements.txt and pyproject.toml The litellm-proxy-extras 0.4.50 is not published to PyPI yet, so consumer references must stay at 0.4.49. Only the source package pyproject.toml should be bumped to 0.4.50 for the publish_proxy_extras CI job. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: make transcript delta check optional in voice guardrail test The guardrail sends an error event (guardrail_violation) when blocking voice transcripts; it does not always produce transcript deltas. Remove the assertion requiring response.audio_transcript.delta since the error event is the primary signal that blocked content was handled. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add missing env keys to documentation: LITELLM_MAX_STREAMING_DURATION_SECONDS and LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES These two environment variables were used in code but not documented in the environment variables reference section of config_settings.md, causing the test_env_keys.py CI test to fail. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix 13 mypy type errors across 6 files - in_flight_requests_middleware.py: Fix type: ignore error codes from [union-attr] to [attr-defined], add [arg-type] for Gauge *kwargs - transformation.py: Add [assignment] ignore for output_format reassignment, add fallback empty string for tool use id to fix arg-type - responses/main.py: Remove redundant type annotation on second secret_fields assignment to fix no-redef - streaming_iterator.py: Add [assignment] ignores for intermediate cache token assignments - handler.py: Add [typeddict-item] ignore for AnthropicMessagesRequest construction from dict - public_endpoints.py: Add [arg-type] ignore for _load_endpoints() return type mismatch with SupportedEndpoint model Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> fix: add auth overrides to spend tracking tests, fix realtime guardrail assertion, update UI minimatch - Add app.dependency_overrides for user_api_key_auth in 4 spend tracking tests that were returning 401 Unauthorized (error_code, error_message, error_code_and_key_alias, key_hash) - Fix realtime guardrail test to check ANY error event for guardrail_violation instead of just the first (OpenAI may send its own errors first) - Update ui/litellm-dashboard/package-lock.json to fix minimatch vulnerability Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix failing MCP e2e and create_mcp_server UI tests Test 1 (test_independent_clients_no_shared_session): - Add allow_all_keys: true to MCP servers in test config. With master_key and no DB, get_allowed_mcp_servers returned empty, causing 0 tools and 403 on tool calls. allow_all_keys bypasses per-key restrictions. - Add asyncio.sleep(0.5) between client connections to allow MCP SDK TaskGroup cleanup and avoid ExceptionGroup on connection close (MCP #915). Test 2 (create_mcp_server 'auth value is provided'): - Use userEvent.setup({ delay: null }) for instant keystrokes to avoid timeout from default typing delay on CI. - Increase per-test timeout to 15000ms for CI environments. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: stabilize proxy unit tests for parallel execution - test_response_polling_handler: add xdist_group to prevent heavy import OOM - test_db_schema_migration: use temp dir for worker isolation, sync schema.prisma index - test_custom_tokenizer_bug: use lighter tokenizer to prevent OOM in parallel Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: add auth overrides to more spend tracking and model info tests - Fix test_ui_view_spend_logs_pagination missing auth override (401) - Fix test_view_spend_tags missing auth override (401) - Fix test_view_spend_tags_no_database missing auth override (401) - Fix test_empty_model_list.py to use app.dependency_overrides instead of patch() for FastAPI dependency injection auth Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): use patch.object for aiohttp transport test to work in parallel execution The @patch decorator was not intercepting the static method call in parallel xdist workers. Using patch.object on the directly-imported class is more reliable. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(security): update minimatch from 10.2.1 to 10.2.4 in Dockerfile The Docker image was explicitly pinning minimatch@10.2.1 which has HIGH severity ReDoS vulnerabilities (GHSA-7r86-cg39-jmmj, GHSA-23c5-xmqv-rm74). Update to 10.2.4 which includes fixes for both CVEs. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ui): prevent MCP and TeamInfo test timeouts on CI - Add userEvent.setup({ delay: null }) to all tests using userEvent in both files - Add timeout: 15000 to tests with significant user interaction (typing, multiple clicks) - Fixes: create_mcp_server Bearer Token test, TeamInfo cancel button test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: stabilize parallel test execution and aiohttp transport test - test_aiohttp_handler: rewrite transport test to not rely on static method mock (consistently fails in parallel xdist workers) - test_proxy_cli: add xdist_group to prevent timeout during heavy imports - test_swagger_chat_completions: add xdist_group to prevent timeout Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(security): add serialize-javascript override to fix GHSA-5c6j-r48x-rmvq Add npm override for serialize-javascript>=7.0.3 in docs/my-website to fix HIGH severity RCE vulnerability via RegExp.flags. Also bump minimatch override to >=10.2.4. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix flaky tests: remove broken Vertex model, add retries for Anthropic - Remove vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas from test_partner_models_httpx_streaming - consistently returns 400 BadRequest - Add @pytest.mark.flaky(retries=6, delay=10) to test_function_call_parsing for transient Anthropic API overload errors - Add @pytest.mark.flaky(retries=6, delay=10) to test_openai_stream_options_call for transient Anthropic InternalServerError Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): add xdist_group(proxy_heavy) to prevent OOM in parallel proxy tests - Add pytestmark = pytest.mark.xdist_group('proxy_heavy') to test_proxy_utils.py - Change test_db_schema_migration.py from schema_migration to proxy_heavy group - Add @pytest.mark.xdist_group('proxy_heavy') to test_proxy_server.py::test_health Groups heavy proxy tests to run on same worker, avoiding worker OOM crashes. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix vertex AI qwen global endpoint test to mock vertexai module import The test_vertex_ai_qwen_global_endpoint_url test was failing because the VertexAIPartnerModels.completion() method tries to 'import vertexai' before any of the mocked code runs. In environments without google-cloud-aiplatform installed, this import fails with a VertexAIError(status_code=400). Fix by: - Adding patch.dict('sys.modules', {'vertexai': MagicMock()}) to mock the vertexai module import - Adding vertex_ai_location parameter to the acompletion call for completeness Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): add xdist_group to health endpoint and watsonx tests for parallel stability - test_health_liveliness_endpoint: add xdist_group('proxy_health') to prevent timeout - test_watsonx_gpt_oss tests: add xdist_group('watsonx_heavy') to prevent mock interference Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): pre-populate WatsonX IAM token cache to prevent parallel test interference The watsonx prompt transformation test was failing in parallel execution because litellm.module_level_client.post mock was being interfered with by other tests. Pre-populating the IAM token cache avoids the HTTP call entirely. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): add spend data polling with retries for e2e pass-through tests - test_vertex_with_spend.test.js: Replace 15s fixed wait with polling loop (up to 6 attempts, 10s apart) for spend data to appear in DB - Increase test timeout from 25s to 90s to accommodate polling - base_anthropic_messages_tool_search_test.py: Add flaky(retries=3) for streaming test that depends on live Anthropic API Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): reduce parallel workers from 8 to 4 for proxy tests to prevent OOM - litellm_proxy_unit_testing_part2: -n 8 -> -n 4 - litellm_mapped_tests_proxy_part2: -n 8 -> -n 4, timeout 60 -> 120 - Worker crashes consistently caused by too many parallel proxy tests each loading the full FastAPI app and heavy dependency tree Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(db): add migration for SpendLogs composite index (startTime, request_id) The @@index([startTime, request_id]) was added to schema.prisma but had no corresponding migration. This caused test_aaaasschema_migration_check to fail because prisma migrate diff detected the missing index. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(db): add migration for MCP available_on_public_internet default change to true The schema.prisma changed the default for available_on_public_internet from false to true, but no migration was created. This caused the schema migration test to detect drift. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): increase server wait time and add retry to flaky external API tests - test_basic_python_version.py: increase server startup wait from 60s to 90s for slower CI environments (fixes installing_litellm_on_python_3_13) - test_a2a_agent.py: add flaky(retries=3, delay=5) for non-streaming test that depends on live A2A agent endpoint Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): add flaky retries to all intermittent external API tests for 0-fail CI Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): add auth overrides to file endpoint tests that return 500 The test_target_storage tests were getting 500 because the FastAPI auth dependency wasn't overridden. Added app.dependency_overrides for proper auth bypass in test environment. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>	2026-02-28 09:46:35 -08:00
Sameer Kankute	886f9d6a70	Add support for forwarding provider's auth headers	2026-02-25 12:08:25 +05:30
Sean Marsh Glover	4652c73259	feat(proxy): limit concurrent health checks with health_check_concurrency (#20584 ) * staged first pass * black * Update litellm/proxy/health_check.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * simpler * restore cached logo * fix tests for perform_health_check max_concurrency arg * implement pr suggestion * and the helm chart * add configureable resources and probes to the deployment in the helm chart * more helm chart unittests * move some background healthcheck loggin to debug --------- Co-authored-by: Sean Glover <sglover@athenahealth.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-24 08:16:59 -08:00
Ishaan Jaff	daa682e125	fix(tests): add missing start_db_health_watchdog_task mock (#21804 ) * fix(tests): add missing start_db_health_watchdog_task mock in test_proxy_server_prisma_setup * fix(tests): add missing start_db_health_watchdog_task mock in test_health_check_not_called_when_disabled	2026-02-21 12:31:52 -08:00
Julio Quinteros Pro	1dc3f1e530	fix(tests): skip remaining real prisma DB tests in CI and related test suites Add @pytest.mark.skip to all test functions that use the real `prisma_client` fixture (requiring an external PostgreSQL connection) across 7 test files. Files updated: - tests/proxy_unit_tests/test_proxy_server.py (5 tests) - tests/proxy_admin_ui_tests/test_key_management.py (11 tests) - tests/proxy_admin_ui_tests/test_role_based_access.py (5 tests) - tests/proxy_admin_ui_tests/test_usage_endpoints.py (3 tests) - tests/local_testing/test_blocked_user_list.py (2 tests) - tests/local_testing/test_add_update_models.py (1 test) - tests/local_testing/test_update_spend.py (1 test) Total: 28 new skip markers added. Note: tests using mock_prisma_client (properly mocked) are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 13:25:42 -03:00
Julio Quinteros Pro	3dfa3611d9	fix(tests): skip CI tests requiring external services (DB, API keys) Mark tests that require Prisma DB connections or external API credentials with @pytest.mark.skip / @pytest.mark.skipif so they don't block CI runs when the infrastructure is unavailable. Tests skipped: - test_create_user_default_budget (Prisma DB) - test_gemini_pass_through_endpoint (GEMINI_API_KEY / GOOGLE_API_KEY) - test_vertex_ai_gemini_token_counting_with_contents (Google API creds) - test_new/update/delete/info_project (Prisma DB) - test_create/list/get/delete_skill_sdk (Prisma DB) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 11:28:42 -03:00
Julio Quinteros Pro	bd8c1cc673	fix(tests): pass host to RedisCache in test_team_update_redis to avoid ValueError RedisCache() without arguments fails at construction with "ValueError: Either 'host' or 'url' must be specified for redis." The actual Redis connection is irrelevant since async_set_cache is mocked. Unlike test_get_team_redis which uses client_no_auth (which sets REDIS_HOST via fake_env_vars), test_team_update_redis has no fixture setting that env var. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 01:00:58 -03:00
yuneng-jiang	ef7261d0eb	Fixing tests	2026-01-26 20:07:30 -08:00
Yuta Saito	c5ced033c9	fix: anthropic during call guardrail error	2026-01-14 13:37:01 +09:00
Ishaan Jaffer	35c636ba97	test_health_check_not_called_when_disabled	2026-01-10 13:55:11 -08:00
Jorge Yero Salazar	48a3a741e5	Allow base_model for non Azure providers in proxy (#18038 ) * Allow base_model for non Azure providers in proxy * Add tests	2025-12-17 02:24:12 +04:00
yuneng-jiang	d3d005f9bf	fixing tests	2025-12-06 21:23:49 -08:00
yuneng-jiang	cc92fdf90f	Merge remote-tracking branch 'origin' into litellm_ui_callback_fix	2025-12-03 11:02:59 -08:00
Sameer Kankute	9edc50efbd	Fix 500 error for malformed request	2025-12-01 10:21:44 +05:30
yuneng-jiang	25e2331510	Merge remote-tracking branch 'origin' into litellm_ui_callback_fix	2025-11-27 17:29:29 -08:00
yuneng-jiang	22fd323d6b	Calling team/permissions_list and team/permissions_update now returns 404 with non-existent team (#16835 )	2025-11-22 14:21:58 -08:00
yuneng-jiang	cff8a3115a	Merge with main	2025-11-14 20:02:35 -08:00
Alexsander Hamir	c7847125c2	[Perf] Embeddings: Use router's O(1) lookup and shared sessions (#16344 ) * Refactor proxy embeddings to use shared processor - allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks - route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses - tighten token array decoding logic by using router deployment lookups and the unified error handler * Fix: Correctly process embedding requests with token arrays The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected. This was caused by a combination of three distinct issues: 1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup. 2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers. 3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists. Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions. * test: align proxy embedding assertions Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload. * Update proxy exception test The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list. * testing: unsure of this change I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it. * fix: remove unrelated change This change was not related to the embeddings refactor and actually belonged to a different branch.	2025-11-14 09:21:45 -08:00
yuneng-jiang	cb27d6c456	[Fix] UI - Delete Callbacks Failing (#16473 ) * Temp commit for branch switching * Created normalize callback name util function and tests	2025-11-12 18:43:37 -08:00
yuneng-jiang	7833b3fdb4	Addressing comments	2025-11-10 17:28:13 -08:00
yuneng-jiang	5853dbafc8	Merge branch 'main' into litellm_ui_callback_fix	2025-11-10 16:50:58 -08:00
Cesar Garcia	16325024df	fix: Use valid CallTypes enum value in embeddings endpoint (#16328 ) * Fix embeddings endpoint call_type to use valid CallTypes enum value Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"` to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum. Changed to use `call_type="aembedding"` (async embedding) which is the correct CallTypes enum value and matches the route_type used in the same function. Added unit tests to verify: - "embeddings" is not a valid CallTypes enum value - "aembedding" is the correct valid value - The fix prevents ValueError when guardrails are enabled Fixes #16240 * Inline embeddings call type regression check * Ensure embedding test preserves proxy metadata	2025-11-06 19:25:00 -08:00
yuneng-jiang	2d5ae35a85	Show all callbacks on UI	2025-11-06 12:38:47 -08:00
yuneng-jiang	5d158775b1	[Fix] Litellm non root docker Model Hub Table fix (#16282 ) * Fix model hub table 404 on non-root docker * Adding test	2025-11-05 18:30:20 -08:00
Ishaan Jaffer	0bedf1c0a7	fix tests	2025-10-25 10:19:24 -07:00
Ishaan Jaffer	ce57f59531	test_gemini_pass_through_endpoint	2025-09-27 17:17:12 -07:00
Ishaan Jaffer	6aa35ec999	test text-embedding-ada-002	2025-09-27 12:41:35 -07:00
Ishaan Jaffer	c27beb74b9	test fix	2025-09-27 12:40:34 -07:00
Alexsander Hamir	eaa04cd8ce	fix: use fastuuid helper (#14903 ) * fix: use fastuuid helper across the codebase First batch of changes, simple drop in replacement. * second batch of changes * fixed: script mistake on helper file	2025-09-25 15:47:01 -07:00
Ishaan Jaff	79be436c2b	[Feat] Background Health Checks - Allow disabling background health checks for a specific (#13186 ) * disable background health checks for specific models * test_background_health_check_skip_disabled_models * Disable Background Health Checks For Specific Models	2025-07-31 13:48:35 -07:00
Krish Dholakia	014f4ef86b	Litellm fix proxy unit testing (#12778 ) * test: update tests * test: update test	2025-07-19 16:13:03 -07:00
Krish Dholakia	635367b020	Litellm dev 07 09 2025 p1 (#12462 ) * fix(db_spend_update_writer.py): fix db query * fix(litellm_pre_call_utils.py): support passing anthropic-beta headers when 'forward_client_headers_to_llm_api' is True allows user to pass along extra headers to vertex ai anthropic models * docs(config_settings.md): update docs	2025-07-09 21:46:15 -07:00
Krrish Dholakia	1e6d43e761	Squashed commit of the following: commit 440bc027251d8180174d762d83d271d0f7b68cc5 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 23:04:11 2025 -0700 fix: fix check commit 89a7451cb9ee26ff9f642335714dcc6f449d1fc2 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 22:42:30 2025 -0700 fix: fix test commit 1322e3b3497e5d334fdcaa18f0cf7a98ea758df4 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 20:52:40 2025 -0700 style: add more tooltips commit 172738b98b7864aabcacf3334a394098b300283f Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 20:51:09 2025 -0700 feat(team_member_view.tsx): add a tooltip commit 895eb28deb9127985e30b5e859e5bca8530951c9 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:46:49 2025 -0700 fix(teams.tsx): support setting team member budget on create commit 003cc54a6dd0f65030c4f39a8487adc771b62e11 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:40:49 2025 -0700 fix(team_member_view.tsx): style improvements commit a627a044f21df788f80d92a4081212072be91632 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:40:01 2025 -0700 fix(team_member_view.tsx): handle scientific notation in string commit c5a3b7bd8419f6394e1b490849555d02d473baed Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:34:25 2025 -0700 feat(team_membership_view.tsx): show team member spend + max budget on UI commit e986d12ad5b07c676f4cac5e16745939d7473dee Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:28:06 2025 -0700 feat(team_member_view.tsx): show team member spend + budget on team info commit 8e398607b25f8a8f0bab41964810b5dd27c5e3f2 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:18:16 2025 -0700 feat(team_info.tsx): show team member budget on team info commit 1f56886b5913dafefc0c00fbe741c0c9c01144a6 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:15:30 2025 -0700 feat(team_endpoints.py): get team budget table on team info allows user to see max budget set for team members commit 0a4320bbfa406c24ad32a420f82152da7bdd7323 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 18:10:06 2025 -0700 feat(team_endpoints.py): return team member budget on team info allows ui to display this to admin / team member commit 6a4e29f87b333ae9977e8f878960e63becd89150 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 17:57:20 2025 -0700 fix(team_endpoints.py): support updating team budget on UI commit 53f0fff34032977433dfe6935ce0a684a4141fd8 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 17:38:17 2025 -0700 feat(proxy/_types.py): return team member spend update pydantic object to include spend Allows showing spend of team member within team on UI commit ef2a1a43ecf7fecfb904042cbf47b3d56246edcb Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 16:31:42 2025 -0700 feat(team_endpoints.py): support 'team_member_budget' param on `/team/update` enables budget working across all team members commit 512999f1249b00a02a30f049a0cfa36e829ff989 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 16:20:04 2025 -0700 test: add unit tests for default team member budget commit 90fa3f61a2d63e12b9f3e1da9775f5c8b7294b5f Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 15:37:51 2025 -0700 feat(team_endpoints.py): support using default team member budget id, if set allows all team members to use the same budget id commit acef5324b1a0935a482c71060f610c3d8823e8c3 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 15:22:30 2025 -0700 feat(team_endpoints.py): support `team_member_budget` param on `/team/new` Allow creating 1 budget for all users within team (makes it easier to increase/reduce budget if needed for all team members) commit 2e867ac70fbd8768e7c27cf3b078e6dc10e566b9 Author: Krrish Dholakia <krrishdholakia@gmail.com> Date: Fri Jun 20 13:45:06 2025 -0700 fix(ui_sso.py): ensure user is added to team, if set via default internal settings allows users signed up via SSO to be added to default team	2025-06-20 23:11:53 -07:00
Krish Dholakia	39de3610be	fix(internal_user_endpoints.py): support user with `+` in email on us… (#11601 ) * fix(internal_user_endpoints.py): support user with `+` in email on user info ensures user is correctly parsed from input * fix(factory.py): support vertex function call args as None handles empty string in args for vertex gemini calls * docs(langfuse_integration.md): pin langfuse sdk version on docs * fix(vertex_ai/): return empty dict, instead of none when empty string given * refactor: reduce function size * fix: fix linting errors * fix: revert check * fix(internal_user_endpoints.py): fix check * test: update tests * test: update tests	2025-06-10 22:13:10 -07:00
Krish Dholakia	ba2d4d080f	feat(handle_jwt.py): map user to team when added via jwt auth (#11108 ) * feat(handle_jwt.py): map user to team when added via jwt auth makes it easy to ensure user belongs to team * test: test_openai_image_edit_litellm_sdk * use n 4 for mapped tests (#11109) * Fix/background health check (#10887) * fix: improve health check logic by deep copying model list on each iteration * test: add async test for background health check reflecting model list changes * fix: validate health check interval before executing background health check * fix: specify type for health check results dictionary * fix(user_api_key_auth.py): handle user custom auth set with no custom settings * bump: version 0.1.21 → 0.2.0 * ci(config.yml): run enterprise and litellm tests separately * fix: fix linting error * docs: add missing docs * [Feat] Add content policy violation error mapping for image editd (#11113) * feat: add image edit mapping for content policy violations * test fix * Expose `/list` and `/info` endpoints for Audit Log events (#11102) * feat(audit_logging_endpoints.py): expose list endpoint to show all audit logs make it easier for user to retrieve individual endpoints * feat(enterprise/): add audit logging endpoint * feat(audit_logging_endpoints.py): expose new GET `/audit/{id}` endpoint make it easier to retrieve view individual audit logs * feat(key_management_event_hooks.py): correctly show the key of the user who initiated the change * fix(key_management_event_hooks.py): add key rotations as an audit log event ' * test(test_audit_logging_endpoints.py): add simple unit testing for audit log endpoint * fix: testing fixes * fix: fix ruff check * [Feat] Use aiohttp transport by default - 97% lower median latency (#11097) * fix: add flag for disabling use_aiohttp_transport * feat: add _create_async_transport * feat: fixes for transport * add httpx-aiohttp * feat: fixes for transport * refactor: fixes for transport * build: fix deps * fixes: test fixes * fix: ensure aiohttp does not auto set content type * test: test fixes * feat: add LiteLLMAiohttpTransport * fix: fixes for responses API handling * test: fixes for responses API handling * test: fixes for responses API handling * feat: fixes for transport * fix: base embedding handler * test: test_async_http_handler_force_ipv4 * test: fix failing deepeval test * fix: add YARL for bedrock urls * fix: issues with transport * fix: comment out linting issues * test fix * test: XAI is unstable * test: fixes for using respx * test: XAI fixes * test: XAI fixes * test: infinity testing fixes * docs(config_settings.md): document param * test: test_openai_image_edit_litellm_sdk * test: remove deprecated test * bump respx==0.22.0 * test: test_xai_message_name_filtering * test: fix anthropic test after bumping httpx * use n 4 for mapped tests (#11109) * fix: use 1 session per event loop * test: test_client_session_helper * fix: linting error * fix: resolving GET requests on httpx 0.28.1 * test fixes proxy unit tests * fix: add ssl verify settings * fix: proxy unit tests * fix: refactor * tests: basic unit tests for aiohttp transports * tests: fixes xai --------- Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> * test: cleanup redundant test --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: JuHyun Bae <jhyun0408@nate.com>	2025-05-23 23:23:46 -07:00
JuHyun Bae	4d2048e208	Fix/background health check (#10887 ) * fix: improve health check logic by deep copying model list on each iteration * test: add async test for background health check reflecting model list changes * fix: validate health check interval before executing background health check * fix: specify type for health check results dictionary	2025-05-23 20:52:35 -07:00
Krish Dholakia	1ea046cc61	test: update tests to new deployment model (#10142 ) * test: update tests to new deployment model * test: update model name * test: skip cohere rbac issue test * test: update test - replace gpt-4o model	2025-04-18 14:22:12 -07:00
Krish Dholakia	51cb3c84e3	Litellm stable UI 02 17 2025 p1 (#8599 ) * fix(key_management_endpoints.py): initial commit with logic to get all keys for teams user is an admin for * fix(key_managements_endpoints.py): return all keys for teams user is an admin for * fix(key_management_endpoints.py): add query param to ensure user opts into seeing all team keys (not just their own) * fix(regenerate_key_modal.tsx): fix key regenerate * fix(proxy_server.py): fix model metrics check on none api base * test(test_key_generate_prisma.py): remove redundant test * test(test_proxy_utils.py): add unit test covering new management endpoint helper util * fix: fix test * test(test_proxy_server.py): fix test	2025-02-17 17:55:05 -08:00
Krish Dholakia	9e65f867ab	test: add more unit testing for team member endpoints (#8170 ) * test: add more unit testing for team member add * fix(team_endpoints.py): add validation check to prevent same user from being added to team again prevents duplicates * fix(team_endpoints.py): raise error if `/team/member_delete` called on member that's not in team prevent being able to call delete on same member multiple times * test: update initial tests * test: fix test * test: update test to handle no member duplication	2025-02-01 11:23:00 -08:00
Krish Dholakia	d9eb8f42ff	Litellm dev 01 27 2025 p3 (#8047 ) * docs(reliability.md): add doc on disabling fallbacks per request * feat(litellm_pre_call_utils.py): support reading request timeout from request headers - new `x-litellm-timeout` param Allows setting dynamic model timeouts from vercel's AI sdk * test(test_proxy_server.py): add simple unit test for reading request timeout * test(test_fallbacks.py): add e2e test to confirm timeout passed in request headers is correctly read * feat(main.py): support passing metadata to openai in preview Resolves https://github.com/BerriAI/litellm/issues/6022#issuecomment-2616119371 * fix(main.py): fix passing openai metadata * docs(request_headers.md): document new request headers * build: Merge branch 'main' into litellm_dev_01_27_2025_p3 * test: loosen test	2025-01-28 18:01:27 -08:00

1 2

61 Commits