litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-17 22:48:35 +00:00

Author	SHA1	Message	Date
Sameer Kankute	b84f7f82f7	Litellm oss staging (#29492 ) * fix(llm_http_handler): forward kwargs['model_info'] to litellm_params for /v1/messages Router._update_kwargs_with_deployment stamps the selected deployment's model_info on kwargs['model_info'] before dispatching the request. Downstream cooldown / success callbacks (deployment_callback_on_failure, deployment_callback_on_success) look up the deployment id via kwargs['litellm_params']['model_info']['id']. async_anthropic_messages_handler constructs its own litellm_params dict when calling logging_obj.update_from_kwargs and never forwarded model_info. As a result, /v1/messages requests dispatched through the Router had an empty model_info on litellm_params, the deployment id was not discoverable, and cooldown / success tracking were silently skipped for this call type. Forward kwargs['model_info'] into the litellm_params dict so the existing Router callbacks can identify the deployment. * merge main (#29486) * [Refactor] UI - Spend Logs: consolidate filter state and extract components (#25847) * [Refactor] UI - Spend Logs: consolidate filter state, extract components, remove dead code - Lift filter state into index.tsx and pass to hook (removes selectedX vars + sync useEffect) - Move main useQuery into useLogFilterLogic hook (removes isMainQueryEnabled toggle) - Delete dead RequestViewer component (300 lines, replaced by LogDetailsDrawer) - Extract LogsTableToolbar component (search, date range, pagination, live tail) - Extract filter options config to filter_options.ts - Remove dead code: handleRefresh, handleSelectLog, handleCloseDrawer, formatTimeUnit, showFilters/showColumnDropdown state, dropdownRef/filtersRef * Fix PR feedback: use antd Switch instead of Tremor in new file, fix typo * Collapse dual-path filtering into single React Query All 10 filter keys now go through the useQuery — the imperative performSearch / debouncedSearch / backendFilteredLogs path is deleted. 
Filter values are debounced via useDebouncedValue(300ms) before hitting the query key so text inputs don't fire per-keystroke. Removed: performSearch, debouncedSearch, backendFilteredLogs, lastSearchTimestamp, hasBackendFilters, clientDerivedFilteredLogs, the sort/page/time refetch useEffect, and the filteredLogs chooser memo. * Clean up remaining smells: remove isFetchingDeferred, internalize selectedTimeInterval, fix circular import - Remove useDeferredValue/isButtonLoading — pass logsQuery.isFetching directly - Move selectedTimeInterval into LogsTableToolbar as internal state - Move PaginatedResponse type from index.tsx to log_filter_logic.tsx * Fix quick-select dropdown overlapping sidebar * Fix stale quick-select label after Reset Filters Move selectedTimeInterval back to parent so handleFilterReset can reset it to the 24-hour default. The toolbar receives it as a prop. * refactor useLogFilterLogic tests for controlled-hook + backend-query shape The hook no longer owns filter state or does client-side filtering — it receives filters/setFilters as props and drives filteredLogs from a useQuery over uiSpendLogsCall. Reshape the tests around that contract: introduce a controlled harness that owns filter state, collapse the 10 per-filter assertions into a single it.each over filterKey → API param, and drop the client-side passthrough tests (the .min test file and the "return all logs when no filters" / "empty when logs null" cases) that no longer correspond to any hook behavior. * cover new useLogFilterLogic invariants: activeTab gate, filterByCurrentUser fallback, debounce negative, partial merge Follow-up to the test refactor. 
Adds coverage for invariants the refactored hook contract introduced but that the first pass didn't assert: - query enablement: expand the single accessToken-null case into an it.each over all four credential props (accessToken, token, userRole, userID), plus a separate test for activeTab !== "request logs" - filterByCurrentUser: when true with a blank User ID filter, the outbound request carries user_id = userID - debounce: also assert the negative case — no call in the first 100ms after a filter change (first waiting out the initial mount fire) - handleFilterChange: partial updates merge without clobbering other filter keys (protects the spread + default-fill semantics) - handleFilterReset: calls setCurrentPage(1) alongside restoring filters * fix typo dropping the live-tail banner border Tailwind silently ignores unknown classes, so border-greem-200 was leaving the auto-refresh banner with only its bg-green-50 fill and no outline. * memoize columns and derived table data in SpendLogsTable The table's columns array, four-pass data pipeline, and sort-change handler were all being rebuilt on every parent render. That made every filter click re-instance all 23 TanStack-Table columns, re-run filter/reduce/map over all rows, and recreate per-row click closures — all before the intentional 300ms debounce timer even got a chance to fire. Local measurement (40 rows, dev mode): filter click → query fires: 1957ms → 1217ms (−38%) Wrap createColumns in useMemo keyed on sortBy/sortOrder, hoist onSortChange into a useCallback, and move the searchedLogs / sessionComposition / sessionRepresentativeMap / filteredData derivations into a single useMemo keyed on filteredLogs.data + searchTerm. These were pre-existing issues on main — not regressions from the hook refactor — but the refactor made them user-visible because the new query debounce put render cost on the critical path. 
* apply dropdown filters instantly, debounce only text inputs Dropdown selects now bypass the 300ms debounce so a click updates the table immediately. Text inputs (Key Hash, Error Message, Request ID, User ID) still debounce. handleFilterReset also clears the pending debounced value so a half-typed text filter can't re-fire after reset. * fix(ui/spend-logs): restore lost loading/debounce behavior + cover dropped tests Regressions from the spend-logs-view refactor: - debounce the 'Public model / search tool' text filter (was firing a backend query per keystroke) via TEXT_FILTER_KEYS - restore Fetch-button smoothing through table repaint using useDeferredValue on the rendered data (explicit staleness) - show AntDLoadingSpinner during the auth-resolve phase instead of a blank screen on first load - only live-tail-poll while the tab is visible (refetchIntervalInBackground: false) - extract getLiveTailRefetchInterval helper for the poll decision Tests: - LogDetailContent: retries display (>0 / 0 / absent), overhead-absent - log_filter_logic: regression guard that the public-model filter debounces; getLiveTailRefetchInterval unit tests - logs_utils: getTimeRangeDisplay quick-select window labels * test(ui/spend-logs): cover the cold-load auth-not-ready spinner guard Asserts SpendLogsTable shows a loading spinner (not a blank screen) while credentials are unresolved, and renders the table once present. * fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281) * fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio calls in test_stream_chunk_builder_openai_audio_output_usage and test_standard_logging_payload_audio now hard-fail with a model-not-found error on every PR. The error was not "openai-internal", so the except block swallowed it and execution fell through to an unbound completion/response (UnboundLocalError). 
Switch both tests to gpt-audio-1.5, OpenAI's recommended successor (GA, not deprecated, already present in the litellm cost map so the response_cost assertion still resolves). Also broaden the except to skip with the real error in the reason instead of crashing, so a transient upstream blip can't reintroduce the UnboundLocalError. * fix(tests): narrow audio-test skip to model-not-found, re-raise the rest Address review feedback: an unconditional skip on any exception would silently mask a litellm-internal regression in the audio path (broken param transformation, serialization, bad header) instead of failing CI. Skip only on the upstream-unavailable class (model_not_found / "does not exist" / openai-internal) and re-raise everything else, so genuine regressions still fail loudly. The UnboundLocalError is still fixed because the handler either skips or raises - it never falls through. * fix(tests): add budget_exceeded to expected Interaction status enum Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec. * fix(tests): mock HTTP fetch in test_img_url_token_counter The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency. * fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. 
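The narrowed-skip behaviour described above (skip only on the upstream-unavailable class, re-raise everything else) can be sketched roughly as follows. This is an illustration, not the test code from the PR: the names `is_upstream_unavailable` and `run_live_call` are hypothetical, and `skip` stands in for a function such as `pytest.skip` that raises. Only the three marker strings come from the commit message.

```python
from typing import Any, Callable

# Marker strings taken from the commit message; the tuple name is hypothetical.
UPSTREAM_UNAVAILABLE_MARKERS = ("model_not_found", "does not exist", "openai-internal")


def is_upstream_unavailable(exc: Exception) -> bool:
    """Return True when the error text looks like a retired model or upstream outage."""
    message = str(exc).lower()
    return any(marker in message for marker in UPSTREAM_UNAVAILABLE_MARKERS)


def run_live_call(call: Callable[[], Any], skip: Callable[[str], Any]) -> Any:
    """Run `call`; skip on upstream-unavailable errors and re-raise all others.

    Because the handler either skips or raises, execution can never fall
    through to code that reads an unbound response variable.
    """
    try:
        return call()
    except Exception as exc:
        if is_upstream_unavailable(exc):
            skip(f"upstream unavailable: {exc}")
        raise  # genuine regressions still fail loudly
```

In a real test the `skip` argument would typically be `pytest.skip`, which raises its own exception, so the trailing `raise` is only reached for errors outside the upstream-unavailable class.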
Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly. * chore(ci): bump versions (#28287) * bump: version 0.4.72 → 0.4.73 * bump: version 1.86.0 → 1.87.0 * uv lock * feat: propagate team_id and team_alias to all child OTEL spans (#28273) - Add `_set_team_attributes_on_span` helper to stamp team_id/team_alias onto any span, ensuring these attributes are not limited to the root litellm_request span - Add `_set_team_attributes_from_kwargs` helper to extract team metadata from the standard_logging_object in kwargs and apply them to a span - Apply team attributes to raw request spans via `_maybe_log_raw_request` so downstream consumers can filter traces by team without needing the root span - Apply team attributes to guardrail spans so guardrail activity can be correlated to teams in tracing backends - Apply team attributes to exception logging spans to preserve team context during failure paths - Add comprehensive unit tests covering all new helpers, including edge cases where metadata or standard_logging_object is absent Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> * Day 0 support : Gemini 3.5 Flash (#28268) * Add day 0 support for gemini 3.5 flash * Fix pricing * Fix greptile review * Fix failing test * Fix tests * Fix: revert tool removing logic * fix greptile and test --------- Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * Gemini managed agents support (#28270) * Add support for environment variable in interactions api * Add sdk support for gemini create agent * Add agents endpoint support via proxy * Add outputs of each api * Add routing for model and agents param * Remove redundant condition in get_provider_agents_api_config 
LlmProviders.GEMINI.value is literally the string "gemini", so the second clause of the or was checking the exact same thing as the first. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix: forward query-param credentials to list/get/delete/versions Gemini agent endpoints The list_gemini_agents, get_gemini_agent, delete_gemini_agent, and list_gemini_agent_versions endpoints previously constructed a hardcoded data dict with no mechanism to pass provider credentials. Unlike create_gemini_agent (POST, reads litellm_params_template from body), these GET/DELETE endpoints gave no way for multi-tenant callers to supply a per-request api_key or other LiteLLM params. Fix: - Add _merge_query_params_into_data() helper that reads query parameters from the request and merges them into the data dict without overwriting already-set keys (e.g. path params like 'name'). - Support a JSON-encoded litellm_params_template query parameter (matching the POST body pattern) as well as flat key=value pairs (e.g. api_key=AIza...). - Apply the helper in all four affected endpoints. - Add 13 unit tests covering the helper and each endpoint. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix: pass model=None for managed agent proxy endpoints to prevent agent name polluting data["model"] Endpoints acreate_agent, aget_agent, adelete_agent, and alist_agent_versions were passing model=<agent_name> to base_process_llm_request. This caused common_processing_pre_call_logic to write the agent name into self.data["model"], which then triggered spurious model-alias mapping, rate-limiting lookups, and logging tied to a non-existent model deployment. The agent name is already carried in data["name"] and is passed correctly to the SDK functions (litellm.interactions.agents.). There is no reason to also set model=<agent_name>; the correct value is model=None for all five managed-agent management routes. 
Adds tests/test_litellm/proxy/google_endpoints/test_managed_agents_model_param.py to verify all five managed-agent endpoints pass model=None. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> fix: address greptile P1/P2 review comments P1 (router.py): Restore fallback/retry support for acreate_interaction and create_interaction. Both were silently moved to _init_interactions_api_endpoints (direct call, no fallbacks). Moved them back to _ageneric_api_call_with_fallbacks so users with configured fallback models keep retry behaviour. P1 security (agents_endpoints.py): Remove flat query-param credential path (e.g. ?api_key=AIza...) from _merge_query_params_into_data. Credentials in URL query strings appear verbatim in server access logs, CDN edge logs, and browser history. Only the JSON-encoded litellm_params_template query param (matching the POST body pattern) is retained. P2 (interactions/http_handler.py): Extract _BaseHTTPHandler with shared _handle_error, _sync_client, and _async_client helpers. InteractionsHTTPHandler now extends _BaseHTTPHandler. The _async_client reads the provider from litellm_params instead of hardcoding GEMINI. P2 (interactions/agents/http_handler.py): AgentsHTTPHandler now extends InteractionsHTTPHandler (which inherits _BaseHTTPHandler) so all shared HTTP infrastructure is reused rather than duplicated. Removes the hardcoded LlmProviders.GEMINI from the async client path. 
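A minimal sketch of the query-parameter merge after the P1 security fix, assuming the behaviour stated in the commit text: only the JSON-encoded `litellm_params_template` parameter is read, flat parameters such as `?api_key=...` are not merged, and keys already present in the data dict are never overwritten. The function name `merge_template_into_data` and the handling of malformed JSON are assumptions for illustration; the real `_merge_query_params_into_data` may differ.

```python
import json
from typing import Any, Dict, Mapping


def merge_template_into_data(
    query_params: Mapping[str, str], data: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge the JSON-encoded template into `data` without overwriting existing keys."""
    raw = query_params.get("litellm_params_template")
    if not raw:
        # Flat params (e.g. ?api_key=...) are deliberately not merged:
        # URL query strings end up verbatim in access logs.
        return data
    try:
        template = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: malformed JSON is ignored here; the real helper may reject it.
        return data
    if isinstance(template, dict):
        for key, value in template.items():
            # Already-set keys, such as the path param "name", take precedence.
            data.setdefault(key, value)
    return data
```

The sketch still carries the credential inside a query parameter; per the commit text, the change is that only the single JSON-encoded parameter is accepted, matching the POST body pattern.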
Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address CI failures from greptile review fixes - black: format interactions/agents/main.py and utils.py - tests: update test_gemini_agents_endpoints.py to match new _merge_query_params_into_data behaviour (flat credential params are rejected; only JSON-encoded litellm_params_template is accepted) - ci: add test_gemini_agents_endpoints.py to endpoints-and-responses shard in test-unit-proxy-db.yml so assert-shard-coverage passes - tests: add _initialize_managed_agents_endpoints and _init_managed_agents_api_endpoints test coverage so router_code_coverage passes; also fix TestRouterCreateInteractionRouting to reflect that acreate_interaction now correctly routes through _ageneric_api_call_with_fallbacks (restoring fallback support) Co-authored-by: Cursor <cursoragent@cursor.com> * fix: remove InteractionsHTTPHandler._handle_error override to fix type errors AgentsHTTPHandler extends InteractionsHTTPHandler and calls self._handle_error(provider_config=agents_api_config) where agents_api_config is BaseAgentsAPIConfig. Python MRO resolved _handle_error to InteractionsHTTPHandler._handle_error which expected BaseInteractionsAPIConfig, causing 10 mypy arg-type errors in interactions/agents/http_handler.py. Removing the redundant override lets both classes inherit _BaseHTTPHandler._handle_error (provider_config: Any) which is structurally correct for both config types. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: agent-only interactions and managed agents provider routing Resolve None custom_llm_provider in agents HTTP client lookup and set custom_llm_provider on GenericLiteLLMParams for all agent CRUD paths. Stop mapping agent names to proxy model routing; route interactions through _init_interactions_api_endpoints with fallbacks only when model is set. Consolidate duplicate router elif branches for interaction APIs. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Fix greptile review * test(agents): add unit tests for managed agents SDK and HTTP handler Adds coverage for the new `litellm.interactions.agents` surface area: - main.py: sync/async entry points (create/list/get/delete/list_versions), provider config lookup, logging-obj helper, async error wrapping - http_handler.py: every CRUD method (sync + async paths), `_is_async` dispatch branches, and provider error mapping through GeminiAgentsConfig - utils.py: get_provider_agents_api_config for supported / unsupported providers Brings patch coverage on these files from <25% to ~100% so codecov/patch is satisfied. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * docs(gemini-agents): fix misleading credential-passing examples in GET/DELETE docstrings (#28293) The four GET/DELETE endpoint docstrings (list_gemini_agents, get_gemini_agent, delete_gemini_agent, list_gemini_agent_versions) documented passing per-request credentials as flat query parameters (e.g. ?api_key=AIza...). However, _merge_query_params_into_data only reads the JSON-encoded litellm_params_template query parameter and intentionally ignores flat params (URL query strings appear verbatim in access logs, browser history, and Referer headers). Callers following the documented curl examples would have their credentials silently dropped and hit auth failures against Gemini. Update the examples to use the supported JSON-encoded litellm_params_template query parameter, matching _merge_query_params_into_data's own docstring. Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * refactor(agents): rename provider-agnostic agent response types Move GeminiAgent{ListResponse,DeleteResult,VersionsResponse} to provider-neutral names (AgentListResponse, AgentDeleteResult, AgentVersionsResponse) so the BaseAgentsAPIConfig interface no longer references Gemini-specific type names. 
* fix(gemini-agents): close veria-flagged credential-escalation gaps Two high-severity findings from the veria-ai PR review are addressed: 1. api_base override could leak the shared Gemini key GeminiAgentsConfig.validate_environment falls back to GOOGLE_API_KEY / GEMINI_API_KEY when no api_key is supplied. Combined with caller-controlled api_base on the proxy CRUD endpoints, an authenticated user could redirect the outbound request to an attacker-controlled host and capture the operator's shared Gemini key from the x-goog-api-key header. The config now refuses env-fallback whenever api_base is explicitly overridden. 2. Managed-agent CRUD exposed to ordinary LLM keys The new /v1beta/agents routes live in google_routes (i.e. llm_api_routes), so any non-admin LLM key can reach them. Unlike /v1beta/models/...: generateContent these endpoints are NOT model-routed and have no model_list-supplied credentials, so env-fallback would let any LLM key list / create / delete agents inside the operator's Gemini project. Each endpoint now calls _enforce_caller_supplied_provider_key, which requires non-admin callers to supply their own Gemini api_key via litellm_params_template. Proxy admins keep the env-fallback convenience. Tests cover non-admin rejection, admin allow-through, the api_base override guard, and SDK env-fallback when api_base is not overridden. 
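The first finding (no env-fallback when `api_base` is overridden) can be sketched as below. This is a simplified model under stated assumptions, not `GeminiAgentsConfig.validate_environment` itself: the function name `resolve_gemini_api_key`, the error messages, and the lookup order of the two environment variables are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_api_key(
    api_key: Optional[str],
    api_base: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the API key to send, refusing env-fallback for a caller-chosen host."""
    env = os.environ if env is None else env
    if api_key:
        return api_key  # the caller supplied a key; use it as given
    if api_base:
        # A caller-controlled api_base must never receive the operator's shared key.
        raise ValueError("api_key is required when api_base is overridden")
    fallback = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not fallback:
        raise ValueError("no Gemini API key configured")
    return fallback
```

The second finding (requiring non-admin callers to bring their own key) is a separate check at the proxy layer and is not modelled here.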
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(router): restore strict assert_called_once_with on interactions default-provider test --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * feat(gemini): add gemini-3.1-flash-lite model cost map (#28320) * feat(gemini): add gemini-3.1-flash-lite model cost map entries Co-authored-by: Cursor <cursoragent@cursor.com> * Update model_prices_and_context_window.json * Update source URL for model pricing information * Sync source URL for gemini-3.1-flash-lite in backup JSON * fix(model_cost_map): add mistral/ministral-8b-2512 entry Mistral rotated the 'mistral/mistral-tiny' alias to return 'ministral-8b-2512' as the response model, which is not in the cost map. This caused test_completion_mistral_api and test_completion_mistral_api_modified_input to fail in completion_cost lookup. Add the entry mirroring the existing openrouter/mistralai/ministral-8b-2512 pricing. * test(cost_calculator): assert output_cost_per_reasoning_token for gemini-3.1-flash-lite * fix(tests): backfill local backup entries into runtime model_cost litellm.model_cost is loaded from LITELLM_MODEL_COST_MAP_URL (pinned to main) at import time, so any pricing entries added to the in-tree backup on this branch aren't visible at test runtime until they also land on main. The Mistral cassette currently returns model=ministral-8b-2512 and the cost-calculator lookup in test_completion_mistral_api / test_completion_mistral_api_modified_input fails despite the entry existing in the local backup. Backfill missing backup entries into litellm.model_cost in the local_testing conftest so these lookups succeed against the cassette state the branch is being tested with. 
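The conftest backfill described above amounts to copying entries that exist only in the local backup into the runtime cost map, leaving remote-fetched entries untouched. A rough sketch, with the hypothetical name `backfill_missing_entries` and plain dicts standing in for `litellm.model_cost` and the parsed backup JSON:

```python
from typing import Any, Dict


def backfill_missing_entries(
    runtime_map: Dict[str, Any], local_backup: Dict[str, Any]
) -> int:
    """Copy entries missing from `runtime_map` out of `local_backup`; return how many."""
    if not local_backup:
        return 0  # nothing to backfill from an empty or unreadable backup
    added = 0
    for model, info in local_backup.items():
        if model not in runtime_map:
            runtime_map[model] = info  # entries already fetched from the remote map win
            added += 1
    return added
```

Skipping existing keys keeps the remote-fetched values authoritative, so the backfill only affects models that the pinned map does not know about yet.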
* fix(tests): guard conftest backfill against empty local cost map --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed (#27854) * fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed Symptom ------- Customers on multi-pod deployments see team `spend` jump to ~2x (or N x the pod count) shortly after a Redis cache miss / TTL expiry, triggering spurious "Budget Crossed" alerts and blocked requests until the value is manually reset. Root cause ---------- `SpendCounterReseed.coalesced` warmed the primary spend counter by calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`, which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent. The per-counter `asyncio.Lock` only coalesces seeders inside one process. With N pods sharing one Redis, on a cold key (cold start, TTL expiry, manual delete) every pod independently passes its lock + Redis re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`. Final value: N x db_spend. Fix --- Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed. SET NX is atomic across pods: exactly one writer initializes the key; losers read the winner's value via `async_get_cache`. This is the same idiom already used by `coalesced_window` in the same file, so the two seed paths are now consistent. Per-request deltas continue to use `INCRBYFLOAT` (correct - additive behaviour is what we want for increments, not for initial seed). 
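The difference between the additive seed and the SET NX seed can be reproduced with a small in-memory model. This is an illustrative simulation, not the proxy code: `FakeRedis`, `seed_additive`, `seed_set_nx`, and the key name `spend:team` are hypothetical, and two asyncio tasks stand in for two pods that each hold their own per-process lock.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class FakeRedis:
    """In-memory stand-in for the two Redis commands discussed; not a real client."""

    def __init__(self) -> None:
        self.store: Dict[str, float] = {}

    async def get(self, key: str) -> Optional[float]:
        return self.store.get(key)

    async def incrbyfloat(self, key: str, value: float) -> float:
        await asyncio.sleep(0)  # let the other "pod" interleave
        self.store[key] = self.store.get(key, 0.0) + value
        return self.store[key]

    async def set_nx(self, key: str, value: float) -> bool:
        await asyncio.sleep(0)  # yield first; the check and write below are atomic
        if key in self.store:
            return False
        self.store[key] = value
        return True


async def seed_additive(redis: FakeRedis, key: str, db_spend: float) -> float:
    """Old behaviour: every pod that sees a cold key adds db_spend."""
    if await redis.get(key) is None:
        return await redis.incrbyfloat(key, db_spend)
    return redis.store[key]


async def seed_set_nx(redis: FakeRedis, key: str, db_spend: float) -> float:
    """New behaviour: one writer initializes the key, losers read the winner's value."""
    if await redis.set_nx(key, db_spend):
        return db_spend
    cached = await redis.get(key)
    return float(cached)


async def race(seed: Callable[[FakeRedis, str, float], Awaitable[float]]) -> float:
    """Run two concurrent seeders against one shared store and return the final value."""
    redis = FakeRedis()
    await asyncio.gather(
        seed(redis, "spend:team", 506.0), seed(redis, "spend:team", 506.0)
    )
    return redis.store["spend:team"]
```

With the DB spend of 506 used in the commit's repro, `asyncio.run(race(seed_additive))` ends at 1012.0 while `asyncio.run(race(seed_set_nx))` ends at 506.0.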
Verification ------------ Live two-process repro against the same Postgres + Redis (DB spend = 506): Unpatched: 4/4 runs -> Redis counter = ~1012 (~2 x db_spend) Patched: 12/12 runs -> Redis counter = ~506 Unit tests (`test_proxy_server.py`): - New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed` patches `_get_lock` to return a fresh lock per caller (otherwise the per-process lock masks the race), races two `coalesced` calls, and asserts final = 506 with exactly one of two SET NX attempts winning. - 4 existing tests updated for the new seed contract (SET NX for the seed, INCRBYFLOAT only for the per-request delta). - Full `spend_counter or reseed or budget` slice: 22 passed. Co-authored-by: Cursor <cursoragent@cursor.com> * test(spend_counter): make SET NX mock atomic so loser branch is exercised Greptile flagged that `redis_set_cache` in test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed placed `await asyncio.sleep(0)` AFTER the NX membership check. Both concurrent tasks observed an empty `redis_store`, passed the guard, and both returned True - so the loser branch (else: read back winner's value) was never exercised. Fix the mock to model real atomic Redis SET NX: - Yield BEFORE the membership check so two concurrent callers interleave the way real SET NX does (first to resume runs check + write atomically and wins; second resumes after the key exists and loses). - Track set_cache return values; assert sorted([loser, winner]) so we know exactly one task wins and one loses. - Track async_get_cache calls that happen AFTER at least one SET NX has completed; assert at least one such read - that is the loser-path fallback (`current_value = float(cached)` when seeded is False). Verified by temporarily reverting the mock to the old order: the test now fails with `expected exactly one SET NX winner and one loser, got [True, True]`, exactly the failure mode Greptile described. No production code change. 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test `test_concurrent_read_and_write_paths_share_one_db_query` mocks `async_increment` to populate the in-memory `redis_store`, but did not mock `async_set_cache`. After the SET-NX seed change in `coalesced()`, the seed step writes via `async_set_cache(nx=True)` (default AsyncMock, no `redis_store` write), so the simulated Redis stays empty after the first reseed. The second `get_current_spend` then sees a clean Redis miss, re-enters the DB read path, and the test fails with `expected 1 DB query, got 2`. Fix: add a `redis_set_cache` side_effect that updates `redis_store` on `nx=True` (and rejects when the key already exists), matching the pattern used by the four sibling tests fixed in this branch's first commit. Pre-existing assertions are unchanged. Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): normalize batch file IDs before ManagedObjectTable write (#28339) * fix(proxy): normalize batch file IDs before ManagedObjectTable write Run post_call_success_hook before update_batch_in_database on retrieve/cancel, and ensure_batch_response_managed_file_ids so file_object never stores raw provider output_file_id or error_file_id. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): address Greptile review on batch file ID normalization Remove redundant resolve_* calls after update_batch_in_database and rename loop variable to avoid shadowing hidden_params unified_file_id. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest Mistral rotated the 'mistral/mistral-tiny' alias to return 'ministral-8b-2512' as the response model, which was missing from the cost map. 
This caused test_completion_mistral_api and test_completion_mistral_api_modified_input to fail in litellm.completion_cost lookup. - Add mistral/ministral-8b-2512 entry to both the in-tree model_prices_and_context_window.json and the bundled litellm/model_prices_and_context_window_backup.json (mirrors the existing openrouter/mistralai/ministral-8b-2512 pricing). - litellm.model_cost is loaded at import time from the URL pinned to main, so the new backup entry isn't visible at test runtime until it also lands on main. Backfill any entries missing from the remote-fetched map into litellm.model_cost in the local_testing conftest so cost-calculator lookups succeed on this branch. * fix(tests): drop unnecessary del of conftest backfill loop vars * fix: resolve batch response file IDs even when status unchanged The status-unchanged early return in update_batch_in_database was skipping ensure_batch_response_managed_file_ids, leaving raw provider input_file_id (and other raw IDs) in the user-facing response when polling an in-progress batch. Move the in-place file ID normalization above the early return so the response always carries unified managed IDs while still skipping the DB write when nothing changed. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(batches): cover ensure_batch_response_managed_file_ids branches Add tests for the previously-uncovered paths in ensure_batch_response_managed_file_ids: error_file_id normalization, swallowed conversion errors, UserAPIKeyAuth fallback from db_batch_object, model_name resolution from unified_file_id, and early returns when managed_files_obj, model_id, or auth context are missing. 
--------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Claude <noreply@anthropic.com> * fix(router): use forwarded model_id for native Azure container IDs (#27921) * fix(router): use forwarded model_id for native Azure container IDs in _init_containers_api_endpoints Azure code-interpreter containers return provider-native IDs (cntr_ + hex) that carry no LiteLLM routing payload, so _decode_container_id returns model_id=None. The router was falling through to call the handler directly, bypassing _ageneric_api_call_with_fallbacks and leaving api_base=None for Azure deployments. Fall back to the model_id forwarded from the proxy ownership check so deployment credentials are always applied. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(azure-containers): strip /openai/responses path from api_base in AzureContainerConfig.get_complete_url When a deployment's api_base is the responses endpoint URL (e.g. .../openai/responses?api-version=...), AzureContainerConfig was appending /openai/containers on top of it, producing the broken path .../openai/responses/openai/containers. Azure returns 404 for that URL while the correct path is .../openai/containers. Strip any /openai/responses suffix from api_base before constructing the containers URL so the resource root is always used as the starting point. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(azure-containers): prefer api-version from api_base URL over deployment's api_version The deployment's api_version (e.g. 2024-08-01-preview) targets the chat/responses API and is too old for the containers API, which requires 2025-04-01-preview. The responses endpoint api_base already carries the correct api-version in its query string. Extract it and use it for the containers URL, overriding the stale deployment-level version. 
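The two Azure container URL fixes above (strip the `/openai/responses` suffix, prefer the `api-version` carried in `api_base`) can be sketched with the standard library. This is an assumption-laden illustration rather than `AzureContainerConfig.get_complete_url`: the function name `build_containers_url`, the `ENDPOINT_SUFFIXES` tuple, and the example host are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical list of endpoint-specific suffixes to strip from api_base.
ENDPOINT_SUFFIXES = ("/openai/responses",)


def build_containers_url(
    api_base: str, deployment_api_version: Optional[str] = None
) -> str:
    """Build the containers URL from the resource root of `api_base`."""
    parts = urlsplit(api_base)
    path = parts.path.rstrip("/")  # tolerate a trailing slash before matching
    for suffix in ENDPOINT_SUFFIXES:
        if path.endswith(suffix):  # strip only a true suffix, not a mid-path match
            path = path[: -len(suffix)]
            break
    # Prefer the api-version in the URL over the deployment-level value.
    url_version = parse_qs(parts.query).get("api-version", [None])[0]
    api_version = url_version or deployment_api_version
    query = f"api-version={api_version}" if api_version else ""
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}/openai/containers", query, "")
    )
```

For example, an `api_base` of `https://example-resource.openai.azure.com/openai/responses?api-version=2025-04-01-preview` combined with a deployment version of `2024-08-01-preview` yields `https://example-resource.openai.azure.com/openai/containers?api-version=2025-04-01-preview`.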
Fixes DELETE and file-upload operations returning 404 due to wrong api-version. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(containers): pass params=None instead of params={} to httpx to preserve api-version httpx erases a URL's query-string when params={} (empty dict) is passed, silently stripping ?api-version=2025-04-01-preview from every container POST/DELETE request. Azure's GET endpoints tolerate a missing api-version; POST (upload) and DELETE are strict, so those returned 404. Fix: use `params or None` in container_handler._async_handle and llm_http_handler.async_container_delete_handler (and all sibling container handlers) so that an empty params dict falls back to None, leaving httpx to preserve the URL's existing query string intact. Adds a regression test that directly documents the httpx behaviour. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(router): remove elif model_id branch from _init_containers_api_endpoints Two reviewer findings addressed: 1. Truncated comment on the model_id fallback line — now complete. 2. Security: the elif branch that fired when container_id was absent allowed any authenticated caller to supply model_id in a POST /v1/containers body and route the request through an arbitrary deployment UUID, bypassing the model-level access checks that only validate `model`. Removed the elif branch; operations without container_id (create, list) route by the caller-supplied `model` field as before. model_id forwarding is kept only inside the container_id block, where the proxy ownership check has already validated the container before forwarding the deployment ID. Adds a regression test pinning the security boundary: no-container-id path calls original_function directly even when model_id is in kwargs. 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(containers): validate proxy-to-router model_id forwarding for managed IDs Add test_regression_get_container_forwarding_params_sets_model_id_for_managed_id to verify that get_container_forwarding_params (the proxy-side half of the Azure routing fix) correctly extracts and forwards model_id from a LiteLLM-managed encoded container ID. This closes the gap identified by Greptile P1: the previous regression test only injected model_id as a direct kwarg, validating the router in isolation. The new test exercises the actual proxy-to-router data flow through ownership.get_container_forwarding_params, confirming that kwargs["model_id"] is populated before _init_containers_api_endpoints is reached. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(azure-containers): tighten endpoint-path strip to endswith match Use path.endswith() instead of path.find() for _AZURE_ENDPOINT_PATHS so the suffix strip only fires when api_base actually ends with one of the endpoint-specific path suffixes. This is the more precise check greptile flagged on the original find()-based implementation. * Fix sync container handler to preserve URL query string Mirror the async path fix: pass None instead of an empty params dict so httpx does not strip the URL's existing query string (e.g. ?api-version=...), which is required for Azure container routing. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(azure-containers): strip trailing slash before endpoint suffix match Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(containers): recover model_id from stored encoded id for native Azure container IDs get_container_forwarding_params previously only set model_id when the user-supplied container_id was a LiteLLM-managed encoded id. For native upstream IDs (e.g. Azure 'cntr_<hex>') the decode fails and model_id was never forwarded — making the router-side fallback in _init_containers_api_endpoints unreachable in production. 
Fall back to the stored 'unified_object_id' on the ownership row, which is the encoded form captured at create time when the router selected a specific deployment. Decoding that yields the deployment model_id and restores router-based credential application (api_base, api_key) for retrieve/delete and container-file operations on native IDs. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(ui): restore log filter loading indicator (#28282) When a new filter is applied to spend logs, React Query's keepPreviousData left stale rows on screen for 10–15s with no indication that a fetch was in progress. The previous custom isFilteringResults flag was removed in the #25847 toolbar refactor and only partially restored on the Fetch button. Use React Query's isPlaceholderData to discriminate a real filter change (queryKey changed, data not yet arrived) from a same-key live-tail refetch, and feed it into the existing isLoading prop on the toolbar pagination text and the table body. Live-tail polls still keep previous rows without flicker. Co-authored-by: Ryan <ryan@Ryans-MBP.localdomain> * test(e2e): migrate runner to uv, add All Proxy Models key test (#28313) * chore(e2e): migrate runner to uv, add All Proxy Models key test Switches the local e2e runner (run_e2e.sh) from poetry to uv to match the rest of the repo and CI. Adds a Playwright test for creating an admin key with no team selected (all-proxy-models flow), a SLOWMO env hook for headed debugging, and a MIGRATION_TRACKING.md doc that maps the manual UI QA checklist to e2e tests so future migration work has a single source of truth. 
* chore(e2e): address greptile feedback - Remove MIGRATION_TRACKING.md (docs belong in litellm-docs repo) - playwright.config.ts: fall back to 0 when SLOWMO is non-numeric (parseInt returns NaN, which Playwright accepts silently) - run_e2e.sh: add --frozen to uv sync for CI determinism * feat(ui): team passthrough routes create parity + edit load fix (#28098) * feat(ui): team allowed_passthrough_routes create parity + edit load fix Add the Allowed Pass Through Routes selector to the create-team modal (previously only on the edit form), and fix the edit form silently dropping the field: it lives under team metadata, so initialValues must read info.metadata.allowed_passthrough_routes — otherwise the selector renders empty and saving wipes admin-set routes. Both selectors are gated to premium proxy admins, mirroring the server-side gate. Resolves LIT-3019 * fix(ui): persist team allowed_passthrough_routes edits on save The edit form loaded the selector but the save path never wrote it back: allowed_passthrough_routes stayed in the raw metadata JSON textarea and parsedMetadata (from that textarea) always won, so selector edits were silently discarded. Strip it from the textarea initialValues and overlay values.allowed_passthrough_routes into updateData.metadata, mirroring how guardrails is handled. Resolves LIT-3019 * fix(ui): preserve team passthrough routes for non-proxy-admins on save Only proxy admins may set allowed_passthrough_routes (server-side gate). For non-proxy-admins, write the team's stored value back into metadata instead of the form value, so saving an unrelated setting can't silently wipe routes; omit the key entirely when the team never had any. Resolves LIT-3019 * fix(mcp): JWT on tools/list and REST tools/call server resolution (#28227) * fix(mcp): JWT on tools/list, REST server_id resolution, tool_server_mismatch Sign outbound MCP JWTs for list_mcp_tools and inject headers on the tools/list path. 
Resolve server_id on /mcp-rest/tools/call and return 403 tool_server_mismatch when the tool does not belong to the requested server. Default missing arguments to {}. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): restrict list JWTs to mcp:tools/list and default REST arguments to {} - List-only JWTs (call_type=list_mcp_tools) no longer carry the broad mcp:tools/call scope. _build_scope() now emits only mcp:tools/list when no tool name is provided, mirroring the existing least-privilege rule that tool-call JWTs omit mcp:tools/list. - REST /tools/call now defaults a missing 'arguments' field to {} so execute_mcp_tool() and downstream *arguments / .keys() calls don't receive None and crash with TypeError/AttributeError. Co-authored-by: Yassin Kortam <yassin@berri.ai> fix(mcp): validate tool/server in call_tool; skip JWT signer when not configured or static auth present Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): align tests and mypy with user_api_key_auth on tools/list Update mocks for the new _get_tools_from_server parameter, mock server registry in REST access-denied test, and narrow static_headers for mypy. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(test): accept user_api_key_auth in get_tools_from_mcp_servers mock The side_effect for the all-servers case did not accept the new kwarg, so tools/list returned an empty list. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): fail fast for unknown tools when server mapping exists Server-name fallback in call_tool must not open an upstream session when the tool is absent from a populated mapping. Update the HTTP transport test to register a known tool before asserting not-found behavior. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix mypy * Fix mypy * fix(mcp): preserve tools/call scope on missing tool name; pass user_api_key_auth in list_tools Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): match alias/server_name in _resolve_mcp_server_for_tool_call The registry lookup in _resolve_mcp_server_for_tool_call previously only compared candidate.name against the provided server_name, but tool name prefixes can be derived from a server's alias or server_name (see get_server_prefix). When the tool→server mapping is empty/stale (cold start, dynamic tools), the lookup would fail for alias-configured servers even though get_mcp_server_by_name (used by the REST path) matches alias, server_name, and name. Match the same priority of identifiers in both the registry pass and the unprefixed fallback so the MCP protocol call_tool path is consistent with the REST path. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): reuse proxy_logging DualCache in inject_mcp_jwt_headers_for_upstream Instead of allocating a fresh DualCache() on every tools/list invocation, prefer the shared proxy_logging_obj.internal_usage_cache.dual_cache when available. The cache argument is currently unused by MCPJWTSigner, but sharing the proxy's cache avoids per-call allocation overhead and matches the cache identity used elsewhere in the proxy hook plumbing — so any future per-request state stored in cache will survive across list calls. Co-authored-by: Claude <noreply@anthropic.com> * fix(mcp): return 403 ip_filtering for IP-restricted servers in tools/call name lookup Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(test): accept user_api_key_auth kwarg in list_tools mocks The proxy-infra job was failing on four TestMCPServerManager tests because the mock_get_tools_from_server stubs did not accept the new user_api_key_auth keyword argument that list_tools now forwards to _get_tools_from_server. 
Add the kwarg to each stub so list_tools can call through cleanly. Co-authored-by: Claude <claude@anthropic.com> * fix(mcp): skip JWT injection when per-user mcp_auth_header is set MCPClient._get_auth_headers() applies extra_headers AFTER writing Authorization from auth_value, so an injected JWT silently overwrites the user's per-server OAuth token. Guard the JWT signer with 'not mcp_auth_header' so per-user OAuth (and any dict-form per-user auth) takes precedence, mirroring the existing static_headers guard. Adds a regression test that the signer's inject helper is not called when mcp_auth_header is supplied. * fix(mcp): skip JWT injection when extra_headers already has Authorization When a server uses per-user OAuth tokens, the resolved token is passed into _get_tools_from_server via extra_headers. The JWT injection guard only checked mcp_auth_header and the server's static headers, so the signer would silently overwrite the user's OAuth Authorization header. Add a check for an existing Authorization entry in extra_headers so caller-supplied per-user OAuth tokens take precedence over JWT signing. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(mcp): cover JWT signer + tool-call resolution branches Adds unit tests for the new MCPServerManager helpers (_resolve_mcp_server_for_tool_call, _resolve_oauth2_headers_for_tool_call) and the new MCPJWTSigner paths (_build_scope call_type branches and inject_mcp_jwt_headers_for_upstream). Brings patch coverage above the auto target without changing behavior. Co-authored-by: Claude <claude@anthropic.com> * fix(mcp): retry tool-server lookup with prefixed name in REST mismatch check When the REST /mcp-rest/tools/call path sends a raw tool name plus requested_server_id, _get_mcp_server_from_tool_name(name) can return None if the mapping only stores the prefixed form. That bypassed the tool_server_mismatch 403 guard and let the call fall through to trusting requested_server. 
Retry the lookup with every known prefix of the requested server so the mismatch check fires whenever the tool is actually registered. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): always reject unknown tools in server-name fallback Defense-in-depth: _resolve_mcp_server_for_tool_call previously skipped the unknown-tool check whenever the per-server mapping had no entries yet (cold start, OAuth2 lazy listing, or upstream listing failure), allowing arbitrary tool names to reach upstream servers. Tighten the check so the server-name fallback always rejects tool names not present in the mapping. Callers must call list_tools first (standard MCP flow) before tools/call can resolve. Removes the now-unused _mapping_has_tools_for_server helper and adds an explicit empty-mapping rejection test alongside the existing populated-mapping rejection test. Co-authored-by: Sameer Kankute <sameer@berri.ai> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Claude (greptile subagent) <claude-greptile-bot@anthropic.com> * feat(interactions): migrate to Google Interactions API steps schema (May 2026) (#28153) * feat(interactions): migrate to Google Interactions API steps schema (May 2026) Default to Api-Revision: 2026-05-20 (new `steps` schema). Add `litellm.use_legacy_interactions_schema` global flag that sends Api-Revision: 2026-05-07 for operators who need the legacy `outputs` schema until June 8, 2026. - Inject Api-Revision header in GoogleAIStudioInteractionsConfig.validate_environment() - Auto-coalesce response_mime_type → response_format and image_config migration on new schema - Add steps field to InteractionsAPIResponse and InteractionsAPIStreamingResponse - Add StepStart/StepDelta/StepStop/InteractionCreated/etc. 
SSE event types - Update streaming completion detection to handle interaction.completed event - Bridge transformer populates both outputs and steps fields - Bridge streaming iterator emits new-schema events by default Co-authored-by: Cursor <cursoragent@cursor.com> * fix(interactions): address greptile review feedback - Avoid mutating caller's generation_config dict by shallow-copying before popping image_config, preventing silent failures on retries - Skip schema key in response_format when response_format is None to avoid sending schema: null to the Google Interactions API - Remove delta field from step.stop events (new schema only); the StepStop model has no delta field and sending it duplicates already- streamed text and breaks spec-conformant clients Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): parse use_legacy_interactions_schema string values safely bool("false") returns True in Python, so quoted YAML values like "false" or "False" silently activated the legacy Interactions API schema. Match the env-var parsing pattern in litellm/__init__.py by treating string inputs as true only when they equal "true" (case insensitive). Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(interactions): only set object/id/delta on step.stop for legacy schema StepStop (new schema) has no object, id, or delta fields. Setting them unconditionally caused spec-breaking extra fields on new-schema step.stop events in all four construction sites (sync/async × main-loop/StopIteration). Legacy content.stop still receives id, object, and delta unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(interactions): stabilize streaming bridge schema, dict aliasing, and lost first delta - Capture use_legacy_interactions_schema once at iterator construction so all events emitted by a single stream use a consistent schema, even if the global flag is mutated mid-stream. 
- Check for the buffered interaction.complete/completed event before the finished check in __next__/__anext__ so the final completion event (which carries the full collected text in steps) is not dropped after self.finished is set. - Copy text content entries before appending to both outputs and the steps content list to avoid shared mutable dict aliasing between the two response fields. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix tests * fix greptile review * fix(interactions): address Greptile P1 review on schema coalescing and legacy deltas Skip response_mime_type merge when response_format is already a list, avoid in-place list mutation on image_config append, and restore delta.type on legacy content.delta events. Co-authored-by: Cursor <cursoragent@cursor.com> * style(interactions): black-format gemini transformation.py Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Claude <noreply@anthropic.com> * test(ui-e2e): admin key creation with a specific proxy model (#28365) * test(ui-e2e): add admin key creation with a specific proxy model Adds Playwright coverage for creating a key (no team) scoped to a single proxy model, complementing the existing All-Proxy-Models test. Uses a DOM-dispatched click on the antd dropdown option since the popup animation can render the option outside the viewport. * test(ui-e2e): verify scoped key works against mock /chat/completions Extend the "Create a key with a specific proxy model" test to extract the new key from the success modal and POST to /chat/completions for the scoped model, asserting 200 and the mock response body. Without this the test could pass even if the model selection failed to register. 
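The `use_legacy_interactions_schema` parsing fix in #28153 above rests on a Python detail: any non-empty string is truthy, so `bool("false")` is `True`. The following is a minimal sketch of the stated rule (strings count as true only when they equal "true", case-insensitively), assuming a hypothetical function name `parse_bool_setting`; it is not the proxy's actual code.

```python
from typing import Union


def parse_bool_setting(value: Union[bool, str, None]) -> bool:
    """Hypothetical parser for a flag that may arrive as a quoted YAML string.

    Strings are true only when they equal "true" (case-insensitive);
    every other type falls back to ordinary truthiness.
    """
    if isinstance(value, str):
        return value.lower() == "true"
    return bool(value)


# The pitfall: a non-empty string is always truthy.
naive = bool("false")

# The safer parse treats the quoted value as intended.
parsed_false = parse_bool_setting("false")
parsed_true = parse_bool_setting("True")
parsed_native = parse_bool_setting(True)
parsed_missing = parse_bool_setting(None)
```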
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns (#28324) * fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns Vertex AI rejects `id` on function_call/function_response parts; only Google AI Studio accepts it for Gemini 3.5+ strict tool matching. Co-authored-by: Cursor <cursoragent@cursor.com> * Update litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(vertex_ai): forward custom_llm_provider in context caching Pass custom_llm_provider through to _gemini_convert_messages_with_history in the context caching path so Gemini 3.5+ tool-call `id` forwarding behaves consistently between cached and non-cached completions on Google AI Studio. Co-authored-by: Claude <claude@anthropic.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Claude <claude@anthropic.com> * feat(mcp): allow native MCP OAuth support for cursor (#28327) * feat(mcp): allow native MCP OAuth redirect URIs (cursor://) Discoverable OAuth /authorize rejected cursor:// callbacks because validate_trusted_redirect_uri only accepted http/https. Add an allowlisted native path with a built-in Cursor default and optional MCP_TRUSTED_NATIVE_REDIRECT_URIS env for other clients. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): address Greptile native redirect URI review Lowercase paths in normalizer so env allowlist entries match case- insensitively. Tighten wildcard prefix matching to reject sibling paths (e.g. callback-2) unless the prefix ends with /. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): reject query params on native OAuth redirect URIs Greptile: normalization stripped query strings before allowlist compare, so cursor://.../callback?injected=... 
could pass validation. Reject any native redirect_uri with a query component (same as fragments). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(model_cost_map): add mistral/ministral-8b-2512 entry Mistral rotated the 'mistral/mistral-tiny' alias to return 'ministral-8b-2512' as the response model, which is not in the cost map. This caused test_completion_mistral_api and test_completion_mistral_api_modified_input to fail in completion_cost lookup. Add the entry mirroring the existing openrouter/mistralai/ministral-8b-2512 pricing. * fix(mcp): lowercase default native redirect URIs Make _parse_trusted_native_redirect_uris apply the same lowercasing to built-in defaults as it does to env-var entries. * fix(tests): backfill local model_cost into remote-fetched map litellm.model_cost is loaded at import time from the URL pinned to main, so pricing entries that exist only in this branch (e.g. mistral/ministral-8b-2512, freshly added because Mistral now returns this id from mistral-tiny) are absent at test time and completion_cost lookups raise. Backfill the in-tree backup so cassette-driven cost calculations resolve against the entries that ship with the branch under test. Fixes the local_testing_part1 failures on test_completion_mistral_api and test_completion_mistral_api_modified_input. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> * fix(interactions): never drop streamed text deltas; always emit terminal completion (#28394) * fix(interactions): never drop streamed text deltas; always emit terminal completion The interactions streaming bridge had two bugs flagged by Greptile on PR #28153: 1. The first OutputTextDeltaEvent (and the second, when no ResponseCreatedEvent precedes the deltas) was consumed to emit a synthetic interaction.created / step.start event, but the chunk's text payload was never forwarded as a step.delta. 
The text only reappeared in the terminal step.stop, which defeats the purpose of incremental streaming. 2. When the upstream Responses API stream ended via StopIteration without a ResponseCompletedEvent, the iterator emitted step.stop but never the terminal interaction.completed event carrying the full collected text. This refactors the iterator to translate each upstream chunk into a list of events (instead of a single event) and buffers them in a deque. A text delta now expands into [interaction.created, step.start, step.delta] on the first chunk so no token is dropped, and the StopIteration / StopAsyncIteration fallback always flushes a terminal interaction.completed event when one hasn't already been sent. Both behaviors are covered by new unit tests: - test_no_text_token_is_dropped_during_streaming - test_response_created_then_text_delta_emits_step_start_and_delta - test_stop_iteration_fallback_emits_completion_event - test_response_completed_emits_stop_then_completion (no double-emit) Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(interactions): correlate EOF terminal events with stream's interaction id The StopIteration fallback path previously built the terminal step.stop / interaction.completed events with id=None (legacy content.stop) and a memory-address fallback string (interaction.completed), neither of which matched the item_id used by the earlier interaction.created / step.start / step.delta events in the same stream. Downstream consumers correlating events by id would see a mismatch. Persist the interaction id derived from the first upstream chunk (item_id on an OutputTextDeltaEvent, or response.id on a ResponseCreatedEvent) and reuse it when flushing the terminal events on EOF. 
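The refactor described in #28394 above (translate each upstream chunk into a list of events, buffer them in a deque, and always flush a terminal completion on EOF) can be sketched as follows. This is a simplified illustration, not the bridge's real code: the class name `BufferedEventBridge`, the dict-shaped chunks and events, and the `text` key are all assumptions made for the example. The `step.stop` event carries no id, matching the new-schema note earlier in this log.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Optional

Event = Dict[str, Optional[str]]


class BufferedEventBridge:
    """Hypothetical, simplified stand-in for the streaming bridge iterator."""

    def __init__(self, upstream: Iterator[Dict[str, str]]) -> None:
        self._upstream = upstream
        self._pending: Deque[Event] = deque()
        self._interaction_id: Optional[str] = None
        self._started = False
        self._completed = False
        self._text: List[str] = []

    def _translate(self, chunk: Dict[str, str]) -> List[Event]:
        """Expand one upstream text delta into one or more events."""
        events: List[Event] = []
        if not self._started:
            # Persist the id from the first chunk so later events can reuse it.
            self._interaction_id = chunk["item_id"]
            self._started = True
            events.append({"type": "interaction.created", "id": self._interaction_id})
            events.append({"type": "step.start", "id": self._interaction_id})
        # The text of every chunk, including the first, is forwarded as a delta.
        self._text.append(chunk["delta"])
        events.append(
            {"type": "step.delta", "id": self._interaction_id, "text": chunk["delta"]}
        )
        return events

    def __iter__(self) -> "BufferedEventBridge":
        return self

    def __next__(self) -> Event:
        while not self._pending:
            if self._completed:
                raise StopIteration
            try:
                chunk = next(self._upstream)
            except StopIteration:
                # EOF fallback: flush terminal events exactly once.
                self._completed = True
                self._pending.append({"type": "step.stop"})
                self._pending.append(
                    {
                        "type": "interaction.completed",
                        "id": self._interaction_id,
                        "text": "".join(self._text),
                    }
                )
            else:
                self._pending.extend(self._translate(chunk))
        return self._pending.popleft()


chunks = [{"item_id": "msg_1", "delta": "Hel"}, {"item_id": "msg_1", "delta": "lo"}]
events = list(BufferedEventBridge(iter(chunks)))
event_types = [event["type"] for event in events]
```

Buffering in a deque lets a single upstream chunk fan out into several downstream events while `__next__` still returns one event per call.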
Author: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * ci(windows): raise UV_HTTP_TIMEOUT to 300s for uv sync The using_litellm_on_windows job has been hitting flaky PyPI download timeouts during 'uv sync --frozen --group dev' — different packages on each rerun (six, pydantic-core), all surfacing the same uv error: Failed to download distribution due to network timeout. Try increasing UV_HTTP_TIMEOUT (current value: 30s). uv's default 30s per-request timeout is too tight for the Windows runner on this project (50+ deps, several multi-MB wheels), so bump it to 300s to let slow individual downloads complete instead of failing the build. * fix(interactions): correlate ResponseCompletedEvent terminal events with stream's interaction id When a stream starts directly with OutputTextDeltaEvent (no preceding ResponseCreatedEvent), interaction.created carries item_id while interaction.completed previously carried response.id from ResponseCompletedEvent. The two ids can differ, leaving consumers that correlate events by id unable to match the start and completion events. Fall back to self._interaction_id (set on the first chunk that derives an id) before response.id, mirroring the EOF terminal path. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(proxy): expose Prisma idle/connect timeout + extra DB URL params (#28395) * fix(proxy): expose Prisma idle/connect timeout + extra DB URL params Operators have reported large numbers of idle Prisma connections that never get closed. The proxy already forwards `connection_limit` and `pool_timeout` to the DATABASE_URL, but had no knob for capping idle or slow connections. 
Add three new `general_settings` keys that thread through to the DATABASE_URL / DIRECT_URL query string: - `database_connect_timeout` -> Prisma `connect_timeout` - `database_socket_timeout` -> Prisma `socket_timeout` (the main knob for closing idle connections from the LiteLLM side) - `database_extra_connection_params` -> untyped passthrough dict for any other Prisma URL param (`pgbouncer`, `statement_cache_size`, `sslmode`, ...); keys here override LiteLLM defaults. Refactors the duplicated DATABASE_URL/DIRECT_URL param dicts into a single `_build_db_connection_url_params` helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update litellm/proxy/proxy_cli.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Litellm oss staging 1 (#28337) * feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (#27700) Squash-merged by litellm-agent from TorvaldUtne's PR. * fix(ui): trim whitespace from MCP inspector tool call inputs (#28203) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * gemini-3.1-flash-lite pricing (#27933) * feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> * fix: incorrect /v1/agents request example (#28131) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge (#28201) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. 
But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks). Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks. Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash). * test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models. * test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop). * feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#28280) Squash-merged by litellm-agent from ro31337's PR. * fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (#28215) Squash-merged by litellm-agent from cwang-otto's PR. * fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR. * fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex) (#28133) Squash-merged by litellm-agent from cwang-otto's PR. 
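The dict-to-string coercion from the Anthropic `reasoning_effort` fix (#28201) above can be sketched as below. This is an illustration under assumptions: the function name `normalize_reasoning_effort` is hypothetical, and the dict layout (`effort` plus an optional `summary` key) is inferred from the commit's reference to the OpenAI-shaped Reasoning(effort, summary) object.

```python
from typing import Any, Optional


def normalize_reasoning_effort(value: Any) -> Optional[str]:
    """Hypothetical helper: accept string or dict-shaped reasoning_effort.

    - A string is returned unchanged.
    - A dict yields its "effort" entry when that entry is a string;
      "summary" is ignored.
    - Anything else (including an unparseable dict) yields None, so the
      parameter is dropped silently instead of raising.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        effort = value.get("effort")
        return effort if isinstance(effort, str) else None
    return None


from_string = normalize_reasoning_effort("high")
from_dict = normalize_reasoning_effort({"effort": "medium", "summary": "auto"})
from_bad_dict = normalize_reasoning_effort({"summary": "auto"})
```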
* feat(ui): add pause/resume Switch to the models table (#28151) Squash-merged by litellm-agent from Cyberfilo's PR. * fix(responses): merge sync completion kwargs to avoid duplicate keys Double-splatting litellm_completion_request and kwargs raised TypeError when metadata or service_tier were set. Match the async merge pattern. Co-authored-by: Cursor <cursoragent@cursor.com> * Use proxy base URL for CLI SSO form action (#28271) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest Mistral rotated the 'mistral/mistral-tiny' alias to return 'ministral-8b-2512' as the response model, which was missing from the cost map. This caused test_completion_mistral_api and test_completion_mistral_api_modified_input to fail in litellm.completion_cost lookup. - Add mistral/ministral-8b-2512 entry to both the in-tree model_prices_and_context_window.json and the bundled litellm/model_prices_and_context_window_backup.json (mirrors the existing openrouter/mistralai/ministral-8b-2512 pricing). - litellm.model_cost is loaded at import time from the URL pinned to main, so the new backup entry isn't visible at test runtime until it also lands on main. Backfill any entries missing from the remote-fetched map into litellm.model_cost in the local_testing conftest so cost-calculator lookups succeed on this branch. * fix(tests): drop unnecessary del of conftest backfill loop vars * fix(router): harden streaming fallback wrapper for bridge iterators - FallbackResponsesStreamWrapper now uses getattr fallbacks when copying attributes from the source iterator. The bridge path (LiteLLMCompletionStreamingIterator used by Anthropic/Bedrock/Vertex) does not call super().__init__ and is missing response, logging_obj (it uses litellm_logging_obj), responses_api_provider_config, start_time, request_data, call_type, and _hidden_params. 
Previously, wrapper construction raised AttributeError for any streaming fallback on the bridge path. - _aresponses_with_streaming_fallbacks now deep-copies the litellm_metadata (and metadata) dicts into fallback_kwargs. The primary attempt mutates this dict in place via _update_kwargs_with_deployment, so a shallow copy of kwargs was leaking primary-deployment fields (deployment, model_info, api_base) into the mid-stream fallback request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(router): use safe_deep_copy for fallback metadata snapshot The ban_copy_deepcopy_kwargs CI check rejects copy.deepcopy() on any variable whose name contains 'kwargs' (incl. fallback_kwargs). Swap the two copy.deepcopy(fallback_kwargs[...]) calls for safe_deep_copy, which handles non-picklable values (OTEL spans, etc.) by per-key deepcopy with fallback to the original reference. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(ci): skip chronically flaky build_and_test integration tests Both tests have been failing on every recent run of build_and_test against this PR's HEAD (1686967, 1688402, 1689993, 1690877), and the same two tests also fail intermittently on unrelated commits and other branches, independent of any code change in this PR (which only touches router fallback wrappers, the Anthropic Responses bridge, and unrelated UI/cost-map files). - tests.test_spend_logs.test_spend_logs: /spend/logs?request_id=... returns 500 even after a 20s wait for the spend log to be written. Spend-log accuracy is still covered by tests/test_litellm/proxy/ spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job. - tests.test_team_members.test_add_multiple_members: /team/info?team_id= ... intermittently returns 404/400 mid-loop after add_team_member calls in the same fixture-created team. Single-member coverage in test_add_single_member already exercises the same endpoints, and team-member CRUD has dedicated unit coverage under tests/test_litellm/proxy/management_endpoints/. 
Skipping unblocks the build_and_test job until the underlying race in the dockerized integration setup is root-caused. * fix: preserve explicit timeout=0 in responses API handler Use 'timeout if timeout is not None else request_timeout' instead of 'timeout or request_timeout' so an explicit timeout=0/0.0 isn't silently replaced by the default request_timeout. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(ui): guard model_info access in pause Switch with optional chaining * fix(ui): guard model_info access in pause Switch onChange handler Mirror the optional-chaining guard already applied to the isPausing c… * fix(anthropic_messages): forward named params into MessagesInterceptor.handle (#27810) When ``anthropic_messages`` dispatches to a registered ``MessagesInterceptor`` (e.g. ``AdvisorOrchestrationHandler``), it currently splats only ``kwargs`` plus a handful of explicit positional/named args. Top-level parameters bound as named arguments on ``anthropic_messages`` — ``thinking``, ``metadata``, ``stop_sequences``, ``system``, ``temperature``, ``tool_choice``, ``top_k``, ``top_p`` — are silently dropped, because they live in local variables, not in ``kwargs``. This loses request fields on every interceptor sub-call. The most visible breakage: ``thinking={"type": "adaptive"}`` sent by clients (Claude Code, Anthropic SDK callers, etc.) is dropped on the executor sub-call, so downstream providers whose validation depends on ``thinking`` reject the request. Concretely, Vertex AI returns: invalid_request_error: ``clear_thinking_20251015`` strategy requires ``thinking`` to be enabled or adaptive even though the caller correctly sent ``thinking: {type: adaptive}``. Fix --- 1. Extend the existing ``request_kwargs.pop()`` extraction (already used for ``tools`` and ``stream``) to cover all named params we forward to the interceptor. 
This honors pre-request hook overrides for any of those fields and prevents duplicate-keyword conflicts when ``kwargs`` is splatted into ``interceptor.handle(...)``. 2. Forward every named parameter explicitly into ``interceptor.handle``, so the advisor (and any future interceptor) preserves the full request shape on its internal sub-calls. Tests ----- - ``test_named_params_forwarded_into_advisor_executor_subcall`` — drives the full ``anthropic_messages`` -> interceptor -> executor path and asserts all 8 named params arrive in the executor sub-call. Verified to fail on master (None vs caller-supplied values) and pass with this fix. - ``test_pre_request_hook_override_does_not_collide_with_explicit_kwargs`` — simulates a ``CustomLogger.async_pre_request_hook`` returning ``thinking``, ``system``, ``temperature``. Without the new pops, the explicit-kwarg forwarding raises ``TypeError: got multiple values for keyword argument``. This test locks in the pop extraction. All 5 tests in ``test_advisor_integration.py`` pass. * fix(guardrails): re-emit chunks in tool_permission streaming hook when no tool_calls found (#26585) * fix(guardrails): re-emit chunks in tool_permission streaming hook when no tool_calls found async_post_call_streaming_iterator_hook is an async generator. The `if not tool_calls:` branch (plain-text LLM replies) did a bare `return`, which terminates the generator without yielding anything. Clients received only `data: [DONE]` with empty content — the entire response was silently dropped. Fix: pass the assembled ModelResponse through MockResponseIterator and yield every chunk before returning, mirroring the allowed-tool code path that already exists a few lines below. 
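The bare-`return` pitfall in the tool_permission streaming hook can be reproduced in a few lines. This is a minimal sketch, not LiteLLM's hook: `replay`, `hook_buggy`, `hook_fixed`, and `collect` are hypothetical names, plain strings stand in for response chunks, and `replay` only plays the role that `MockResponseIterator` has in the fix.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def replay(chunks: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for MockResponseIterator: re-emits buffered chunks."""
    for chunk in chunks:
        yield chunk


async def hook_buggy(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    # Buffer the whole stream so it can be inspected for tool calls.
    buffered = [chunk async for chunk in stream]
    tool_calls = [c for c in buffered if c.startswith("tool:")]
    if not tool_calls:
        return  # ends the async generator without yielding: the reply is dropped
    for chunk in buffered:
        yield chunk


async def hook_fixed(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    buffered = [chunk async for chunk in stream]
    tool_calls = [c for c in buffered if c.startswith("tool:")]
    if not tool_calls:
        # Plain-text reply: re-emit every buffered chunk before returning.
        async for chunk in replay(buffered):
            yield chunk
        return
    for chunk in buffered:
        yield chunk


async def collect(hook: Callable, chunks: List[str]) -> List[str]:
    """Drive a hook over a replayed stream and gather what the client would see."""
    return [c async for c in hook(replay(chunks))]
```

Calling `asyncio.run(collect(hook_buggy, ["Hello", " world"]))` yields nothing for a plain-text reply, while the same call with `hook_fixed` returns both chunks.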
Closes #26547 Re-submits after #26551 (auto-closed when litellm_oss_branch was deleted) * test(guardrails): strengthen plain-text streaming assertion to verify content fidelity Previously the regression test only checked that at least one chunk was yielded; now it also asserts that the chunk content matches the original assembled response, ensuring the fix preserves response data end-to-end. * Add dedicated xai_key and fallback logic for xAI API key (#28647) Add a provider-specific litellm.xai_key fallback for xAI chat, responses, and realtime requests. Keep the Responses API and realtime fallback order compatible by preserving litellm.api_key before XAI_API_KEY when no explicit provider-specific key is set. * fix(proxy): don't enforce budgets on model-discovery / info routes (#27923) (#29483) * fix(proxy): don't enforce budgets on model-discovery / info routes (#27923) * fix(proxy): narrow model-discovery budget bypass to explicit route set (#27923) * feat(search): add APISerpent (apiserpent.com) as search provider (#29448) * feat(search): add APISerpent (apiserpent.com) as search provider APISerpent is a multi-engine SERP API covering Google, Bing, Yahoo, and DuckDuckGo. It exposes two endpoints, quick search (/api/search/quick) and deep search (/api/search), both billed at $0.60 per 1k searches. Both are surfaced under a single `apiserpent` provider; callers select the deep endpoint with `deep=True`, following the way Linkup and Tavily ship two search setups under one provider. All supported parameters and their defaults live in a single APISerpentSearchParams dataclass, which enforces the documented bounds (num 1 to 100, pages 1 to 10) and types the constrained string params (engine, safe, freshness, format) as Literals. 
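A parameter dataclass that enforces documented bounds and types constrained strings as Literals might look roughly like the sketch below. It is not the actual `APISerpentSearchParams`: the class name `SearchParamsSketch`, the field defaults, and the lowercase engine spellings are assumptions. Only the bounds (num 1 to 100, pages 1 to 10), the `deep` switch, and the two endpoint paths come from the commit message.

```python
from dataclasses import dataclass
from typing import Literal

# Assumed spellings; the commit message only names the four engines.
Engine = Literal["google", "bing", "yahoo", "duckduckgo"]


@dataclass
class SearchParamsSketch:
    query: str
    engine: Engine = "google"  # assumed default
    num: int = 10              # assumed default
    pages: int = 1             # assumed default
    deep: bool = False         # True selects the deep-search endpoint

    def __post_init__(self) -> None:
        # Enforce the documented bounds at construction time.
        if not 1 <= self.num <= 100:
            raise ValueError(f"num must be between 1 and 100, got {self.num}")
        if not 1 <= self.pages <= 10:
            raise ValueError(f"pages must be between 1 and 10, got {self.pages}")

    def endpoint_path(self) -> str:
        # Both endpoints sit under one provider; `deep` picks between them.
        return "/api/search" if self.deep else "/api/search/quick"
```

Keeping defaults and bounds in one dataclass means an out-of-range value fails when the parameters are built rather than after a round trip to the API.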
* address review: null results, idempotent api_base, test coverage Greptile fixes: coerce a null `results` payload to an empty list so error responses don't raise (P1); always apply the quick/deep path suffix so an api_base / APISERPENT_API_BASE host override still routes correctly, using an endswith guard to stay idempotent across the handler's double call into get_complete_url (P2); document why the deep-search num floor isn't enforced in the dataclass (P2). Move the test suite from tests/search_tests to tests/test_litellm/llms/apiserpent so the unit-test/coverage job (`pytest tests/test_litellm`) actually exercises it; the package now reports 100% patch coverage. Adds regression tests for the null-results and api_base-routing fixes. * register apiserpent in provider_endpoints_support.json The check_provider_folders_documented CI gate requires every litellm/llms folder to have an entry; add apiserpent with a search endpoint, mirroring the serper and tavily entries. * fix(github_copilot): handle missing choices in response for newer models (max_tokens=1 crash) (#29392) * fix(github_copilot): handle missing choices in response for newer models Newer Copilot backend models (claude-opus-4.7, 4.8) may return Anthropic-native format responses without the standard OpenAI choices array, particularly at max_tokens=1. This caused an unhandled IndexError. Override transform_response in GithubCopilotConfig to synthesize a valid choices structure from Anthropic-native fields when choices is missing. Fixes #29391 * fix black formatting * guard against missing choices in shared converter; delegate to super in provider override Three changes: 1. convert_dict_to_response.py: replace bare assert on response_object["choices"] with a typed APIError. Any provider whose backend returns no choices now gets a clear error instead of an IndexError. 2. 
transformation.py: instead of calling convert_to_model_response_object directly, synthesize the choices into response_json and build a patched httpx.Response, then delegate to super().transform_response(). This keeps us on the parent's post_call/header/logging path. 3. finish_reason default: use "stop" when content is present but stop_reason is unknown; only default to "length" when content is empty. * guard streaming response converters against missing choices Same defense-in-depth as the non-streaming path: raise a typed APIError instead of KeyError/empty iteration when choices is missing. * add unit tests for missing-choices guard in convert_dict_to_response Regression tests ensuring APIError is raised (not IndexError) when a provider returns a response without choices. Covers non-streaming, streaming cache-hit, and async streaming paths. * fix broken streaming tests: consume generators to actually exercise guards The stream=True test never consumed the returned generator, so the guard code never executed and pytest.raises saw no exception. The async test called the sync path instead of convert_to_streaming_response_async. Split into two tests that properly exercise both paths. 
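The broken streaming tests came down to generator laziness: a guard inside a generator function does not run until the generator is iterated. A minimal sketch of that behavior, with hypothetical names (`MissingChoicesError` stands in for the typed `APIError`, and `to_streaming_chunks` is not the real converter):

```python
from typing import Callable, Iterator


class MissingChoicesError(Exception):
    """Hypothetical stand-in for the typed APIError raised by the guard."""


def to_streaming_chunks(response: dict) -> Iterator[dict]:
    # This is a generator function: nothing below runs until iteration starts.
    choices = response.get("choices")
    if not choices:
        raise MissingChoicesError("response has no choices")
    for choice in choices:
        yield {"delta": choice}


def raises(fn: Callable[[], object]) -> bool:
    """Tiny stand-in for pytest.raises: reports whether the guard fired."""
    try:
        fn()
    except MissingChoicesError:
        return True
    return False


# Only creates the generator object; the guard never executes.
created_only = raises(lambda: to_streaming_chunks({}))
# Consuming the generator runs the body, so the guard fires.
consumed = raises(lambda: list(to_streaming_chunks({})))
```

Here `created_only` is `False` and `consumed` is `True`, which is why a test that never consumes the returned generator sees no exception.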
* add unit tests for convert_dict_to_response and copilot transform_response Coverage for convert_dict_to_response.py: - _normalize_images_for_message (None, empty, adds index, preserves index) - _safe_convert_created_field (None, int, float, string, invalid string) - convert_to_streaming_response (None, happy path, finish_details fallback) - convert_to_streaming_response_async (None, happy path, tool_calls) - _handle_invalid_parallel_tool_calls (None, normal, multi_tool_use expansion, bad JSON) - _should_convert_tool_call_to_json_mode (all branches) - convert_tool_call_to_json_mode (converts, no-op) - convert_to_model_response_object embedding/transcription/rerank paths - completion path: tool_calls finish_reason override, multiple choices, json mode, reasoning_content, None inputs Coverage for github_copilot transformation.py line 197-198: - test_transform_response_invalid_json_falls_through_to_super --------- Co-authored-by: Rudy-Macmini <rudy-macmini@192.168.1.173> Co-authored-by: Rudy-Macmini <rudy-macmini@Rudy-Macminis-Mac-mini.local> * feat(proxy): add model_group filter to /spend/logs/v2 endpoint (#29405) Add an optional `model_group` query parameter to the `/spend/logs/v2` and `/spend/logs/ui` endpoints, allowing users to filter spend logs by model group. This is consistent with the existing `model` and `model_id` filters and requires no schema changes since `model_group` is already a column in the `LiteLLM_SpendLogs` table. Supersedes #24782 (rebased onto latest main). * fix(github_copilot): extract tool_calls from Anthropic-native Copilot responses Reuse AnthropicConfig.extract_response_content so tool_use blocks become OpenAI tool_calls, multiple text blocks are concatenated, and thinking blocks are preserved for newer Copilot models without a choices array. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(convert_dict_to_response): propagate missing-choices APIError; fix transcription token-usage test The defense-in-depth guard for missing 'choices' raised APIError inside the broad try/except in convert_to_model_response_object, which re-wrapped it as a generic Exception('Invalid response object ...'). Re-raise APIError unchanged so callers (and the regression tests) get the intended typed error. Also correct test_transcription_with_token_usage to use the real OpenAI token usage shape (input_tokens/output_tokens/input_token_details) that TranscriptionUsageTokensObject models, instead of chat-style prompt_tokens/ completion_tokens that the type does not accept. * test(convert_dict_to_response): exercise received_args debug path with malformed choice The missing-choices guard now raises a typed APIError for choices=None, so the old input no longer reaches the generic debugging handler. Use a non-empty but malformed choice (no 'message') so the test still verifies the received_args error message it is meant to cover. 
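The re-wrapping problem and its fix can be sketched as follows. All names here are hypothetical (`APIErrorSketch`, `convert`, `error_type`), and the converter is reduced to a few lines. The point is only the ordering of the `except` clauses, so the typed error passes through a broad handler unchanged.

```python
from typing import Optional, Type


class APIErrorSketch(Exception):
    """Hypothetical stand-in for the typed APIError."""


def convert(response: dict) -> dict:
    try:
        choices = response.get("choices")
        if not choices:
            raise APIErrorSketch("response is missing 'choices'")
        # A malformed choice (no 'message') raises KeyError here.
        return {"content": choices[0]["message"]["content"]}
    except APIErrorSketch:
        # Re-raise the typed error unchanged so callers can catch it by type.
        raise
    except Exception as exc:
        # Anything else still goes through the generic debugging wrapper.
        raise Exception(f"Invalid response object: {exc!r}") from exc


def error_type(response: dict) -> Optional[Type[BaseException]]:
    """Return the exact exception type raised by convert, or None on success."""
    try:
        convert(response)
    except Exception as exc:
        return type(exc)
    return None
```

With this ordering, `error_type({})` is `APIErrorSketch`, while a non-empty but malformed choice such as `{"choices": [{}]}` still reaches the generic `Exception` wrapper.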
* fix(embedding): respect drop_params for unsupported dimensions parameter (#26868) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: lengkejun <lengkejun@xd.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Ryan <ryan@Ryans-MBP.localdomain> Co-authored-by: Claude (greptile subagent) <claude-greptile-bot@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: TorvaldUtne <78661304+TorvaldUtne@users.noreply.github.com> Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com> Co-authored-by: Roman Pushkin <roman.pushkin@gmail.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> Co-authored-by: Kevin Zhao <zkm8093@gmail.com> Co-authored-by: Matthew Lapointe <lapointe683@gmail.com> Co-authored-by: Elon Azoulay <elon.azoulay@gmail.com> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: afoninsky <andrey.afoninsky@gmail.com> Co-authored-by: 
Tai An <antai12232931@outlook.com> Co-authored-by: Joseph Barker <156112794+seph-barker@users.noreply.github.com> Co-authored-by: Maruti Agarwal <88403147+marutilai@users.noreply.github.com> Co-authored-by: Cursor Bugbot <bugbot@cursor.com> Co-authored-by: Greptile <greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Greptile Reviewer <greptile-apps@users.noreply.github.com> Co-authored-by: Dennis Henry <dennis.henry@okta.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: harish-berri <harish@berri.ai> Co-authored-by: Felipe Garé <90070734+FelipeRodriguesGare@users.noreply.github.com> Co-authored-by: withomasmicrosoft <withomas@microsoft.com> Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com> Co-authored-by: LiteLLM Bot <bot@berri.ai> Co-authored-by: Kenan Yildirim <kenan@kenany.me> Co-authored-by: vladpolevoi <vladp@lasso.security> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> Co-authored-by: João Costa <13508071+jpv-costa@users.noreply.github.com> Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Shivam Rawat <shivam@berri.ai> Co-authored-by: Vincent <yimao1231@gmail.com> Co-authored-by: Kris Xia <xiajiayi0506@gmail.com> Co-authored-by: d 🔹 <liusway405@gmail.com> Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com> Co-authored-by: Tom Denham <tom@tomdee.co.uk> Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com> Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com> Co-authored-by: robin-fiddler <robin@fiddler.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com> Co-authored-by: Felipe Rodrigues Gare Carnielli <felipe.gare@hotmail.com> 
Co-authored-by: Federico Kamelhar <federico.kamelhar@oracle.com> Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local> Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local> Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com> Co-authored-by: Shin <shin@litellm.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MBP.localdomain> Co-authored-by: mateo-berri <mateo@berri.ai> Co-authored-by: Alex Yaroslavsky <trexinc@gmail.com> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Graham Neubig <398875+neubig@users.noreply.github.com> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Piotr Placzko <piotr@icep-design.com> Co-authored-by: Iana <iana@Shivakumars-MacBook-Pro.local> Co-authored-by: Samarth Maganahalli <samarth.maganahalli@gmail.com> Co-authored-by: Someswar <130047865+someswar177@users.noreply.github.com> Co-authored-by: Peter Dave Hello <3691490+PeterDaveHello@users.noreply.github.com> Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com> Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com> Co-authored-by: rudy renjie meng <36201915+BeginnerRudy@users.noreply.github.com> Co-authored-by: Rudy-Macmini <rudy-macmini@192.168.1.173> Co-authored-by: Rudy-Macmini <rudy-macmini@Rudy-Macminis-Mac-mini.local> Co-authored-by: kejunleng <33445544+silencedoctor@users.noreply.github.com> Co-authored-by: Tim Ren 
<137012659+xr843@users.noreply.github.com>	2026-06-02 08:48:10 -07:00
yuneng-jiang	5e2d75d75d	bump deps (#29208 ) (#29226 ) * fix(deps): bump vulnerable proxy dependencies (starlette/fastapi, granian, pyarrow, semantic-router) Resolve known CVEs flagged by osv-scanner/grype against uv.lock. All bumped versions verified to resolve, install, and pass the proxy auth/route/middleware unit suites (717 tests) plus an import smoke on the new stack. - starlette 0.50.0 -> 1.1.0 (CVE-2026-48710 "BadHost", GHSA-86qp-5c8j-p5mr): versions <1.0.1 reconstruct request.url from the unvalidated Host header, poisoning request.url.path. Required raising fastapi 0.124.4 -> 0.136.3, which dropped fastapi's starlette<0.51.0 cap; an explicit starlette>=1.0.1 floor blocks regression to a vulnerable transitive resolution. The proxy's own auth already reads scope["path"] via get_request_route, but the locked starlette still flagged in container scanners and left other request.url consumers exposed. - granian 2.5.7 -> 2.7.4 (CVE-2026-42544, unauthenticated DoS via WebSocket subprotocol header panic; CVE-2026-42545, WSGI response-header-panic DoS). granian is a selectable proxy server (proxy_cli). - pyarrow 22.0.0 -> 23.0.1 (CVE-2026-25087 / PYSEC-2026-113). - semantic-router 0.1.12 -> 0.1.15: 0.1.12 was yanked (CVE-2026-42208 — its unbounded litellm pin could resolve a credential-exfiltrating litellm==1.82.8 wheel). Not fixable by bump: diskcache 5.6.3 (CVE-2025-69872, unsafe pickle deserialization) has no upstream fix and is left pinned; exploiting it requires write access to the local cache directory. Relock side effect: sse-starlette 3.4.2 -> 3.4.4. * deps: relax exact pins in optional extras to compatible ranges The proxy/optional extras exact-pinned every dependency, which (1) forces downstream `pip install litellm[proxy]` consumers into version lockstep and (2) blocks them from pulling transitive security patches without forking — the structural cause behind needing a litellm release to clear the starlette CVE in the previous commit. 
Convert the ordinary extras deps to `>=current,<next_major` ranges, mirroring the core [project].dependencies style. Reproducibility for litellm's own Docker/CI is unaffected: images install via `uv sync --frozen`, and the lock re-resolves to the identical versions (no locked version changed). Kept exact-pinned: - litellm-proxy-extras, litellm-enterprise — litellm's own sub-packages, versioned in lockstep with the release. - opentelemetry-api/sdk/exporter-otlp — must resolve to matching versions. - grpcio — supply-chain-pinned to a vetted, aged release. Also corrects the stale comment claiming the extras are exact-pinned for Docker reproducibility (the images use the lock, not these pins). * fix(ci): resolve license-check lookup version from the floor for ranged deps check_licenses.py derived the PyPI lookup version with `next(iter(req.specifier))`, which returns an arbitrary specifier clause. For a range like `>=0.12.1,<1.0` it picked the upper bound (`1.0`) — a version that doesn't exist on PyPI — so the license lookup 404'd and the package was flagged as having an unknown license. The previous commit's switch from exact pins to ranges exposed this for soundfile, pyroscope-io, redisvl, diskcache, and mlflow (the ranged deps not already in liccheck.ini's allowlist). Prefer a lower-bound/exact version (a real released version) for the lookup. * fix(proxy): set strict_content_type=False on the FastAPI app Starlette 1.0 / FastAPI 0.13x flipped the default to strict_content_type=True, which refuses to parse a JSON request body when the client omits the Content-Type header. The proxy previously accepted those requests, so the fastapi/starlette bump in this PR would silently break clients that don't send a Content-Type. Restore the prior lenient behavior explicitly. Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>	2026-05-28 16:48:14 -07:00
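The license-check fix in the commit above (prefer a lower-bound or exact version for the PyPI lookup) can be illustrated with a simplified, standard-library-only sketch. The real script works on a parsed requirement's `specifier`; here the specifier is parsed by hand from a string, and `lookup_version` is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

# Longer operators come first so ">=" is not read as ">" followed by "=".
_CLAUSE = re.compile(r"^\s*(===|==|~=|!=|<=|>=|<|>)\s*([^\s,]+)\s*$")


def lookup_version(specifier: str) -> Optional[str]:
    """Pick a version that plausibly exists on PyPI: exact pin first, then the floor."""
    clauses: List[Tuple[str, str]] = []
    for raw in specifier.split(","):
        match = _CLAUSE.match(raw)
        if match:
            clauses.append((match.group(1), match.group(2)))
    # Never return an upper bound such as "<1.0": that release may not exist.
    for preferred in ("==", ">=", "~="):
        for op, version in clauses:
            if op == preferred:
                return version
    return None
```

For `">=0.12.1,<1.0"` this returns `"0.12.1"` regardless of clause order, and it returns `None` when only an upper bound is present, so the caller can decide how to handle that case.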
Mateo Wang	492891cad8	CI: copy of #25177 (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) (#28223 ) * fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455) Squash-merged by litellm-agent from Anai-Guo's PR. * feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508) Squash-merged by litellm-agent from yimao's PR. * fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550) Squash-merged by litellm-agent from krisxia0506's PR. * fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711) Squash-merged by litellm-agent from krisxia0506's PR. * fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503) Squash-merged by litellm-agent from krisxia0506's PR. * Fix Gemini MIME detection for extensionless GCS URIs (#27278) Squash-merged by litellm-agent from krisxia0506's PR. * fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107) Squash-merged by litellm-agent from voidborne-d's PR. * feat(chart): add support for autoscaling behavior in HPA (#27990) Squash-merged by litellm-agent from FabrizioCafolla's PR. * feat(proxy): add blocked flag to models for pause/resume from the UI (#27927) Squash-merged by litellm-agent from Cyberfilo's PR. * fix: pass socket timeouts to Redis cluster clients (#27920) Squash-merged by litellm-agent from tomdee's PR. * Fix/cache token (#28009) Squash-merged by litellm-agent from escon1004's PR. * fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080) Squash-merged by litellm-agent from Divyansh8321's PR. 
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617) * fix: reset org and tag budgets (#27326) * reset org budgets * reset tag budgets --------- Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> * fix(ui): omit allowed_routes from key edit save when unchanged (#27553) * fix(ui): omit allowed_routes from key edit save when unchanged When a team admin opens Edit Settings on a key with key_type=AI APIs and saves without changing anything, the UI re-sends the existing allowed_routes value, which the backend's _check_allowed_routes_caller_permission gate rejects for non-proxy-admins (LIT-2681). Strip allowed_routes from the patch in handleSubmit when it deep-equals the original keyData.allowed_routes. The backend treats absence as "leave alone," so no-op saves now succeed for non-admins. Admins explicitly editing the field still send the new value. * fix(ui): order-insensitive allowed_routes diff + cover null-original case Address Greptile review: - Switch the "is allowed_routes unchanged" check to a Set-based comparison so a server-side reorder of the array doesn't register as a user edit and re-trigger LIT-2681. - Add two regression tests: (1) keyData.allowed_routes is null and the form is untouched — patch should strip the field; (2) server returned routes in a different order than the user originally entered — patch should still recognize the value as unchanged. * chore(ui): strip ticket refs and tighten comments in key edit fix - Remove internal-tracker references from in-code comments - Tighten the WHY comment in handleSubmit to two lines - Drop redundant test-block comments — test names already describe the case * fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc * fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests GuardrailRaisedException and BlockedPiiEntityError both lacked a status_code attribute. 
When these exceptions reached the proxy exception handler (getattr(e, 'status_code', 500)), the fallback defaulted to HTTP 500 — making intentional guardrail blocks indistinguishable from server errors and causing unnecessary client retries. Changes: - Add status_code=400 (keyword-only) to GuardrailRaisedException - Add status_code=400 (keyword-only) to BlockedPiiEntityError - Update _is_guardrail_intervention() to recognize both exceptions so downstream loggers record 'guardrail_intervened' instead of 'guardrail_failed_to_respond' - Add 6 unit tests for default/custom status codes and getattr pattern - Strengthen existing blocked-action test with status_code assertion Fixes #24348 --------- Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> * fix(router/proxy): address Greptile P1+P2 review comments on PR #28161 - router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429) when a specifically-addressed deployment is administratively blocked; 429 misleads retry-enabled clients into spinning forever against a paused model - proxy_server: compute get_fully_blocked_model_names() once before both branches in model_list() instead of duplicating the call in each branch - deepseek: upgrade silent debug log to warning when injecting placeholder reasoning_content so callers are clearly notified of degraded multi-turn quality - tests: update two blocked-deployment assertions to expect ServiceUnavailableError Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address bug detection findings (cache token order, mutable defaults) Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address bugs in async pass-through, anthropic cache token detection, rerank tests - async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments - cost_calculator: 
detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0 - dashscope rerank tests: pass request to httpx.Response constructions for consistency Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix code qa * fix(vertex_ai/gemini): strip MIME parameters from GCS contentType GCS object metadata's contentType field can include parameters such as 'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases so downstream get_file_extension_from_mime_type sees a bare MIME type. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(vertex_ai/gemini): clarify mime-type error message string concatenation Co-authored-by: Yassin Kortam <yassin@berri.ai> * feat(oci): add embeddings, fix streaming/reasoning, expand model catalog - Add OCIEmbedConfig with full Cohere embed support (7 models, batch up to 96) - Fix sync streaming: split SSE events on \n\n before JSON parsing - Fix reasoning models (Gemini 2.5, xAI Grok): make completionTokens and message optional in OCIResponseChoice to handle max_tokens exhausted on reasoning - Fix compartment_id resolution in chat transform to use resolve_oci_credentials - Fix tool call id: make OCIToolCall.id optional, generate UUID fallback for providers (Google via OCI) that omit it - Add OCI_KEY env var support for inline PEM keys - Fix datetime.utcnow() deprecation in request signing - Expand model catalog: 29 OCI models including Llama 4, Gemini 2.5, xAI Grok, Cohere Command A, and all Cohere embed variants - Add 37 live integration tests: sync/async completions for Meta/Google/xAI/Cohere, sync/async embeddings, tool use across all vendors, streaming, env var auth - Add 23 embed unit tests covering all transform and validation paths * fix(oci): remove dead OCI elif branch in utils.py, align async split_chunks with sync version * test(oci): add unit tests for split_chunks fix and no-duplicate-OCI-branch guard * fix(oci): address remaining 
bugs from issue #25082 — streaming signed body, Cohere stop sequences, hardcoded defaults - Bug 1: sync and async streaming paths now use signed_json_body when provided instead of re-serializing data with json.dumps() — the OCI RSA-SHA256 signature covers the exact request body bytes, so re-serializing produces an invalid sig - Bug 3: Cohere stop sequences now map to 'stopSequences' (was incorrectly 'stop') - Bug 4: removed hardcoded Cohere defaults (maxTokens=600, temperature=1, topK=0, topP=0.75, frequencyPenalty=0) that silently overrode user intent on every call - Added 6 unit tests covering all three fixes * fix(oci): comprehensive code quality pass — bugs, tests, schema accuracy - Fix Cohere tool call IDs (was always call_0; now UUID per call) - Fix TOOL_CALL finish reason mapping in both sync and streaming paths - Fix Cohere stop parameter mapping (stop → stopSequences) - Remove hardcoded Cohere defaults (maxTokens/topK/topP/frequencyPenalty) - Fix content[0] safety guard against empty content arrays - Fix streaming signed body used consistently (not re-serialized) - Raise OCIError (not bare Exception/ValueError) throughout - Centralize OCI_API_VERSION constant; import uuid at module level - Fix embed get_complete_url to strip trailing slashes from api_base - Fix OCIEmbedResponse schema: add inputTextTokenCounts (actual OCI field) - Fix embed usage computed from inputTextTokenCounts (sum of per-input counts) - Fix Cohere toolCallId included in tool result messages - Add OCIToolCall.id as Optional (absent in Google/xAI streaming chunks) - Update tests to reflect correct behavior (no hardcoded defaults, UUID ids, deferred credential validation, OCIError vs ValueError, real response schema) * test(oci): move integration tests to tests/llm_translation/ Addresses greptile P1: tests/test_litellm/ is for mock-only unit tests (make test-unit target). Real-network OCI tests now live in the correct location alongside other provider integration tests. 
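The signed-body fix listed as Bug 1 above rests on one property: a signature or digest covers exact bytes, so re-serializing the same data can invalidate it. The sketch below uses a plain SHA-256 digest as a stand-in for the RSA-SHA256 request signature. The header name `x-body-digest` and the helper names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Tuple


def sign_body(payload: dict) -> Tuple[Dict[str, str], bytes]:
    """Serialize once and return both the headers and the exact bytes that were covered."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    return {"x-body-digest": digest}, body  # hypothetical header name


def digest_matches(headers: Dict[str, str], body: bytes) -> bool:
    """What a server-side check would do: recompute the digest over the received bytes."""
    return headers["x-body-digest"] == hashlib.sha256(body).hexdigest()


payload = {"model": "example", "stream": True}
headers, signed_body = sign_body(payload)

# Re-serializing with default separators adds spaces, so the bytes differ.
reserialized = json.dumps(payload).encode("utf-8")
```

Sending `signed_body` passes `digest_matches`, while sending `reserialized` fails it even though both decode to the same JSON object. Returning the body bytes alongside the headers lets every call path send exactly what was signed.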
* fix(oci): align types and transformation with official OCI SDK - Remove OCIVendors.GEMINI — apiFormat="GEMINI" is invalid; all non-Cohere models use apiFormat="GENERIC" - Add toolChoice, logitBias, logProbs to OCIChatRequestPayload so params present in the mapping are no longer silently dropped by Pydantic - Exclude n→numGenerations from Cohere param map (not a Cohere API field) - Fix CohereToolResult: change callId/result to call/outputs matching the OCI SDK's CohereToolResult structure - Fix CohereToolMessage: replace non-existent toolCallId with toolResults list; update adapt_messages_to_cohere_standard to build proper tool-result history entries by resolving tool call name+params from preceding assistant messages - Map generic-model stream finish reasons to OpenAI convention (COMPLETE→stop, MAX_TOKENS→length, TOOL_CALLS→tool_calls), consistent with the existing Cohere streaming path - Add optional id field to OCIEmbedResponse so valid API responses carrying an id are not rejected by the Pydantic model * fix(oci): use 'output' key in Cohere tool result outputs (matches reference impl) * fix(oci): port schema/type utilities from langchain-oracle reference impl - Add resolve_oci_schema_refs: inline $ref/$defs — OCI rejects JSON Schema refs - Add resolve_oci_schema_anyof: flatten Optional[T] anyOf (Pydantic v2 emits these) - Add sanitize_oci_schema: strip title, normalise null types, ensure array items - Add OCI_JSON_TO_PYTHON_TYPES: Cohere expects Python type names (str/int/float), not JSON Schema names (string/integer/number) - Add enrich_cohere_param_description: embed enum/format/range/pattern constraints into description since CohereParameterDefinition has no dedicated fields - Apply all of the above in adapt_tool_definitions_to_cohere_standard and adapt_tool_definition_to_oci_standard - Fix toolChoice conversion: map OpenAI string ('auto','none','required') to OCI dict form ({"type":"AUTO"} etc.) 
— the API rejects plain strings - Update unit test expectations to match correct Python type names and enriched descriptions * refactor(oci): split transformation.py into cohere.py and generic.py transformation.py was 1 243 lines doing too many jobs. Split along the same boundaries as the langchain-oracle reference (providers/cohere.py, providers/generic.py): chat/cohere.py — Cohere message/tool building, response + stream parsing chat/generic.py — Generic message/tool building, response + stream parsing transformation.py — thin OCIChatConfig orchestrator + OCIStreamWrapper Public symbols (OCIChatConfig, OCIStreamWrapper, adapt_messages_to_, OCIRequestWrapper, version, …) remain importable from transformation.py for backward compatibility. OCIStreamWrapper gains delegating shims for _handle_cohere_stream_chunk and _handle_generic_stream_chunk so existing test call sites keep working unchanged. transformation.py: 1 243 → 620 lines refactor(oci): principal-level code quality pass - Remove _extract_text_content duplication — single definition in cohere.py, imported where needed; instance method on OCIChatConfig eliminated - Move cryptography imports to module level with _CRYPTOGRAPHY_AVAILABLE flag and _require_cryptography() guard; no more re-import on every signing call - Move litellm version import to module level via litellm._version; remove inline import inside validate_oci_environment - sign_with_manual_credentials now returns Tuple[dict, bytes] matching sign_with_oci_signer — asymmetry eliminated, Optional[bytes] guards removed throughout stream wrappers (signed_json_body: bytes = b"") - Rename _openai_to_oci_cohere_param_map → openai_to_oci_cohere_param_map for consistency with openai_to_oci_generic_param_map - Remove double-key bug in map_openai_params where responseFormat was stored under both OCI and OpenAI key names simultaneously - Remove delegating shims (adapt_messages_to_cohere_standard, adapt_tool_definitions_to_cohere_standard, 
_handle_generic_stream_chunk) from OCIChatConfig/OCIStreamWrapper; tests now import directly from cohere.py and generic.py where symbols live - Trim __all__ to 7 genuine public symbols; remove the 13-symbol list that existed only to support test imports - Collapse per-model integration test classes into pytest.mark.parametrize; CHAT_MODELS list is the single source of truth for model-specific config - Black + Ruff clean across all OCI files * fix(oci): address PR review findings - types/llms/oci.py: add "TOOL_CALL" to CohereChatResponse.finishReason Literal so Pydantic does not raise ValidationError on non-streaming Cohere tool-use calls (Greptile P1) - test_oci_cohere_tool_calls.py: add test covering TOOL_CALL finish reason - model_prices_and_context_window.json: remove 6 duplicate oci/cohere.embed-* keys that were silently overridden by the more complete entries already present in the file (Greptile P1) - common_utils.py: move OCI_API_VERSION here from chat/transformation.py so embed/transformation.py does not need to import chat/transformation; change Protocol stub body from ... 
to pass (CodeQL "statement no effect"); add comment to sha256_base64 clarifying it implements OCI HTTP signing spec, not password hashing (CodeQL false positive) - chat/transformation.py: import CustomStreamWrapper from litellm_core_utils.streaming_handler instead of litellm.utils to reduce import cycle depth (CodeQL cyclic import) - chat/cohere.py, chat/generic.py: import Usage and ChatCompletionMessageToolCall from litellm.types.utils instead of litellm.utils for the same reason - embed/transformation.py: import OCI_API_VERSION from common_utils instead of chat/transformation (removes the embed→chat import edge) * test(oci): add unit tests to improve patch coverage - test_oci_common_utils.py (new): covers sha256_base64, build_signature_string, OCIRequestWrapper.path_url, resolve_oci_credentials, get_oci_base_url, validate_oci_environment, sign_with_oci_signer error paths, sign_oci_request routing, load_private_key_from_file error paths, resolve_oci_schema_refs (including circular ref and external $ref), resolve_oci_schema_anyof, sanitize_oci_schema (all branches), enrich_cohere_param_description - test_oci_generic_chat.py (new): covers content-message error paths (non-dict item, unsupported type, non-string text, invalid image_url), tool-call validation error paths, adapt_messages_to_generic_oci_standard error paths, handle_generic_response (None message, text content, tool calls), handle_generic_stream_chunk (finish reasons, streaming tool calls), OCIStreamWrapper non-string chunk error - test_oci_chat_transformation.py: add error paths for validate_environment (empty messages), transform_request (missing compartment_id, Cohere without user messages), transform_response (error key), map_openai_params (unsupported param with and without drop_params), tool_choice string mapping - test_oci_cohere_tool_calls.py: add edge cases for stream chunk finish reasons (TOOL_CALL, MAX_TOKENS, unknown), _extract_text_content with non-dict list items and non-string input, 
adapt_messages_to_cohere_standard with malformed JSON tool arguments * fix(oci): rename supports_streaming to supports_native_streaming in model prices The JSON schema for model_prices_and_context_window.json uses `supports_native_streaming` (not `supports_streaming`) and has `additionalProperties: false`. Rename the field across all OCI entries to pass the schema validation test. * test(oci): add 67 tests targeting uncovered happy paths for coverage Boost patch coverage on the four lowest-coverage OCI files: - common_utils.py: sign_with_manual_credentials (oci_key / oci_key_file paths), sign_oci_request routing, _require_cryptography - generic.py: adapt_messages_to_generic_oci_standard (all roles), adapt_tool_definition_to_oci_standard, adapt_tools_to_openai_standard, handle_generic_stream_chunk text/finish-reason paths - cohere.py: _extract_text_content, adapt_messages_to_cohere_standard (all roles including tool results), handle_cohere_response / handle_cohere_stream_chunk all finish-reason branches - transformation.py: get_vendor_from_model, OCIChatConfig._get_optional_params (toolChoice string→dict, responseFormat, tools for both vendors), transform_request for GENERIC model, get_sync/async_custom_stream_wrapper with mocked HTTP, OCIStreamWrapper.chunk_creator happy paths * fix(oci): suppress CodeQL false positive on sha256_base64 (OCI HTTP signing, not password hashing) * fix(oci): remove 6 duplicate model price entries and reconcile conflicting values Six OCI chat model keys appeared twice in model_prices_and_context_window.json with conflicting pricing/context data (JSON parsers silently discard the first). 
Remove the first-occurrence entries and update the surviving entries: - meta.llama-4-maverick / llama-4-scout: keep updated entries (free preview pricing, larger context windows, vision support) - meta.llama-3.1-70b: keep original pricing, restore supports_native_streaming - google.gemini-2.5-{flash,pro,flash-lite}: keep OCI pricing page values, restore supports_native_streaming * fix(oci): route GPT-5 family to maxCompletionTokens GPT-5 / GPT-5-mini / GPT-5-nano / GPT-5.5 on OCI reject "maxTokens" with HTTP 400: Invalid 'maxTokens': Unsupported parameter: 'maxTokens' is not supported with this model. Use 'maxCompletionTokens' instead. (Same convention as OpenAI's reasoning-API contract.) Add a model-aware rename in OCIChatConfig._get_optional_params so the request payload uses maxCompletionTokens when the model id starts with openai.gpt-5. Regular Llama / Cohere / Gemini / GPT-4.x continue to use maxTokens unchanged. Also widen OCIChatRequestPayload to carry the new optional field so it survives Pydantic serialization. Verified live against OCI us-chicago-1: - openai.gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.5 all return 200 - Full feature sweep on gpt-5.5 (basic, system, multi-turn, streaming, tools, usage) all green - meta.llama-3.3-70b-instruct still uses maxTokens (no regression) 4 new unit tests cover the helper, the routing in both pre- and post-translation states, and Pydantic serialization. * ci(oci): fix CI failures — black formatting + recursive_detector ignore - Run black on litellm/llms/oci/common_utils.py + 3 OCI test files that drifted out of black-compliance during the rebase. - Add the three bounded recursive functions in oci/common_utils.py (`_resolve`, `resolve_oci_schema_anyof`, `sanitize_oci_schema`) to the recursive_detector IGNORE_FUNCTIONS list. 
All three are bounded: `_resolve` uses a `resolving_stack` cycle guard; the other two are bounded by JSON-schema tree depth (no cycles in well-formed input), matching the pattern of the existing OCI/Vertex schema walkers already on the list. * fix(oci): silence MyPy errors in cohere.py — typed-dict access Two errors flagged by `lint` CI: llms/oci/chat/cohere.py:73: "object" has no attribute "__iter__" llms/oci/chat/cohere.py:119: No overload variant of "get" of "dict" matches argument types "object", "CohereToolCall" Both stem from `msg.get("tool_calls")` / `msg.get("tool_call_id")` returning `object` per the AllMessageValues TypedDict union. Bind to `Any` locally for the iteration and coerce the lookup key with `str()`, removing the now-unused `# type: ignore` on those lines. No behaviour change — pure type-narrowing for the type checker. * fix(oci): silence CodeQL py/weak-sensitive-data-hashing on sha256_base64 CodeQL's taint analysis traces request bodies back to environment-loaded secrets and flags `hashlib.sha256(body).digest()` as `py/weak-sensitive-data-hashing` — even though SHA-256 is the algorithm mandated by the OCI HTTP request signing spec for the `x-content-sha256` header (not a password/secret hash). The previous suppression used legacy `# lgtm[...]` syntax which the modern CodeQL action ignores. Switch to Python's standard `hashlib.sha256(..., usedforsecurity=False)` (Python 3.9+) which CodeQL honours as a non-security declaration. Behaviour unchanged. * feat(oci): add reasoning_effort passthrough — only true missing primitive OCI's GenericChatRequest exposes a reasoningEffort field (NONE/MINIMAL/LOW/MEDIUM/HIGH) that's the single biggest cost knob for reasoning-capable models on the service: - GPT-5 family - Gemini 2.5 - Grok reasoning variants (3-mini, 4-fast, 4.20) - Cohere Command-A-Reasoning Setting reasoning_effort=LOW typically cuts reasoning-token spend 5-10× vs the default. 
Without exposing this, litellm users had no way to tune cost-vs-quality on these models. The other GenericChatRequest fields (verbosity, parallel_tool_calls, logit_bias, n, metadata, web_search_options, prediction) are not exposed because they are not missing primitives — they either duplicate prompt-engineering, framework-level controls, or are too niche to justify the maintenance surface. We only ship what users genuinely can't accomplish another way. Excluded from the Cohere v1 param map: CohereChatRequest has no reasoningEffort field, and Cohere reasoning models (cohere.command-a-reasoning) use COHEREV2 which is a separate request type not covered by this PR. Verified live: GPT-5.5 + reasoning_effort="HIGH" sends {"reasoningEffort": "HIGH"} on the wire and OCI accepts the request. * feat(oci): reasoning_effort + reasoning_tokens for OCI GenAI Three small additions for OCI reasoning models, requested by users testing the PR in production fork builds: 1. reasoning_effort param mapping (GENERIC vendors). OCI expects uppercase levels ("LOW"/"MEDIUM"/"HIGH"/"NONE") on `reasoningEffort`, but OpenAI-compatible clients send lowercase. Mapped + uppercased in `_get_optional_params`. Marked unsupported on Cohere V1/V2 since OCI Cohere has no reasoning models (avoids Pydantic validation failure on CohereChatRequest). 2. "disable" → "NONE" mapping. OpenAI uses "disable" to turn off reasoning; OCI uses "NONE". Without this, callers get a 400. 3. reasoning_tokens propagated to Usage. OCI returns `completionTokensDetails.reasoningTokens` but it wasn't being passed to LiteLLM's Usage object. Now flows through to `Usage.completion_tokens_details.reasoning_tokens` so callers can track reasoning token consumption for cost/observability. Tests: 7 new unit tests in TestOCIReasoningEffort covering upper/lower case, "disable"→"NONE", Cohere drop/raise paths, and reasoning_tokens extraction (with and without completionTokensDetails). 
5 new live integration tests against xai.grok-3-mini in us-chicago-1 verifying the full request/response loop end-to-end. Existing test_transform_response_simple_text assertion that completion_tokens_details was None has been updated to assert reasoning_tokens flows through. Verified live on xai.grok-3-mini: reasoning_effort=low → OCI accepts "LOW", returns reasoningTokens=316 in usage. reasoning_effort=disable → OCI accepts "NONE". Full suite: 370/370 unit + 51/51 integration. * fix(codeql): re-scope py/weak-sensitive-data-hashing exclusion to OCI signing file CodeQL's taint analysis re-fires the `py/weak-sensitive-data-hashing` alert at `litellm/llms/oci/common_utils.py:103` whenever upstream code paths into the OCI signing module change (touching `transformation.py` opens new flow paths that CodeQL re-evaluates from scratch). The `hashlib.sha256(..., usedforsecurity=False)` declaration silences the direct-call form of the query but not the taint-flow form. SHA-256 here is mandated by the OCI HTTP signing specification for the x-content-sha256 content-integrity header — not for password storage: https://docs.oracle.com/en-us/iaas/Content/API/Concepts/signingrequests.htm CodeQL has no per-query path filter and GitHub Code Scanning ignores inline lgtm/codeql comments, so path-ignoring this single ~560-line signing utility file is the narrowest available suppression. All other files retain full coverage of py/weak-sensitive-data-hashing — including litellm/proxy/utils.py where the rule legitimately applies. This restores the NEUTRAL CodeQL state the PR had on prior commits (see `2111c98af7` for the same approach on the previous branch evolution that the cherry-pick was rebased onto a different baseline). * fix(oci): drop duplicate text on Cohere streaming terminal chunk OCI Cohere's terminal SSE event re-sends the full assembled response in `text` alongside a populated `chatHistory`. 
Emitting that text as another delta concatenates the entire response onto the already-streamed output (e.g. "How can I help?How can I help?"). Use `chatHistory is not None` as the discriminator for the consolidated terminal event — `finishReason` is a weaker signal that could in principle appear on a non-consolidated chunk. The two coincide today; this preserves correctness if OCI ever ships finishReason on an incremental chunk. Adds a live-OCI integration regression test that compares streamed vs non-streamed length and asserts the response prefix appears only once. Verified to fail under the previous code with the exact reported reproduction: 'Hello! How can I help you today?Hello! How can I help you today?'. Reported by @gotsysdba on PR #25177. * fix(oci): buffer SSE stream across HTTP read boundaries The old split_chunks helper split each individual HTTP read on "\n\n", which assumed SSE event boundaries always aligned with read boundaries. In practice the OCI streaming endpoint delivers events that may: - straddle two reads (chunk_creator gets a truncated JSON and crashes) - arrive separated by a single "\n" instead of "\n\n" - share a read with multiple complete events Replace the inline split with module-level helpers _iter_sse_events (sync) / _aiter_sse_events (async) that maintain a buffer across reads, split on any newline, and yield only complete "data:" lines. Add 25 regression tests covering event-split-across-reads, tiny-chunk reads, single-newline separators, keepalive/comment lines, trailing partial events flushed at EOF, "\r\n" line endings, and an end-to-end smoke test that feeds an awkwardly-chopped payload through the splitter into OCIStreamWrapper.chunk_creator. Reported by John Lathouwers. * test(oci): repoint TestOCIKeyNormalization to sign_with_manual_credentials The signing helper moved from OCIChatConfig._sign_with_manual_credentials to a module-level sign_with_manual_credentials in common_utils.py. 
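The cross-read SSE buffering described in the "buffer SSE stream across HTTP read boundaries" message above can be sketched roughly as follows. This is a minimal illustration, not the `_iter_sse_events` helper itself: the name `iter_sse_data` and the exact flushing rules are assumptions.

```python
from typing import Iterable, Iterator


def iter_sse_data(reads: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each complete 'data:' line across arbitrary reads."""
    buffer = ""
    for chunk in reads:
        buffer += chunk
        # Everything before the last newline is complete; the tail may be partial.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            line = line.rstrip("\r")  # tolerate "\r\n" line endings
            if line.startswith("data:"):
                yield line[len("data:"):].strip()
            # blank lines and ":" comment/keepalive lines are skipped
    # Flush a trailing event that was not terminated by a newline at EOF.
    tail = buffer.rstrip("\r")
    if tail.startswith("data:"):
        yield tail[len("data:"):].strip()


# An awkwardly chopped stream: one event straddles two reads, one follows a
# single "\n", a keepalive comment uses "\r\n", and the last event has no newline.
reads = ['data: {"a"', ': 1}\n\ndata: {"b": 2}\n: keepalive\r\ndata: {"c"', ': 3}']
events = list(iter_sse_data(reads))
print(events)
```

Splitting on any newline rather than on "\n\n" is what lets single-newline separators and events that share a read both work with the same loop.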
Four tests in TestOCIKeyNormalization still called the old method: - 2 failed outright with AttributeError - 2 passed by accident because they used pytest.raises(Exception), which happily caught the AttributeError instead of exercising the intended OCIError path Repoint all four to the new module-level function so they exercise the actual oci_key type-validation branch. * fix(oci): validate oci_region before URL interpolation to prevent SSRF Anchor oci_region to ^[a-z][a-z0-9-]{0,30}[a-z0-9]$ inside get_oci_base_url so user-supplied regions that would redirect the signed request to an attacker-controlled host (e.g. 'evil.com/#') fail with HTTP 400 before the URL or signature is built. Empty string still falls back to the us-ashburn-1 default, so existing callers are unaffected. * test(audio): skip when gpt-4o-audio-preview is unavailable upstream OpenAI retired `gpt-4o-audio-preview` (404 model_not_found in CI as of 2026-05-19), and the existing try/except in these tests only re-raised on 'openai-internal' errors. Other exceptions were silently swallowed, so the next line ran with an unbound `response`/`completion` and failed with an unrelated UnboundLocalError that masked the real cause. Extend the skip condition to also cover model_not_found / 'does not exist' so the suite reports the upstream outage cleanly, matching the pattern used in `ce87c41` for the realtime and nvidia_nim rerank tests. Re-raise unknown exceptions instead of falling through. * fix(oci/router): catalog-driven maxCompletionTokens; generic blocked-deployment message - Drive OCI maxCompletionTokens via supports_reasoning from the model catalog instead of a hardcoded openai.gpt-5 prefix. Add OCI GPT-5 family entries (gpt-5, gpt-5-mini, gpt-5-nano) with supports_reasoning: true. Gate the override to non-Cohere vendor so Cohere reasoning models keep maxTokens (Cohere endpoint does not accept maxCompletionTokens). 
- Replace proxy-specific 'Contact your proxy admin' phrasing in the four Router blocked-deployment ServiceUnavailableError messages with neutral SDK-appropriate text. * fix(oci/cohere): guard handle_cohere_response against missing usage * fix(oci): address bug review findings in chat transformation - Cohere param map: keep tool_choice/n as False (not omitted) so unsupported params are dropped or rejected rather than silently passed through. - get_complete_url: when an explicit api_base/litellm.api_base is provided, use it as-is instead of unconditionally appending /20231130/actions/chat (mirrors the embed config behavior). - Cohere stream: require both chatHistory and finishReason to be present to identify a terminal consolidation chunk, avoiding silent text suppression if chatHistory ever appears on a non-terminal chunk. - Generic usage: use 'is not None' for reasoningTokens so a legitimate value of 0 is preserved instead of being treated as absent. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/cohere): emit tool calls in streaming and null content when text empty handle_cohere_response now sets message.content to None when the Cohere response text is empty, matching the OpenAI convention for tool-call-only responses. handle_cohere_stream_chunk now extracts toolCalls — both directly from the chunk and from the terminal chunk's chatHistory CHATBOT message — and emits them in the delta. Previously, CohereStreamChunk lacked a toolCalls field, so any tool calls in the stream were silently dropped. * fix(oci): preserve tool results, embed URL path, and generic finish reason - Use SerializeAsAny on CohereChatRequest.chatHistory so subclass-specific fields like CohereToolMessage.toolResults are not dropped during Pydantic v2 serialization. - Make OCIEmbedConfig.get_complete_url append the /20231130/actions/embedText action path consistently with chat, so setting litellm.api_base to the region inference base URL no longer posts to the bare hostname. 
- Map OCI finishReason (COMPLETE / MAX_TOKENS / TOOL_CALLS) to OpenAI finish_reason values in handle_generic_response, mirroring the streaming handler and the Cohere non-streaming handler. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/generic): silence mypy assignment error on dynamic finish_reason * fix(oci/embed): always set usage on embedding response Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/chat): append /20231130/actions/chat to explicit api_base Restore the embed-style behavior so OCIChatConfig.get_complete_url always appends the OCI GenAI chat path. Routing through get_oci_base_url ensures the optional explicit api_base has its trailing slash stripped before the suffix is joined, matching the embed config and the test_respects_explicit_api_base expectation. * fix(oci/cohere): mark logprobs/logit_bias unsupported and normalize unknown stream finish reasons Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/cohere): preserve trailing tool result in chatHistory When the last message in the OpenAI-format input is a tool result (the standard agentic continuation pattern), the prior messages[:-1] slice silently dropped that tool result from chatHistory and the model never saw it. Excluding the last user message by index instead keeps tool results that trail the last user turn intact. * fix(main): remove dead OCI embedding elif block The earlier elif at line 5119 already routes OCI embeddings through the base HTTP handler with the headers None-guard, so the later identical block was unreachable dead code. * test(oci): move integration tests out of llm_translation mock-only folder Greptile flags tests/llm_translation/ as mock-only via a project-specific rule; relocate the live-network OCI integration suite to tests/integration/ and adjust the in-file sys.path / run instructions accordingly. 
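The finish-reason mapping named in the messages above (COMPLETE, MAX_TOKENS, TOOL_CALLS) can be sketched as a simple lookup. The function name is hypothetical; falling back to "stop" for unrecognized values follows a later message in this log, and returning None for a missing reason is an assumption of this sketch.

```python
from typing import Optional

# Mapping stated in the commit messages: OCI finishReason -> OpenAI finish_reason.
OCI_TO_OPENAI_FINISH_REASON = {
    "COMPLETE": "stop",
    "MAX_TOKENS": "length",
    "TOOL_CALLS": "tool_calls",
}


def map_finish_reason(oci_reason: Optional[str]) -> Optional[str]:
    """Translate an OCI finish reason; unknown values degrade to 'stop'."""
    if oci_reason is None:
        return None  # assumption: no reason yet (e.g. a mid-stream chunk)
    return OCI_TO_OPENAI_FINISH_REASON.get(oci_reason, "stop")


for reason in ("COMPLETE", "MAX_TOKENS", "TOOL_CALLS", "ERROR_TOXIC", None):
    print(reason, "->", map_finish_reason(reason))
```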
* fix(oci/cohere): suppress tool calls on stream terminal consolidation chunk The terminal SSE event re-sends the full assembled response in both `text` and `chatHistory`. The existing logic already suppresses `text` to avoid double-emit, but tool calls extracted from the terminal chunk (via `typed_chunk.toolCalls` or the `chatHistory` CHATBOT fallback) would still be re-emitted with fresh uuid4 IDs. If OCI Cohere ever streams tool calls progressively in intermediate chunks (now possible since CohereStreamChunk has a toolCalls field), this would cause downstream agentic frameworks to execute each tool call twice. Suppress tool calls on the terminal consolidation chunk for the same reason `text` is suppressed. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci,httpx): normalize finish_reason, preserve response_format, fix sync embed JSON content-type - cohere.py / generic.py: normalize unknown OCI finishReason values (ERROR, ERROR_TOXIC, CONTENT_FILTERED, USER_CANCEL, ...) to 'stop' in non-streaming and streaming generic handlers, matching the streaming Cohere handler so downstream consumers switching on finish_reason aren't broken by raw OCI values. - transformation.py: restore the dual-key alias so optional_params still carries the original 'response_format' key alongside the OCI-mapped 'responseFormat'. Downstream litellm framework code (json_mode detection, logging) inspects 'response_format' after map_openai_params runs. - llm_http_handler.py: make the sync embedding path mirror the async path — when sign_request returns no signed_body, send via json=data (which sets Content-Type: application/json) instead of data=json.dumps(data) which doesn't. Removes a sync/async behavioural asymmetry for non-OCI providers that adopt the sign_request pattern. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): clean up OCIChatConfig init, normalize generic stream finish reasons, correct embed sign_request return type - Replace fragile setattr(self.__class__, ...) pattern in OCIChatConfig.__init__ with a @property for has_custom_stream_wrapper, matching the pattern used by other providers. - Normalize unknown OCI finish reasons (e.g. ERROR, ERROR_TOXIC, USER_CANCEL) to 'stop' in handle_generic_stream_chunk, matching the existing Cohere stream handler behaviour. - Tighten OCIEmbedConfig.sign_request return type from Tuple[dict, Optional[bytes]] to Tuple[dict, bytes] — sign_oci_request never returns None for the body, and this matches OCIChatConfig.sign_request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): strip trailing action path in get_oci_base_url to avoid URL doubling A fully-formed OCI endpoint URL (e.g. https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/chat) passed via api_base previously had the action path appended a second time by get_complete_url in both chat and embed configs, yielding a 404. get_oci_base_url now strips a trailing /20231130/actions/<name> so callers can always append the action path safely. * fix(httpx): preserve sync embed data= kwarg to avoid breaking mock-based tests The earlier sync_httpx_client.post() call passed data=json.dumps(data), which downstream embedding tests assert on (e.g. tests for hosted_vllm, jina_ai, watsonx). Switching to json=data changed the kwarg name and broke those tests. The OCI signed_body path keeps using data=signed_body and is unaffected. * fix(oci): stable tool-call ids across stream chunks; lenient Cohere finishReason - Replace random uuid4 per chunk with a deterministic content-derived digest for synthetic tool-call ids in both Cohere and Generic OCI handlers. 
Previously, when OCI omitted 'id' (always for Cohere, often for Generic streaming deltas), every chunk for the same logical tool call received a new uuid, causing downstream stream-mergers (which key off id) to treat each fragment as a distinct call. - Relax CohereChatResponse.finishReason from a strict Literal[...] to Optional[str], matching CohereStreamChunk.finishReason. The handle_cohere_response 'elif oci_finish_reason is not None' fallback was previously unreachable because Pydantic raised ValidationError on any unknown value before the fallback executed. Now non-streaming responses degrade unknown reasons to 'stop' just like the streaming path. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/embed): validate OCI credentials in validate_environment Mirror OCIChatConfig.validate_environment so embedding requests fail fast with a clear error when oci_user/oci_fingerprint/oci_tenancy/ oci_compartment_id or an oci_key/oci_key_file is missing, instead of deferring the failure until sign_request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(oci/embed): expect OCIError from validate_environment when credentials are missing OCIEmbedConfig.validate_environment now raises eagerly (mirroring OCIChatConfig) when oci_user/oci_fingerprint/oci_tenancy/oci_compartment_id or oci_key/oci_key_file is missing. Update the test to match. 
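The idea behind the stable tool-call ids described above (a deterministic digest instead of a fresh uuid4 per chunk) can be sketched as below. Which fields feed the digest is an assumption here: this sketch hashes the tool-call index and function name, and `synthetic_tool_call_id` is a hypothetical name. It requires Python 3.9+ for `usedforsecurity`.

```python
import hashlib


def synthetic_tool_call_id(index: int, name: str) -> str:
    """Derive a repeatable id from fields assumed stable across stream chunks."""
    material = f"{index}:{name}".encode("utf-8")
    # usedforsecurity=False declares this is an identifier, not a security hash.
    digest = hashlib.sha256(material, usedforsecurity=False).hexdigest()
    return f"call_{digest[:24]}"


# Two fragments of the same logical call get the same id, so a stream merger
# that keys off id treats them as one call.
first_fragment_id = synthetic_tool_call_id(0, "get_weather")
second_fragment_id = synthetic_tool_call_id(0, "get_weather")
other_call_id = synthetic_tool_call_id(1, "get_time")
print(first_fragment_id == second_fragment_id, first_fragment_id != other_call_id)
```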
* fix(oci): polish stream chunk handling and signed body default - cohere stream terminal consolidation now emits content=None instead of "" - drop redundant index truthiness check (None is already replaced with 0) - accept both "TOOL_CALL" and "TOOL_CALLS" finish reasons in cohere - signed_json_body defaults to None and uses explicit None check, so an explicitly empty bytes body wouldn't be silently re-serialized Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/chat): catch pydantic ValidationError when parsing OCI responses Pydantic v2 raises ValidationError (not TypeError) when field validation fails, so malformed OCI completion responses or stream chunks would propagate unhandled out of handle_generic_response, handle_generic_stream_chunk, and handle_cohere_stream_chunk. Widen the except clauses to also catch ValidationError so callers get a clean OCIError. * fix(oci/catalog): real prices for Llama 4, drop zero-cost OCI OpenAI entries Zero-cost catalog entries (input_cost_per_token=0, output_cost_per_token=0) make proxy spend tracking silently report $0 for these paid OCI models, so any caller can drive them without decrementing a budget. For Llama 4 Maverick and Scout, OCI charges the same character-based rate as Llama 3.3 70B ($0.0018 per 10,000 characters), so use the same per-token price as the existing oci/meta.llama-3.3-70b-instruct entry (7.2e-07 in/out). For oci/openai.gpt-5, gpt-5-mini, gpt-5-nano, gpt-oss-120b, and gpt-oss-20b, no public per-token pricing is available; drop the entries so operators must register them with explicit custom pricing. The existing GPT-5 reasoning test fixture already injects synthetic entries when the catalog omits them, so the chat transformation's supports_reasoning lookup keeps working in tests. 
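The Llama 4 price above can be reproduced with a short calculation. The message gives the character rate ($0.0018 per 10,000 characters) and the resulting per-token price (7.2e-07) but not the conversion factor; the 4 characters per token used here is an assumption that happens to reconcile the two figures.

```python
USD_PER_10K_CHARS = 0.0018      # rate quoted in the commit message
ASSUMED_CHARS_PER_TOKEN = 4     # assumption: not stated in the message


def per_token_price(usd_per_10k_chars: float, chars_per_token: int) -> float:
    """Convert a per-10,000-character rate into a per-token rate."""
    usd_per_char = usd_per_10k_chars / 10_000
    return usd_per_char * chars_per_token


price = per_token_price(USD_PER_10K_CHARS, ASSUMED_CHARS_PER_TOKEN)
print(f"{price:.2e}")
```

Under that assumption, 0.0018 / 10,000 = 1.8e-07 per character, and 1.8e-07 × 4 = 7.2e-07 per token, matching the catalog value for input and output.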
* fix(oci/chat): wrap CohereChatResult construction in try/except Match the handle_generic_response pattern: surface OCIError with the upstream status code instead of letting a raw pydantic.ValidationError propagate when the Cohere response payload is malformed. * fix(oci): harden Cohere stream/finish-reason and dedupe maxTokens param mapping - Cohere stream: track per-stream tool-call emission and only suppress the terminal consolidation chunk's tool calls once they've been seen earlier. Prevents silent drop if tool calls are delivered exclusively on the terminal chunk. - Cohere stream: emit content=None (not "") on non-terminal text-free chunks (e.g. tool-call-only / keep-alive) so downstream consumers that distinguish missing vs explicitly-empty deltas behave correctly. - Generic handlers: accept singular TOOL_CALL finish reason in addition to TOOL_CALLS, matching the Cohere handlers. - _get_optional_params: when both max_tokens and max_completion_tokens are provided, explicitly prefer max_completion_tokens instead of relying on dict iteration order. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): emit content=None instead of empty string for text-free generic stream chunks Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(oci): expect content=None for text-free generic stream chunks handle_generic_stream_chunk now emits content=None instead of empty string when a chunk carries no text parts. Update the corresponding no-message test to match. * codeql: narrow OCI sha256 suppression to query-filter, not whole file paths-ignore was suppressing every CodeQL query on litellm/llms/oci/common_utils.py, hiding all future findings in a security-critical file (private key loading, credential resolution, URL construction, RSA signing). Move the suppression for py/weak-sensitive-data-hashing into query-filters so common_utils.py remains fully analyzed by every other query. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): use locale-independent RFC 7231 date for manual signing email.utils.formatdate(usegmt=True) emits canonical English weekday/ month abbreviations regardless of system locale, so signature verification doesn't break on non-en_US deployments. * fix(oci): strip 'oci/' prefix in get_vendor_from_model Previously, get_vendor_from_model split on '.' without stripping the optional 'oci/' provider prefix, so 'oci/cohere.command-a-03-2025' was routed through the GENERIC pipeline instead of COHERE. Co-authored-by: Yassin Kortam <yassin@berri.ai> * codeql: scope OCI sha256 suppression to common_utils.py via filter-sarif Replace the global query-filters exclude for py/weak-sensitive-data-hashing with a SARIF post-filter that only drops the alert when it originates from litellm/llms/oci/common_utils.py, keeping the rule active on every other SHA-256 callsite in the repository. * Fix OCI chat bugs: tool_calls None key, dead max_tokens dedup, single-event stream text suppression - handle_cohere_response: omit tool_calls key from message dict when None, matching the generic handler's behaviour and avoiding tripping consumers that key off 'tool_calls' in message. - _get_optional_params: remove dead prefer_max_completion branch. By the time this helper runs, map_openai_params has already collapsed max_tokens/max_completion_tokens onto the OCI alias, so the OpenAI-key membership check is unreachable. - handle_cohere_stream_chunk: add prior_text_emitted parameter mirroring prior_tool_calls_emitted. The terminal consolidation chunk's text is only suppressed when prior deltas already emitted text — otherwise (degenerate single-event stream) the text passes through so the response content isn't silently lost. OCIStreamWrapper now tracks emitted text alongside emitted tool calls. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): preserve all text parts in generic response and emit SYSTEM role for Cohere - handle_generic_response: iterate all content parts and concatenate text (matches the streaming handler) so non-leading text parts are not lost and a leading non-text part does not suppress trailing text. - adapt_messages_to_cohere_standard: emit CohereSystemMessage for system messages so direct callers do not silently drop them. The Cohere request builder filters system messages before calling this helper to avoid duplicating preambleOverride content into chatHistory. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): normalise dict-format tool_choice to OCI flat uppercase shape The OCI Generative AI API only accepts toolChoice values of the form {"type": "AUTO"\|"NONE"\|"REQUIRED"} or {"type": "FUNCTION", "name": "<fn>"}. The previous conversion only handled string tool_choice values, so OpenAI's standard dict shape {"type": "function", "function": {"name": "<fn>"}} passed through unchanged and was rejected by OCI with a 400. Normalise the dict shape by uppercasing the discriminator and hoisting the function name to the top level. Also accept dict variants of the non-function selectors (e.g. {"type": "auto"}). * test(oci): exercise system-message filtering at transform_request boundary adapt_messages_to_cohere_standard now emits SYSTEM-role entries by design so direct callers don't silently drop system content. The Cohere request builder filters system messages before calling the helper and routes them into preambleOverride, so the user-visible 'no SYSTEM in chatHistory' guarantee holds at the transform_request boundary, where the test should live. * fix(oci/chat): extract tool_choice/response_format helpers to satisfy PLR0915 _get_optional_params exceeded ruff's 50-statement cap. The toolChoice and responseFormat normalisation blocks are self-contained mutations, so move them to module-level helpers. 
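The tool_choice normalisation described above (uppercase the discriminator, hoist the function name to the top level) can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the commit: the function name `normalize_tool_choice`, the handling of plain strings, and the use of `ValueError` as the error type are all assumptions.

```python
from typing import Any, Dict

# Selectors that OCI accepts without a function name, per the commit message.
_SIMPLE_CHOICES = {"AUTO", "NONE", "REQUIRED"}


def normalize_tool_choice(tool_choice: Any) -> Dict[str, str]:
    """Convert an OpenAI-style tool_choice into a flat, uppercase dict.

    Hypothetical helper: the name and the ValueError are stand-ins chosen
    for this sketch.
    """
    if isinstance(tool_choice, str):
        kind = tool_choice.upper()
        if kind in _SIMPLE_CHOICES:
            return {"type": kind}
        raise ValueError(f"unsupported tool_choice: {tool_choice!r}")

    if isinstance(tool_choice, dict):
        kind = str(tool_choice.get("type", "")).upper()
        if kind in _SIMPLE_CHOICES:
            return {"type": kind}
        if kind == "FUNCTION":
            # OpenAI nests the name under "function"; the flat shape wants
            # it at the top level. Also accept an already-flat dict.
            function = tool_choice.get("function")
            if isinstance(function, dict):
                name = function.get("name")
            else:
                name = tool_choice.get("name")
            if not name:
                raise ValueError("tool_choice of type 'function' needs a name")
            return {"type": "FUNCTION", "name": name}
        raise ValueError(f"unsupported tool_choice type: {kind!r}")

    # Lists, bools, ints and so on are rejected rather than passed through.
    raise ValueError(f"unsupported tool_choice value: {tool_choice!r}")
```

Failing early on an unrecognised shape keeps the error close to the caller instead of surfacing later as a rejected request.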
* fix(oci): normalize None finishReason in generic non-streaming handler; drop dead Cohere system-role branch Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/generic): silence mypy assignment error on cleared finish_reason * fix(docker): install libatomic in builder for prisma nodeenv binary The prebuilt node binary that prisma-python's nodeenv downloads links against libatomic.so.1, which Wolfi does not pull in via gcc/nodejs. Without this, fresh Docker builds (no GHA cache hit) fail at `prisma generate` with: node: error while loading shared libraries: libatomic.so.1 * fix(oci): raise on invalid tool_choice instead of silently passing OpenAI shape _normalize_tool_choice previously left an OpenAI-format dict in selected_params['toolChoice'] when the type was unrecognized or when 'FUNCTION' was given with a missing/empty name. OCI would then reject the request with a non-obvious error. Raise ValueError with a clear message in these cases. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): raise OCIError instead of ValueError in _normalize_tool_choice Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/generic): declare non-security intent on sha256 for synthetic tool-call id * fix(oci): simplify _get_optional_params and reject invalid tool_choice types - Collapse the two-loop _get_optional_params into a single pass with clear precedence (OpenAI key wins over OCI alias; first OpenAI key reaching a given OCI target wins). Removes the redundant maxTokens special-case in the second loop and makes the map_openai_params / transform_request handoff easier to reason about. - Raise OCIError when _normalize_tool_choice sees an unexpected type (list, bool, int, ...) instead of silently letting it through to the OCI API where it would produce an opaque server-side error. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * Remove no-op data['stream'] deletion in OCI stream wrappers Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): always send Cohere isStream field explicitly Match OCIChatRequestPayload by defaulting CohereChatRequest.isStream to False instead of None so model_dump(exclude_none=True) does not silently omit the field on non-streaming requests. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): revert Cohere isStream to Optional[bool]=None to preserve omission semantics Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/generic): raise OCIError on empty choices instead of IndexError Pydantic accepts an empty choices list when validating OCICompletionResponse, so accessing chatResponse.choices[0] could raise an unhandled IndexError. Surface it as OCIError so the response error path is consistent with the existing (TypeError, ValidationError) guard. * fix(oci/cohere): map top_k -> topK so Cohere topK param is settable The Cohere param map (derived from the GENERIC map) had no entry for topK. Since the simplified _get_optional_params only iterates over param_map entries, callers had no way to pass topK to CohereChatRequest (neither via an OpenAI-style key nor via the OCI alias). Add 'top_k': 'topK' to the Cohere map only — OCIChatRequestPayload (GENERIC) has no topK field. _get_optional_params accepts both the OpenAI key (top_k) and the OCI alias (topK) in optional_params, so this covers both calling conventions. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): tighten cohere stream dedup flags and forward stream args in embed signing Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci/chat): reorder dict guard and wrap stream chunk json.loads - Move isinstance(response_json, dict) check before .get("error") so the guard runs before the attribute access it is supposed to protect. 
- Wrap json.loads in OCIStreamWrapper.chunk_creator with try/except so malformed SSE payloads surface as OCIError instead of a raw JSONDecodeError propagating out of the stream loop. * fix(oci/cohere stream): only flag text emitted on non-empty content An intermediate Cohere SSE chunk carrying text="" was flipping _cohere_text_emitted via the "is not None" check, which then caused the terminal consolidation chunk to drop its real text as a duplicate. Use a truthy check so only actual content marks the stream as having emitted text. * test(oci): end-to-end proxy integration test against real OCI GenAI Spins up the litellm proxy via the console-script entrypoint with a minimal OCI-only config and drives real OpenAI-shaped HTTP requests through it against OCI GenAI. Covers non-streaming chat, streaming chat, embeddings, and /v1/models for Cohere, Llama, Gemini, and Grok. Skips automatically when ~/.oci/config is absent or when the active profile uses session-token auth (the OCI provider currently only consumes OCI_* env vars; session tokens would need an in-process signer). API-key profiles work out of the box. * test(oci): move proxy integration test to tests/integration/ tests/llm_translation/ is mock-only; the OCI proxy integration test spawns a real proxy subprocess and makes live HTTP calls, so move it (and the companion config) to tests/integration/ alongside the existing test_oci_integration.py. * fix(oci): dedupe finish-reason mapping and batch Cohere tool results - Extract _normalize_oci_finish_reason helper so the four chat handlers (Cohere/GENERIC, sync/stream) share one OCI->OpenAI mapping instead of four near-identical if/elif chains. - Merge consecutive OpenAI tool-role messages into a single CohereToolMessage with multiple toolResults entries, matching the OCI Cohere API's expectation for parallel tool calls in one assistant turn. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(oci): drop dead Cohere toolChoice field and emit GENERIC tool-call dicts inline - Remove the unreachable toolChoice field from CohereChatRequest. The Cohere param map explicitly marks tool_choice as unsupported, so the field can never be populated through the normal optional_params flow and only confused the public model surface. - Build GENERIC stream tool-call dicts inline (id/type/function shape) instead of round-tripping through ChatCompletionMessageToolCall and model_dump(). Matches handle_cohere_stream_chunk so downstream stream-mergers see the same minimal payload regardless of which vendor produced the chunk. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(docker): drop redundant libatomic from non_root builder litellm_internal_staging already fixes the prisma `nodeenv` build failure at the root cause by restoring `npm` to the builder (#28519): with npm on PATH, prisma-python uses the system Node and never downloads the nodeenv binary that links against libatomic.so.1. After merging internal_staging the libatomic line is dead weight, so remove it. https://claude.ai/code/session_01SwKzxRxgUhLFyyEf4UV812 * fix(oci/catalog): add openai.gpt-5{,-mini,-nano} entries with supports_reasoning Without these catalog entries, supports_reasoning(model='openai.gpt-5*', custom_llm_provider='oci') returned False, so _model_uses_max_completion_tokens fell back to the default and OCI rejected the request with HTTP 400 ('Use maxCompletionTokens instead.'). Add the three entries so the catalog-driven maxCompletionTokens routing works against a stock LiteLLM install. Also reword the test fixture docstring — the bundled backup now actually ships these entries, so the fixture is only a fallback for environments that loaded their cost map from a stale remote source. 
--------- Co-authored-by: Tai An <antai12232931@outlook.com> Co-authored-by: Vincent <yimao1231@gmail.com> Co-authored-by: Kris Xia <xiajiayi0506@gmail.com> Co-authored-by: d 🔹 <liusway405@gmail.com> Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Tom Denham <tom@tomdee.co.uk> Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com> Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com> Co-authored-by: robin-fiddler <robin@fiddler.ai> Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Federico Kamelhar <federico.kamelhar@oracle.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-05-23 12:15:41 -07:00
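One of the OCI fixes in this commit switches manual request signing to a locale-independent RFC 7231 date via `email.utils.formatdate(usegmt=True)`. A minimal sketch of that call is below; the wrapper name `http_date` is hypothetical and only exists to make the behaviour easy to check.

```python
from email.utils import formatdate
from typing import Optional


def http_date(timestamp: Optional[float] = None) -> str:
    """Return an HTTP date string such as 'Thu, 01 Jan 1970 00:00:00 GMT'.

    formatdate() uses fixed English weekday and month abbreviations, so the
    result does not depend on the process locale (unlike time.strftime with
    %a / %b). Passing None formats the current time.
    """
    return formatdate(timeval=timestamp, usegmt=True)


if __name__ == "__main__":
    print(http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```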
yuneng-jiang	985574b6be	fix(check_licenses): read PEP 639 license-expression metadata (#28529 ) The dependency license checker only read the legacy free-text `info.license` field from PyPI. Packages that adopt PEP 639 publish their license as an SPDX expression in `info.license_expression` and leave the legacy field null, so the checker reported "Unknown license" and failed CI for every newly-bumped PEP 639 dependency. `get_package_license_from_pypi` now resolves the license in order: `license_expression`, then legacy `license`, then the `License :: OSI Approved :: ...` trove classifiers. `is_license_acceptable` splits compound SPDX expressions on the uppercase OR/AND operators (case-sensitive, so the lowercase `-or-later` inside an identifier is not mistaken for an operator) and strips `WITH <exception>` suffixes, requiring every component to be acceptable. Free-text license blobs are detected and fall back to the original whole-string matching. The `black` and `pydantic-settings` entries in liccheck.ini that existed solely to work around this now resolve correctly on their own and have been removed.	2026-05-22 11:22:38 -07:00
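The compound-expression handling described in this commit (split on uppercase OR/AND, strip `WITH <exception>` suffixes, require every component to be acceptable) could look roughly like the sketch below. The helper names, the allowlist contents, and the parenthesis handling are assumptions made for illustration; the real checker may differ.

```python
import re
from typing import List

# Hypothetical allowlist, for the example only.
ACCEPTABLE = {"MIT", "Apache-2.0", "BSD-3-Clause"}

# Case-sensitive on purpose: the lowercase "-or-later" inside an SPDX
# identifier must not be mistaken for the OR operator.
_OPERATOR = re.compile(r"\s+(?:OR|AND)\s+")
_WITH_SUFFIX = re.compile(r"\s+WITH\s+.*$")


def split_spdx(expression: str) -> List[str]:
    """Split a compound SPDX expression into bare license identifiers."""
    flat = expression.replace("(", " ").replace(")", " ")
    parts = _OPERATOR.split(flat)
    return [_WITH_SUFFIX.sub("", part).strip() for part in parts if part.strip()]


def is_expression_acceptable(expression: str) -> bool:
    """Accept the expression only if every component is on the allowlist."""
    components = split_spdx(expression)
    return bool(components) and all(c in ACCEPTABLE for c in components)
```

Requiring every component, including those joined by OR, is the conservative reading of the commit message; a looser policy could accept an OR expression when any one branch is allowed.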
yuneng-jiang	2a5dfcd5bc	build(deps-dev): bump black to 26.3.1 and apply formatting (#28525) * build(deps-dev): bump black 24.10.0 -> 26.3.1 * style: apply black 26.3.1 formatting * chore: authorize black 26.3.1 license in liccheck.ini	2026-05-21 17:24:18 -07:00
Yassin Kortam	014cb8fa9d	feat: add componentized proxy deployment with gateway, backend, ui, and migrations (#27557 ) Split the monolithic LiteLLM proxy into independently scalable Kubernetes components to allow separate horizontal scaling of the LLM data plane and management API surfaces - Add DatabaseURLSettings pydantic-settings model that assembles DATABASE_URL (and optional DATABASE_URL_READ_REPLICA) from discrete DATABASE_* env vars before Prisma initializes, supporting both IAM token auth (minting short-lived RDS tokens) and password auth; replaces the CLI-only path that componentized entrypoints bypass - Add gateway component (port 4000) that trims the proxy route table to the LLM data-plane surface (chat, embeddings, completions, audio, realtime, provider passthroughs, health/metrics) via an allowlist applied inside the lifespan context so plugin-registered routes are captured - Add backend component (port 4001) that exposes the management/admin surface (keys, users, teams, orgs, spend analytics, model management, SSO, audit logs) with a complementary allowlist - Add ui component — Next.js static export served by nginx (port 3000) with RSC payload routing, asset prefix aliasing, and SPA fallback for dashboard routes - Add migrations component with dedicated Dockerfile that runs prisma migrate deploy via a Helm pre-install/pre-upgrade Job, eliminating per-pod schema contention on the Prisma advisory lock - Add Helm chart (helm/litellm) with separate Deployments, Services, HPAs, and ConfigMap for each component; shared _helpers.tpl emits DATABASE_, IAM_TOKEN_DB_AUTH, REDIS_, and DISABLE_SCHEMA_UPDATE env vars from chart values; ingress template routes traffic to the correct component by path prefix - Add comprehensive tests for DatabaseURLSettings covering IAM auth, password auth, read replica fallbacks, operator-pinned URL preservation, and percent-encoding; add coverage test asserting gateway + backend allowlist union equals the full proxy route set - Add 
pydantic-settings>=2.14.1 as a proxy extra dependency and update liccheck allowlist Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-16 09:25:17 -07:00
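As a rough illustration of assembling `DATABASE_URL` from discrete variables with percent-encoding (password auth only), consider the sketch below. It uses a plain function rather than a pydantic-settings model, and the specific variable names (`DATABASE_USER`, `DATABASE_PASSWORD`, `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_NAME`) and the default port are assumptions; the commit message only says the values come from `DATABASE_*` env vars.

```python
from typing import Mapping
from urllib.parse import quote


def build_database_url(env: Mapping[str, str]) -> str:
    """Assemble a PostgreSQL URL from discrete settings (hypothetical helper)."""
    # An operator-pinned URL is preserved as-is.
    pinned = env.get("DATABASE_URL")
    if pinned:
        return pinned

    # safe="" so that '@', ':' and '/' inside credentials are percent-encoded
    # and cannot be confused with URL delimiters.
    user = quote(env["DATABASE_USER"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")
    name = quote(env["DATABASE_NAME"], safe="")
    host = env["DATABASE_HOST"]
    port = env.get("DATABASE_PORT", "5432")  # assumed default
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

In real use the mapping would typically be `os.environ`; taking it as a parameter keeps the function easy to test.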
Sameer Kankute	e912e6d4ff	feat(audio_transcription): add NVIDIA Riva STT provider (#27185 ) * feat(audio_transcription): add NVIDIA Riva STT provider Adds nvidia_riva as a new audio transcription provider, supporting both NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming. - Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy, audioread fallback) so callers can send any common format. - Maps OpenAI params: language (en -> en-US), response_format (text/json/ verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets, word offsets converted ms -> s for verbose_json. - Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted otherwise (SSL off by default), with explicit use_ssl override. - gRPC errors wrapped via NvidiaRivaException -> litellm exception classes. - Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client, soundfile, audioread, numpy). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(nvidia_riva): address PR review feedback - handler: forward call-level `timeout` to streaming_response_generator (kwarg-detected via inspect for older riva-client compat) so a stalled Riva server cannot block the caller indefinitely. - audio_utils: spill bytes to a tempfile before audioread.audio_open; most audioread backends (FFmpeg, GStreamer) require a real filesystem path and previously raised TypeError on BytesIO, breaking the mp3/m4a fallback path. - audio_utils: prefer soxr / scipy.signal.resample_poly for resampling (anti-aliased polyphase) when installed, falling back to linear only as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples. - transformation: bare `es` now maps to es-ES (Castilian) instead of es-US, matching BCP-47 conventions. 
Co-authored-by: Cursor <cursoragent@cursor.com> * chore: trigger CI re-run [stabilize loop 1/3] * Update litellm/llms/nvidia_riva/audio_transcription/transformation.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * chore: trigger CI re-run [stabilize loop 1/3] * fix code qa * fix lint * fix mypy * fix mypy * Fix NVIDIA Riva ASR service lookup * Fix NVIDIA Riva transcription payload logging --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-05-05 17:17:51 -07:00
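Two of the parameter mappings mentioned for the Riva provider (bare language codes to regional tags, and word offsets from milliseconds to seconds) are simple enough to sketch. The function names and the `start_time` / `end_time` field names below are assumptions for illustration and are not taken from the provider code.

```python
from typing import Any, Dict, List

# Bare codes named in the commit message: en -> en-US, es -> es-ES.
_LANGUAGE_DEFAULTS = {"en": "en-US", "es": "es-ES"}


def to_regional_language(language: str) -> str:
    """Map a bare language code to a regional tag; pass anything else through."""
    return _LANGUAGE_DEFAULTS.get(language, language)


def word_offsets_to_seconds(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert millisecond word offsets to seconds for a verbose_json-style result."""
    return [
        {
            "word": w["word"],
            "start": w["start_time"] / 1000.0,  # assumed field name, in ms
            "end": w["end_time"] / 1000.0,      # assumed field name, in ms
        }
        for w in words
    ]
```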
user	0c3b4a06cf	chore(deps): authorize pytest license	2026-05-04 11:39:46 -07:00
mateo-berri	722a1a9f8f	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_vcr-cassette-llm-tests-af37 # Conflicts: # litellm/llms/custom_httpx/llm_http_handler.py	2026-04-30 17:56:02 -07:00
Cursor Agent	0e880dc836	tests(llm_translation): add pytest-recording to license allowlist + greptile fixes CI's license check fails on the new dev dep because liccheck cannot read the PEP 639 'License-Expression' field that pytest-recording uses. Add the package to the manually-verified allowlist (MIT, confirmed via PyPI classifier). Also addresses greptile P2 review comments: - Add 'anthropic-version' to the request-header filter list so live and mock recordings produce structurally identical cassettes. - Replace the indentation-sensitive regex in '_strip_nondeterministic_headers' with a YAML parse-and-rewrite so the helper keeps working if vcrpy ever changes its serialization style. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-04-30 18:11:44 +00:00
user	4d92bc8b86	fix(vector-stores): re-raise HTTPException from get_vector_store_info; allowlist recursion Two issues from the previous push's review: 1. Greptile P1: ``get_vector_store_info`` had the same catch-all ``except Exception`` pattern as ``update_vector_store``, so the HTTPException(403/404) raised by both the in-memory access check and the new ``_fetch_and_authorize_vector_store`` helper was rewritten as 500. Mirror the ``except HTTPException: raise`` guard from ``update_vector_store``. 2. code-quality CI (``tests/code_coverage_tests/recursive_detector.py``) flagged ``_redact_sensitive_litellm_params`` as an unallowlisted recursive function. Match the convention of other allowlisted helpers ("max depth set"): bound recursion at depth 10 (well above any plausible nesting level for real ``litellm_params`` payloads), return the redaction sentinel on overflow, and add the function name to ``IGNORE_FUNCTIONS``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:56:55 +00:00
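The depth-bounded recursion described in point 2 (stop at depth 10 and return the redaction sentinel on overflow) might look like the following sketch. The sentinel string, the sensitive-key markers, and the function name are hypothetical; only the depth bound and the overflow behaviour come from the commit message.

```python
from typing import Any

REDACTED = "***REDACTED***"  # hypothetical sentinel
MAX_DEPTH = 10               # bound named in the commit message
_SENSITIVE_MARKERS = ("key", "secret", "token", "password")  # hypothetical


def _is_sensitive(name: Any) -> bool:
    return isinstance(name, str) and any(m in name.lower() for m in _SENSITIVE_MARKERS)


def redact_params(value: Any, depth: int = 0) -> Any:
    """Recursively redact sensitive keys, giving up safely past MAX_DEPTH."""
    if depth > MAX_DEPTH:
        # Overflow: return the sentinel rather than recursing further or
        # leaking the unredacted subtree.
        return REDACTED
    if isinstance(value, dict):
        return {
            k: REDACTED if _is_sensitive(k) else redact_params(v, depth + 1)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact_params(item, depth + 1) for item in value]
    return value
```

Returning the sentinel on overflow fails closed: an unexpectedly deep payload loses detail but never exposes a secret.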
Yuneng Jiang	e7f4e77af0	Bound _get_masked_values recursion depth Add _depth/_max_depth guards (default 20) so the nested dict masking cannot run away, and allowlist the function in the recursive_detector CI check alongside the other bounded recursive helpers.	2026-04-24 13:00:33 -07:00
Ishaan Jaffer	e8461b5b97	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
Yuneng Jiang	b26f858ab0	fix(ci): authorize langgraph-prebuilt in liccheck.ini langgraph-prebuilt was previously pulled in as a transitive of langgraph so PyPI license metadata was reported as unknown. Now that it is explicitly pinned (==1.0.8) to avoid the broken 1.0.9 release, the license checker flags it. It is published under MIT by the same langchain-ai/langgraph repository as langgraph itself.	2026-04-16 09:41:51 -07:00
Yuneng Jiang	070374d03a	fix(ci): authorize RestrictedPython in liccheck.ini RestrictedPython (ZPL-2.1, a BSD-style permissive license) was added as a dependency for the custom_code guardrail sandbox, but the license checker didn't recognize it. Add to authorized packages list.	2026-04-15 21:20:40 -07:00
stuxf	a6c30b30bf	build: migrate packaging, CI, and Docker from Poetry to uv (#25007 ) * build: migrate packaging metadata to uv * ci: move automation and local tooling to uv * docker: migrate image builds and runtime setup to uv * docs: update install and deployment guidance for uv * chore: align auxiliary scripts and tests with uv * test: harden test_litellm isolation * fix: keep release and health check images self-contained * build: pin uv tooling and health check deps * test: isolate bedrock image request formatting from suite state * test: cover sandbox executor requirements flow * ci: fix circleci no-op command steps * ci: fix circleci publish workflow parsing * fix: stabilize remaining uv migration CI checks * ci: increase matrix test timeout headroom * fix: restore published docker and license coverage * fix: restore proxy runtime build parity * fix: restore proxy extras parity and venv migrations * ci: persist uv path across circleci steps * fix: keep psycopg binary in default test env * docker: preserve prisma cache across stages * test: run local proxy checks through uv python * build: restore runtime deps moved into ci * build: refresh uv lock after upstream merge * fix: restore module import in test_check_migration after merge The conflict resolution imported only the function but the test body references check_migration as a module throughout. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching - Move google-generativeai, Pillow, tenacity back to ci group (they are lazily imported and bloat the base SDK install needlessly) - Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant in Docker where system Node.js is already installed via apk) - Remove all nodejs-wheel node replacement and venv npm patching blocks from Dockerfiles since the wheel is no longer installed - Add --no-default-groups to CodSpeed benchmark workflow so the benchmark environment matches the old minimal pip install footprint - Apply standard uv two-phase Docker pattern: copy metadata first, install deps (cached layer), then copy source and install project - Replace CircleCI enterprise no-op with proper uv sync command Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate uv.lock after removing nodejs-wheel-binaries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): use cache/restore instead of cache to prevent cache poisoning The old workflow used actions/cache/restore (read-only). The uv migration changed it to actions/cache (read-write), which zizmor flags as a cache poisoning risk. Restore the safer read-only variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert The setup-uv action enables caching by default, which zizmor flags as a cache poisoning risk. Disable it since we already use a read-only cache/restore step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): disable setup-uv cache in publish workflow Silences zizmor cache-poisoning alert. Publishing workflow runs infrequently on protected branches so caching adds no real benefit. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(test): remove duplicate verbose_logger mock in test_check_migration The logger was patched twice — first via mocker.patch() then via mocker.patch.object(autospec=True). The second call fails because autospec cannot inspect an already-mocked attribute. Remove the redundant first patch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): free disk space before Docker build in test-server-root-path The Dockerfile.non_root build ran out of disk on the CI runner. Remove Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 11:46:23 -07:00
Yuneng Jiang	85f72c9d24	[Fix] Remove unused aioboto3 dependency and botocore conflict workarounds aioboto3 was listed as a dependency for async sagemaker calls but is not imported anywhere in the codebase — async calls use httpx + botocore SigV4 instead. Removing it eliminates the unresolvable botocore version conflict between boto3 and aiobotocore, along with all grep -v / --no-deps workarounds across Dockerfiles and CI. Also addresses Greptile review feedback: collapse redundant grpcio python-version markers, bump pyproject.toml cryptography to 46.0.5 to match Docker (GHSA-r6ph-v2qm-q3c2), and fix misleading .npmrc comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 14:25:44 -07:00
Yuneng Jiang	9c6d5f2b60	[Fix] Add aioitertools and wrapt to authorized licenses Both are transitive deps of aiobotocore, added to requirements.txt in the previous commit. aioitertools is MIT, wrapt is BSD. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 13:01:10 -07:00
Yuneng Jiang	7bd6fa8509	[Fix] Add hf-xet to authorized packages in license check hf-xet is Apache 2.0 licensed but PyPI metadata doesn't expose the license string, so the automated checker can't determine it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 09:51:29 -07:00
yuneng-jiang	a93c069dd5	[Fix] Add max_depth guard to BFL _read_image_bytes recursive function Use the standard depth/max_depth pattern with DEFAULT_MAX_RECURSE_DEPTH to guard the recursive list-unwrapping in _read_image_bytes, matching the existing pattern used by _read_all_bytes in vertex_imagen. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 13:22:56 -07:00
Ihsan Soydemir	b1a6ba7711	feat(search): add Serper (serper.dev) as search provider (#23112) * Add Serper (serper.dev) as a new search provider * Add @greptileai fixes	2026-03-09 08:40:37 -07:00
Sameer Kankute	b5183e9f3b	Merge pull request #22752 from BerriAI/litellm_search_api_add [Feat] Add Google Search API Integration	2026-03-04 18:29:10 +05:30
Sameer Kankute	0275e23601	Add routing for google search	2026-03-04 13:54:43 +05:30
Chesars	dc9f5a5cc4	fix(deps): update python-multipart to >=0.0.20 in CI and test configs	2026-03-03 15:10:39 -03:00
Chesars	dad7805b42	fix(deps): update python-multipart version to 0.0.22 in all files Align requirements.txt, CI workflow, liccheck, and license cache with the >=0.0.22 constraint already set in pyproject.toml.	2026-03-03 15:09:33 -03:00
yuneng-jiang	71c3503e57	Revert "[Feature] Add /public/supported_endpoints endpoint"	2026-02-26 17:21:43 -08:00
yuneng-jiang	efcc856234	Move provider_endpoints_support.json into litellm package The file was at the repo root and excluded from pip distributions. Moving it to litellm/proxy/public_endpoints/ alongside the other provider JSON files ensures it is packaged correctly. Updates all references in the endpoint handler, coverage tests, and release notes instructions. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-02-26 15:15:16 -08:00
Sameer Kankute	2d231c2f1a	Fix code qa	2026-02-26 12:08:40 +05:30
Ishaan Jaff	0a0768b3df	fix(ci): resolve mypy and check_code_and_doc_quality CI failures (#21812 ) - fix(mypy): suppress [misc] type error in common_utils.py for cls.__init__ access - fix(mypy): move type: ignore comment to correct line in test_eval.py (line 232 not 231) - fix(mypy): suppress [misc] and pre-existing pyright errors in vertex_ai_non_gemini.py - fix(check_licenses): strip inline comments before parsing requirements.txt lines so CVE comments don't break packaging.requirements.Requirement() - fix(router_coverage): add _merge_tools_from_deployment and _invalidate_access_groups_cache to ignored list (private helpers tested indirectly)	2026-02-21 13:08:47 -08:00
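The `check_licenses` fix in this commit strips inline comments from requirements.txt lines before they are parsed. A minimal sketch of such a stripping step is below; the function name is hypothetical, and the rule that a comment starts at a `#` preceded by whitespace (or at the start of the line) is an assumption chosen so that URL fragments such as `#sha256=...` survive.

```python
import re
from typing import Optional

# A comment begins at '#' only at line start or after whitespace, so a '#'
# embedded in a URL fragment is left alone.
_COMMENT = re.compile(r"(^|\s+)#.*$")


def strip_inline_comment(line: str) -> Optional[str]:
    """Return the requirement part of a line, or None if nothing remains."""
    requirement = _COMMENT.sub("", line).strip()
    return requirement or None
```

The cleaned string can then be handed to a requirement parser such as `packaging.requirements.Requirement`, which the commit message names as the consumer.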
Sameer Kankute	5f70165a98	Fix get_unique_names_from_llms_dir	2026-02-18 18:32:25 +05:30
Ishaan Jaffer	add3183308	IGNORE_FUNCTIONS	2026-02-14 12:59:15 -08:00
Ishaan Jaffer	ad72d162cd	avector_store_create	2026-02-14 12:16:33 -08:00
yuneng-jiang	8d10311b4b	content filter test fix	2026-02-12 17:54:16 -08:00
Alexsander Hamir	ebce0e5f8c	[Release - 02/10/2026] v1.81.10-nightly	2026-02-10 16:26:30 -08:00
Krish Dholakia	10d891a365	Guardrails - add logging to all unified_guardrails + link to custom code guardrail templates (#20900 ) * feat(guardrail_hooks/): add guardrail logging to all unified guardrails ensures unified guardrails use the 'log_guardrail_information' decorator for logging * fix(custom_guardrail.py): don't log inputs on guardrail response - just emit state * refactor: don't double log bedrock guardrail information * feat: add in-product nudges for contributing + trying community custom code guardrails allows users to contribute / share custom code guardrails	2026-02-10 15:13:54 -08:00
Krish Dholakia	7056d9984e	Custom Code Guardrails UI Playground (#20377 ) * feat(guardrails/): allow custom code execution for guardrails first step in allowing teams to submit custom code for guardrails * feat: custom_code_guardrail.md support passing custom code for guardrails * feat: initial commit adding ui for custom code guardrails allows users to write guardrails based on custom code * feat: expose new test custom code guardrail endpoint allows ui testing playground to sanity check if guardrail is working as expected * fix: fix linting errors * fix: fix max recursion check * fix: fix linting error	2026-02-03 19:57:24 -08:00
Ishaan Jaff	9ed11c5cdf	[Feat] Allow calling A2A agents through LiteLLM /chat/completions API (#20358 ) * init A2AConfig * add transform files * feat: A2A * feat A2AConfig * fix get_secret_str * init: A2AConfig * init A2AConfig common utils * A2AConfig * test_a2a_completion_async_non_streaming * fix * Update litellm/main.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * add multi part conversation support * extract_text_from_a2a_message --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-03 12:52:33 -08:00
shin-bot-litellm	5bd5df3ca6	fix(test): add router.acancel_batch coverage (#20183) - Add test_router_acancel_batch.py with mock test for router.acancel_batch() - Add _acancel_batch to ignored list (internal helper tested via public API) Fixes CI failure in check_code_and_doc_quality job	2026-01-31 12:39:19 -08:00
Alexsander Hamir	69bd4426e8	[Release Day] - Fixed CI/CD issues & changed processes (#19902)	2026-01-28 17:57:24 -08:00
Ishaan Jaffer	5135efb60e	fix pypdf: >=6.6.2	2026-01-28 14:54:58 -08:00
Alexsander Hamir	4a6dcf3012	Add test for Router.get_valid_args, fix router code coverage encoding (#19797) - Add test_get_valid_args in test_router_helper_utils.py to cover get_valid_args - Use encoding='utf-8' in router_code_coverage.py for cross-platform file reads	2026-01-26 10:14:58 -08:00
Ishaan Jaff	c23e4b87dc	[Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (#19612 ) * init PolicyMatcher * TestPolicyMatcherGetMatchingPolicies * TestPolicyMatcherGetMatchingPolicies * feat: init PolicyResolver * init resolver types * init policy from config * inint PolicyValidator * validate policy * init Architecture Diagram * test_add_guardrails_from_policy_engine * init _init_policy_engine * test updates * test fixws * new attachment config * simplify types * TestPolicyResolverInheritance * fix policy resolver * fix policies * fix applied policy * docs fix * docs fix * fix linting + QA checks * fix linting + QA fixes * test fixes	2026-01-22 19:49:53 -08:00
Sampson	09941dd1d1	add search provider for brave search api (#19433) * add search provider for brave search api Introduces a minimal implementation of the Brave Search API as a search provider. Additionally, this PR introduces a test file to ensure the provider works properly, and numerous other smaller changes (e.g., changes to docs to mention the new option). * Update transformation.py	2026-01-20 19:23:29 -08:00
Sameer Kankute	896d1a7dad	Fix Error: Found packages that need verification:	2026-01-19 18:18:24 +05:30
YutaSaito	7aba0f738a	Revert "Litellm staging 01 15 2026"	2026-01-17 06:31:34 +09:00
Sameer Kankute	84974d5745	Fix boto3 conflicting dependency	2026-01-16 16:55:12 +05:30
Sameer Kankute	f3ca05112e	Merge pull request #19206 from BerriAI/main merge main	2026-01-16 15:22:16 +05:30
Yuta Saito	9e1235c0aa	chore: add jaraco liccheck	2026-01-16 14:55:14 +09:00
burnerburnerburnerman	5676c6c135	Chore: bump boto3 version (#19090)	2026-01-16 02:39:30 +05:30
Alexsander Hamir	15c3bc219b	[Refactor] Add CI enforcement for O(1) operations in _get_model_cost_key to prevent performance regressions (#19052) * Optimize _get_model_cost_key to avoid expensive scans - Remove expensive O(n) scan fallback that was causing 42.87% CPU overhead - Only scan when size mismatch detected (O(1) check) - Add warning in docstring: Only O(1) lookup operations are acceptable - Clean up comments to be more concise - Keep stale entry rebuild for pop() case (only triggers when stale entry found) This fixes the performance issue where the scan was being triggered on every failed lookup, causing severe CPU overhead during router operations. * Add code quality check to enforce O(1) operations in _get_model_cost_key - Add check_get_model_cost_key_performance.py to statically analyze _get_model_cost_key - Detects O(n) operations (loops, comprehensions, problematic function calls) - Recursively checks called functions to find nested O(n) operations - Allows conditional O(n) rebuilds in helper functions (_rebuild_model_cost_lowercase_map, _handle_stale_map_entry_rebuild, _handle_new_key_with_scan) * Integrate _get_model_cost_key performance check into CI pipeline - Add check_get_model_cost_key_performance.py to check_code_and_doc_quality job - Ensures O(1) requirement is enforced in CI to prevent performance regressions * Remove unused performance test and clean up utils.py - Remove test_get_model_info_performance.py (no longer needed) - Remove extra blank line in utils.py * Document allowed helper functions and exception process in _get_model_cost_key - Add documentation listing allowed helper functions with O(n) operations - Explain why these are acceptable (conditionally called) - Add instructions for adding new exceptions to check_get_model_cost_key_performance.py * Fix docstring detection and type checker error in performance check - Add proper docstring tracking to skip docstring content (fixes false positive for 'map' in docstring) - Add None check for docstring_quote to fix type checker error - Restore _handle_new_key_with_scan to allowed_helpers list * Remove check_get_model_cost_key_performance from CI pipeline - Temporarily remove the performance check from CI to avoid blocking builds * Restore performance check and remove memory leak tests from CI - Add back check_get_model_cost_key_performance.py to CI pipeline - Remove memory_leak_tests job that was causing port conflicts * Remove extra blank line in CI config	2026-01-13 17:08:03 -08:00

136 Commits