litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-18 07:33:58 +00:00

Author	SHA1	Message	Date
Alexsander Hamir	32fdb9e60e	fix: Add headers to Request scope in JWT tests to fix KeyError (#17927 ) - Add 'headers': [] to all Request(scope={'type': 'http'}) instances in test_jwt.py - Fixes KeyError: 'headers' when accessing request.headers in user_api_key_auth - All 7 previously failing tests now pass: - test_allow_access_by_email (2 variants) - test_allowed_routes_admin (4 variants) - test_team_token_output (2 variants) The Starlette Request object requires 'headers' key in scope dictionary when accessing request.headers property.	2025-12-13 10:36:13 -08:00
Alexsander Hamir	5b6b613561	[Fix] CI/CD - Fix failing proxy unit test and langfuse trace_id test (#17924 ) * fix: correct Request headers format in JWT auth test Fix test_jwt_non_admin_team_route_access by converting headers to bytes format as required by Starlette's ASGI specification. Headers must be bytes tuples with lowercase header names. This allows dict(request.headers) to work correctly and enables the authorization check to run, producing the expected error message. * fix: ignore UUID trace_id from standard_logging_object, use litellm_call_id The issue was that standard_logging_object.trace_id contains a UUID (from litellm_trace_id default), which was being used instead of falling back to litellm_call_id. This caused the test to fail because it expected 'my-unique-call-id' but got a UUID. Now we properly detect UUIDs (36 chars with 4 hyphens in specific positions) and ignore them, allowing the fallback to litellm_call_id to work correctly. This ensures we use litellm_call_id when no explicit trace_id is provided, which gets stored in the cache and returned by _get_trace_id(). * fix: use existing_trace_id when provided instead of litellm_call_id When existing_trace_id is provided in metadata, it should be used as the trace_id to return (and store in cache), not litellm_call_id. This fixes the test case where existing_trace_id is set and should be returned by _get_trace_id().	2025-12-13 09:32:43 -08:00
Alexsander Hamir	e9baa83a0f	[Fix] CI/CD – Clean Up Performance PR Changes & others (#17838 )	2025-12-11 12:50:03 -08:00
Hunter Wittenborn	82f0c3c887	Support model names with slashes on Gemini endpoints (#17743 ) * Support model names with slashes on Gemini endpoints * Fix test * Update tests/proxy_unit_tests/test_google_endpoint_routing.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/proxy_unit_tests/test_google_endpoint_routing.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/proxy_unit_tests/test_google_endpoint_routing.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/proxy_unit_tests/test_google_endpoint_routing.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/proxy_unit_tests/test_google_endpoint_routing.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/proxy_unit_tests/test_google_endpoint_routing.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-09 18:40:51 -08:00
Alexsander Hamir	958c190134	Fix flanky tests (#17665 ) * Fix test_delete_polling_removes_from_cache mock setup - Mock async_delete_cache to properly execute the real implementation path - Ensures init_async_client() is called and delete() is invoked on the returned client - Fixes AssertionError: Expected 'delete' to be called once. Called 0 times. * fix: resolve timeout in add_model_tab test by mocking useProviderFields hook - Mock useProviderFields hook to prevent network calls and React Query delays - Use waitFor to properly handle async operations - Test now passes reliably without 10s timeout * fix: add test timeout to prevent CI timeout failure - Add 15 second timeout to 'should display Test Connect and Add Model buttons' test - Test takes ~6 seconds locally, but CI was timing out at default 5 second limit - Ensures test has sufficient time to complete in CI environment * test: quarantine flaky test_oidc_circleci_with_azure Quarantine test that fails with 401 Unauthorized from Azure OAuth. The test is flaky and blocks CI builds. Marked with @pytest.mark.skip until Azure authentication can be fixed or migrated to our own account.	2025-12-08 12:21:26 -08:00
Sameer Kankute	b83bc10562	Merge pull request #16862 from xianzongxie-stripe/add_polling_via_cache_feature Add polling via cache feature	2025-12-08 08:41:25 +05:30
yuneng-jiang	d3d005f9bf	fixing tests	2025-12-06 21:23:49 -08:00
Ishaan Jaffer	eaa7e61f57	test fixes	2025-12-05 17:12:01 -08:00
Xianzong Xie	5d59f47db4	refactor: extract should_use_polling_for_request to polling_handler module Committed-By-Agent: cursor	2025-12-05 09:02:15 -08:00
Xianzong Xie	52d784b763	fix: correct mock setup for delete_polling test - Use Mock instead of AsyncMock for init_async_client (sync method) Committed-By-Agent: cursor	2025-12-04 18:00:28 -08:00
Xianzong Xie	03ee5c4489	test: add comprehensive tests for polling via cache feature - Add TestPollingConditionChecks: tests for all condition combinations - Add TestStreamingEventParsing: tests for OpenAI streaming event handling - Add TestEdgeCases: tests for empty model, multiple slashes, edge cases Total test count increased significantly for better coverage. Committed-By-Agent: cursor	2025-12-04 17:55:38 -08:00
Xianzong Xie	a8a38778a3	fix: resolve provider from router for polling_via_cache - Fix bug where model names without slash (e.g., 'gpt-5') couldn't match providers in polling_via_cache list - Look up model in llm_router.model_name_to_deployment_indices - Check ALL deployments for matching provider (supports load balancing) - Check custom_llm_provider first, then extract from model string - Add comprehensive tests for provider resolution logic Committed-By-Agent: cursor	2025-12-04 17:47:30 -08:00
Xianzong Xie	748bb6d5f5	test: add tests for all ResponsesAPIResponse fields - Add test_update_state_with_all_responses_api_fields to verify all fields - Add test_update_state_preserves_existing_fields to verify partial updates Committed-By-Agent: cursor	2025-12-04 14:15:06 -08:00
Xianzong Xie	1c3c12bb1b	refactor: move background_streaming_task to separate module - Create new background_streaming.py in response_polling/ - Update endpoints.py to import from new location - Update __init__.py to export background_streaming_task - Add tests for module imports and structure Committed-By-Agent: cursor	2025-12-03 22:50:26 -08:00
Xianzong Xie	540f14ef51	feat: improve polling via cache feature - Add 150ms batched updates instead of per-event updates for better performance - Handle response.output_text.delta events for text accumulation - Add response.in_progress event handling for status updates - Add response.completed event handling with reasoning, tools, tool_choice - Remove unused output_item parameter from update_state - Remove response.done event type (not valid in OpenAI spec) - Remove documentation files - Add comprehensive unit tests for ResponsePollingHandler Committed-By-Agent: cursor	2025-12-03 18:37:28 -08:00
yuneng-jiang	cc92fdf90f	Merge remote-tracking branch 'origin' into litellm_ui_callback_fix	2025-12-03 11:02:59 -08:00
Sameer Kankute	9edc50efbd	Fix 500 error for malformed request	2025-12-01 10:21:44 +05:30
yuneng-jiang	25e2331510	Merge remote-tracking branch 'origin' into litellm_ui_callback_fix	2025-11-27 17:29:29 -08:00
Ishaan Jaffer	85d4000af6	test_vertex_ai_partner_models_token_counting_endpoint	2025-11-26 11:37:55 -08:00
Carlo Alberto Ferraris	b50fcc4b56	vertex ai: use the correct domain for the global location when counting tokens (#17116 )	2025-11-25 19:22:20 -08:00
Sameer Kankute	67d69d12b0	Add cost tracking and logging support	2025-11-25 17:14:59 +05:30
yuneng-jiang	22fd323d6b	Calling team/permissions_list and team/permissions_update now returns 404 with non-existent team (#16835 )	2025-11-22 14:21:58 -08:00
Alexsander Hamir	ca2a27c377	fix: add missing mock attributes in websocket and realtime tests (#16974 ) - Add scope and url attributes to WebSocket mock in test_user_api_key_auth_websocket - Add shared_realtime_ssl_context initialization in realtime handler test	2025-11-22 10:44:23 -08:00
Alexsander Hamir	eb5031da1e	[Perf] Fix bottlenecks degrading realtime endpoint performance (#16670 ) * Cache realtime websocket request body Move the realtime request payload builder out of the websocket handler and wrap it with an LRU cache so repeated connections reuse the same bytes object. This keeps the JSON formatting cost down while bounding memory usage. * Optimize realtime websocket caching Refactored /v1/realtime to use cached helpers for both the JSON body and query params, introduced a reusable request-scope template, and optimized header handling to avoid redundant work. * Refine realtime websocket header handling * Reuse websocket scope headers in auth * Refactor realtime request body helper Move the realtime request body formatter into proxy common utils so it can be reused across modules. Reuse it in the websocket auth flow to share LRU caching and avoid ad hoc byte builders. * fix: revert to old pattern The old pattern was necessary, we can just return the optimized function instead. * Reuse SSL context for realtime Create a shared SSLContext for OpenAI realtime websocket dials and pass it into websockets.connect so we stop re-reading verify paths on every session. * feat: reuse shared TLS context for realtime websockets - add `SHARED_REALTIME_SSL_CONTEXT` helper so all realtime websocket clients share the same TLS settings - wire the shared context into OpenAI, Azure, custom HTTPX handlers, and realtime health checks - update realtime tests to assert that the expected SSL context is passed to `websockets.connect` This keeps TLS configuration consistent and avoids recreating SSL contexts per connection. * Reuse HTTP SSL context for realtime Remove the standalone realtime SSL helper, expose a shared context directly from the HTTP handler, and point all realtime websocket clients and tests to it. Add the websocket header comparison tool. * Lazy-load shared realtime SSL context Fix circular imports introduced by eagerly instantiating the shared TLS context. Make the HTTP handler lazily create the context and have realtime clients/tests fetch it on demand, keeping configuration consistent without breaking startup. * add: unit test for realtime LRU caches * fix: merge conflict with imports	2025-11-22 10:01:02 -08:00
yuneng-jiang	5dad3c9708	Merge remote-tracking branch 'origin' into litellm_ui_callback_fix	2025-11-21 16:27:49 -08:00
Ishaan Jaff	41566722af	[Feat] UI - Prompt Management - Allow testing prompts with Chat UI (#16898 ) * TestPromptRequest * add prompts/test endpoint for testing prompt * TestPromptTestEndpoint * feat: working v1 of this ui * workig prompt endpoints * add chat ui for prompts * add conversation panel * add init chat ui	2025-11-21 08:53:18 -08:00
yuneng-jiang	31535ff4b6	Merge with main	2025-11-20 20:24:03 -08:00
Sameer Kankute	34cc532d8d	Make sure that user inherits team permissions (#16639 )	2025-11-18 20:14:42 -08:00
Ishaan Jaff	06eeb28c8f	Litellm ci cd fixes 2 (#16693 ) * litellm_proxy_unit_testing_part1 * test proxy unit test * litellm_proxy_unit_testing_key_generation * test_async_call_with_key_over_model_budget * test_aasync_call_with_key_over_model_budget	2025-11-15 14:12:44 -08:00
Ishaan Jaffer	666913f76d	test_async_call_with_key_over_model_budget	2025-11-15 09:26:29 -08:00
yuneng-jiang	cff8a3115a	Merge with main	2025-11-14 20:02:35 -08:00
Ishaan Jaffer	63994e302e	test_call_with_key_over_model_budget	2025-11-14 19:05:00 -08:00
Ishaan Jaffer	9e8653ad3c	fix prisma client	2025-11-14 18:25:27 -08:00
Alexsander Hamir	c7847125c2	[Perf] Embeddings: Use router's O(1) lookup and shared sessions (#16344 ) * Refactor proxy embeddings to use shared processor - allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks - route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses - tighten token array decoding logic by using router deployment lookups and the unified error handler * Fix: Correctly process embedding requests with token arrays The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected. This was caused by a combination of three distinct issues: 1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup. 2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers. 3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists. Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions. * test: align proxy embedding assertions Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload. * Update proxy exception test The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list. * testing: unsure of this change I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it. * fix: remove unrelated change This change was not related to the embeddings refactor and actually belonged to a different branch.	2025-11-14 09:21:45 -08:00
Ishaan Jaffer	ee8b1cfabc	test_call_with_end_user_over_budget	2025-11-13 16:26:02 -08:00
yuneng-jiang	cb27d6c456	[Fix] UI - Delete Callbacks Failing (#16473 ) * Temp commit for branch switching * Created normalize callback name util function and tests	2025-11-12 18:43:37 -08:00
Ishaan Jaff	5c9f50d584	[AI Gateway] - End User Budgets - Allow pointing max_end_user budget to an id, so the default ID applies to all end users (#16456 ) * add _apply_budget_limits_to_end_user_params * add _apply_budget_limits_to_end_user_params * add _apply_budget_limits_to_end_user_params * test_default_budget_applied_to_end_user_without_budget * docs fix * fix config	2025-11-11 08:20:13 -08:00
yuneng-jiang	7833b3fdb4	Addressing comments	2025-11-10 17:28:13 -08:00
yuneng-jiang	5853dbafc8	Merge branch 'main' into litellm_ui_callback_fix	2025-11-10 16:50:58 -08:00
Cesar Garcia	16325024df	fix: Use valid CallTypes enum value in embeddings endpoint (#16328 ) * Fix embeddings endpoint call_type to use valid CallTypes enum value Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"` to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum. Changed to use `call_type="aembedding"` (async embedding) which is the correct CallTypes enum value and matches the route_type used in the same function. Added unit tests to verify: - "embeddings" is not a valid CallTypes enum value - "aembedding" is the correct valid value - The fix prevents ValueError when guardrails are enabled Fixes #16240 * Inline embeddings call type regression check * Ensure embedding test preserves proxy metadata	2025-11-06 19:25:00 -08:00
Ishaan Jaffer	b5d81a5d9c	test_completion_text_003_prompt_array, test_key_generate_with_secret_manager_call	2025-11-06 17:18:01 -08:00
yuneng-jiang	2d5ae35a85	Show all callbacks on UI	2025-11-06 12:38:47 -08:00
Petre Alexandru	911e802969	feat: add parallel execution handling in during_call_hook (#16279 )	2025-11-05 18:35:25 -08:00
yuneng-jiang	5d158775b1	[Fix] Litellm non root docker Model Hub Table fix (#16282 ) * Fix model hub table 404 on non-root docker * Adding test	2025-11-05 18:30:20 -08:00
Sameer Kankute	c45fad3855	Fix: Send Gemini API key via x-goog-api-key header with custom api_base (#16085 ) * Add gemini api key in the custom api url * Update tests * Use api key n the header * Use api key n the header * fix mypy error * fix mypy error * fix test gemini auth	2025-11-05 07:12:13 -08:00
Bowen Liang	4e12e3f90d	fix typo of orginal (#16255 )	2025-11-04 18:55:44 -08:00
steve-gore-snapdocs	88240c4cba	Fix Anthropic token counting for VertexAI (#16171 ) * transform anthropic messages in gemini handler * initial * linting * remove extra testt * maintain consistency * more tests * Revert "transform anthropic messages in gemini handler" This reverts commit 805e60fd2887991bb4b4554b9394437b874835f9. * don't lint file we aren't changing * cleanup * cleanup * Cleanup	2025-11-02 09:02:07 -08:00
Ishaan Jaffer	cb57455172	test_foward_litellm_user_info_to_backend_llm_call	2025-10-27 13:48:23 -07:00
Krish Dholakia	2bd41dc034	Guardrails - Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, Anthropic Messages API support via the unified `apply_guardrails` function (#15706 ) * fix(presidio.py): handle content as a list of texts covers openai + anthropic messages api * fix(presidio.py): safe get messages * test: add unit testing for presidio guardrails * fix(unified_guardrail.py): initial commit * fix(enkryptai.py): implement apply_guardrail to enkrypt guardrail * fix(unified_guardrail.py): support unified guardrail on input * feat(unified_guardrail.py): add post call success hook implementation allows us to just have 1 place to handle llm translation to guardrail api spec * refactor: refactor initial unified guardrail component * refactor: more refactoring * feat(responses/): add guardrails to responses api allows existing guardrails to work for new llm endpoints * docs(adding_guardrail_support.md): document new guardrail endpoint support * test: add unit tests * feat(image_generation/): add guardrail support for image generation endpoint * feat(openai/text_completion): support guardrails on `/v1/completions` API * docs: document guardrails support on new endpoints * docs: clarify when guardrails run * feat(openai/speech): add guardrail support for input * docs(rerank/): add guardrail support on input query * fix: fix ruff check	2025-10-25 13:38:57 -07:00
Ishaan Jaffer	0bedf1c0a7	fix tests	2025-10-25 10:19:24 -07:00

273 Commits