Commit Graph

250 Commits

Author SHA1 Message Date
Sameer Kankute 9edc50efbd Fix 500 error for malformed request 2025-12-01 10:21:44 +05:30
Ishaan Jaffer 85d4000af6 test_vertex_ai_partner_models_token_counting_endpoint 2025-11-26 11:37:55 -08:00
Carlo Alberto Ferraris b50fcc4b56 vertex ai: use the correct domain for the global location when counting tokens (#17116) 2025-11-25 19:22:20 -08:00
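A minimal sketch of the kind of host selection this fix describes. It assumes regional Vertex AI hosts follow the `{location}-aiplatform.googleapis.com` pattern while the `global` location uses the bare `aiplatform.googleapis.com` host. The helper name `vertex_api_host` is hypothetical and not taken from the commit.

```python
def vertex_api_host(location: str) -> str:
    """Return the API host for a Vertex AI location (hypothetical helper)."""
    # Assumption: "global" has no location prefix in its hostname.
    if location == "global":
        return "aiplatform.googleapis.com"
    # Assumption: every other location is a regional, prefixed host.
    return f"{location}-aiplatform.googleapis.com"
```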
Sameer Kankute 67d69d12b0 Add cost tracking and logging support 2025-11-25 17:14:59 +05:30
yuneng-jiang 22fd323d6b Calling team/permissions_list and team/permissions_update now returns 404 with non-existent team (#16835) 2025-11-22 14:21:58 -08:00
Alexsander Hamir ca2a27c377 fix: add missing mock attributes in websocket and realtime tests (#16974)
- Add scope and url attributes to WebSocket mock in test_user_api_key_auth_websocket
- Add shared_realtime_ssl_context initialization in realtime handler test
2025-11-22 10:44:23 -08:00
Alexsander Hamir eb5031da1e [Perf] Fix bottlenecks degrading realtime endpoint performance (#16670)
* Cache realtime websocket request body

Move the realtime request payload builder out of the websocket handler and wrap it with an LRU cache so repeated connections reuse the same bytes object. This keeps the JSON formatting cost down while bounding memory usage.

* Optimize realtime websocket caching

Refactored /v1/realtime to use cached helpers for both the JSON body and query params, introduced a reusable request-scope template, and optimized header handling to avoid redundant work.

* Refine realtime websocket header handling

* Reuse websocket scope headers in auth

* Refactor realtime request body helper

Move the realtime request body formatter into proxy common utils so it can be reused across modules. Reuse it in the websocket auth flow to share LRU caching and avoid ad hoc byte builders.

* fix: revert to old pattern

The old pattern was necessary; we can just return the optimized function instead.

* Reuse SSL context for realtime

Create a shared SSLContext for OpenAI realtime websocket dials and pass it into websockets.connect so we stop re-reading verify paths on every session.

* feat: reuse shared TLS context for realtime websockets

- add `SHARED_REALTIME_SSL_CONTEXT` helper so all realtime websocket clients share the same TLS settings
- wire the shared context into OpenAI, Azure, custom HTTPX handlers, and realtime health checks
- update realtime tests to assert that the expected SSL context is passed to `websockets.connect`

This keeps TLS configuration consistent and avoids recreating SSL contexts per connection.

* Reuse HTTP SSL context for realtime

Remove the standalone realtime SSL helper, expose a shared context directly from the HTTP handler, and point all realtime websocket clients and tests to it. Add the websocket header comparison tool.

* Lazy-load shared realtime SSL context

Fix circular imports introduced by eagerly instantiating the shared TLS context. Make the HTTP handler lazily create the context and have realtime clients/tests fetch it on demand, keeping configuration consistent without breaking startup.

* add: unit test for realtime LRU caches

* fix: merge conflict with imports
2025-11-22 10:01:02 -08:00
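The two ideas in the commit above (an LRU-cached request body and a lazily created, shared TLS context) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the function names and the payload shape are hypothetical, and `functools.lru_cache(maxsize=1)` stands in for whatever lazy-initialization mechanism the HTTP handler actually uses.

```python
import functools
import json
import ssl


@functools.lru_cache(maxsize=128)
def realtime_request_body(model: str) -> bytes:
    """Serialize the body once per distinct model; later calls reuse the bytes.

    The payload shape is a placeholder; maxsize bounds memory usage.
    """
    return json.dumps({"model": model}).encode("utf-8")


@functools.lru_cache(maxsize=1)
def shared_ssl_context() -> ssl.SSLContext:
    """Build the TLS context on first use, then hand back the same object.

    Creating it lazily (rather than at import time) avoids doing work, or
    triggering import cycles, during module import.
    """
    return ssl.create_default_context()
```

A websocket client could then pass the context through the `ssl` argument of `websockets.connect`, so verify paths are loaded once rather than on every session.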
Ishaan Jaff 41566722af [Feat] UI - Prompt Management - Allow testing prompts with Chat UI (#16898)
* TestPromptRequest

* add prompts/test endpoint for testing prompt

* TestPromptTestEndpoint

* feat: working v1 of this ui

* working prompt endpoints

* add chat ui for prompts

* add conversation panel

* add init chat ui
2025-11-21 08:53:18 -08:00
Sameer Kankute 34cc532d8d Make sure that user inherits team permissions (#16639) 2025-11-18 20:14:42 -08:00
Ishaan Jaff 06eeb28c8f Litellm ci cd fixes 2 (#16693)
* litellm_proxy_unit_testing_part1

* test proxy unit test

* litellm_proxy_unit_testing_key_generation

* test_async_call_with_key_over_model_budget

* test_aasync_call_with_key_over_model_budget
2025-11-15 14:12:44 -08:00
Ishaan Jaffer 666913f76d test_async_call_with_key_over_model_budget 2025-11-15 09:26:29 -08:00
Ishaan Jaffer 63994e302e test_call_with_key_over_model_budget 2025-11-14 19:05:00 -08:00
Ishaan Jaffer 9e8653ad3c fix prisma client 2025-11-14 18:25:27 -08:00
Alexsander Hamir c7847125c2 [Perf] Embeddings: Use router's O(1) lookup and shared sessions (#16344)
* Refactor proxy embeddings to use shared processor

- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks

- route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses

- tighten token array decoding logic by using router deployment lookups and the unified error handler

* Fix: Correctly process embedding requests with token arrays

The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected.

This was caused by a combination of three distinct issues:

1.  In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.

2.  In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers.

3.  In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.

Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.

* test: align proxy embedding assertions

Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.

* Update proxy exception test

The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.

* testing: unsure of this change

I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.

* fix: remove unrelated change

This change was not related to the embeddings refactor and actually belonged to a different branch.
2025-11-14 09:21:45 -08:00
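Three of the checks described in the commit above can be sketched as follows. This is a hedged illustration, not the project's implementation: `is_token_array` and `should_enforce_params` are hypothetical names, the non-empty requirement in `is_token_array` is an assumption, and the mock stands in for `aembedding`.

```python
from typing import Any, List, Optional
from unittest.mock import MagicMock


def is_token_array(item: Any) -> bool:
    """True when item is a list whose elements are all integers.

    Assumption: an empty list is not treated as a token array.
    """
    return (
        isinstance(item, list)
        and len(item) > 0
        and all(isinstance(tok, int) for tok in item)
    )


def should_enforce_params(enforced_params: Optional[List[str]]) -> bool:
    """Run the enforced-params check only for a non-empty list."""
    return bool(enforced_params)


# Relaxed assertion pattern: confirm the call happened, then check only
# the arguments that matter, ignoring unrelated internal kwargs.
aembedding = MagicMock()
aembedding(model="m", input=[[1, 2, 3]], litellm_call_id="abc")

aembedding.assert_called()
call_kwargs = aembedding.call_args.kwargs
assert call_kwargs["model"] == "m"
assert call_kwargs["input"] == [[1, 2, 3]]
```

Unlike `assert_called_once_with`, this pattern keeps passing when extra keyword arguments such as `litellm_call_id` are added to the call.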
Ishaan Jaffer ee8b1cfabc test_call_with_end_user_over_budget 2025-11-13 16:26:02 -08:00
yuneng-jiang cb27d6c456 [Fix] UI - Delete Callbacks Failing (#16473)
* Temp commit for branch switching

* Created normalize callback name util function and tests
2025-11-12 18:43:37 -08:00
Ishaan Jaff 5c9f50d584 [AI Gateway] - End User Budgets - Allow pointing max_end_user budget to an id, so the default ID applies to all end users (#16456)
* add _apply_budget_limits_to_end_user_params

* add _apply_budget_limits_to_end_user_params

* add _apply_budget_limits_to_end_user_params

* test_default_budget_applied_to_end_user_without_budget

* docs fix

* fix config
2025-11-11 08:20:13 -08:00
Cesar Garcia 16325024df fix: Use valid CallTypes enum value in embeddings endpoint (#16328)
* Fix embeddings endpoint call_type to use valid CallTypes enum value

Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"`
to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum.

Changed to use `call_type="aembedding"` (async embedding) which is the correct
CallTypes enum value and matches the route_type used in the same function.

Added unit tests to verify:
- "embeddings" is not a valid CallTypes enum value
- "aembedding" is the correct valid value
- The fix prevents ValueError when guardrails are enabled

Fixes #16240

* Inline embeddings call type regression check

* Ensure embedding test preserves proxy metadata
2025-11-06 19:25:00 -08:00
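The failure mode described in the commit above can be reproduced with a small stand-in enum. The `CallTypes` class below is a hypothetical two-member substitute for the real enum, used only to show why an unknown string raises `ValueError` on lookup.

```python
from enum import Enum


class CallTypes(str, Enum):
    """Hypothetical stand-in; the real enum has many more members."""

    embedding = "embedding"
    aembedding = "aembedding"


def is_valid_call_type(value: str) -> bool:
    """Return True if value maps to a CallTypes member."""
    try:
        CallTypes(value)  # lookup by value raises ValueError if unknown
    except ValueError:
        return False
    return True
```

With this stand-in, `is_valid_call_type("embeddings")` is false while `is_valid_call_type("aembedding")` is true, which mirrors the regression check the commit describes.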
Ishaan Jaffer b5d81a5d9c test_completion_text_003_prompt_array, test_key_generate_with_secret_manager_call 2025-11-06 17:18:01 -08:00
Petre Alexandru 911e802969 feat: add parallel execution handling in during_call_hook (#16279) 2025-11-05 18:35:25 -08:00
yuneng-jiang 5d158775b1 [Fix] Litellm non root docker Model Hub Table fix (#16282)
* Fix model hub table 404 on non-root docker

* Adding test
2025-11-05 18:30:20 -08:00
Sameer Kankute c45fad3855 Fix: Send Gemini API key via x-goog-api-key header with custom api_base (#16085)
* Add gemini api key in the custom api url

* Update tests

* Use api key in the header

* Use api key in the header

* fix mypy error

* fix mypy error

* fix test gemini auth
2025-11-05 07:12:13 -08:00
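A rough sketch of the approach named in the commit title above: when a custom `api_base` is used, the key travels in the `x-goog-api-key` header rather than in the URL. The function name `build_gemini_request` and the example URL are hypothetical, and the real code may assemble the URL differently.

```python
from typing import Dict, Tuple


def build_gemini_request(api_base: str, api_key: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) with the API key kept out of the URL."""
    # Assumption: the custom base is used as given, minus a trailing slash.
    url = api_base.rstrip("/")
    headers = {
        "x-goog-api-key": api_key,  # key is sent as a header
        "Content-Type": "application/json",
    }
    return url, headers


url, headers = build_gemini_request("https://proxy.example.com/gemini/", "secret")
```

Keeping the key in a header means it does not end up in URLs that proxies or access logs may record.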
Bowen Liang 4e12e3f90d fix typo of orginal (#16255) 2025-11-04 18:55:44 -08:00
steve-gore-snapdocs 88240c4cba Fix Anthropic token counting for VertexAI (#16171)
* transform anthropic messages in gemini handler

* initial

* linting

* remove extra test

* maintain consistency

* more tests

* Revert "transform anthropic messages in gemini handler"

This reverts commit 805e60fd2887991bb4b4554b9394437b874835f9.

* don't lint file we aren't changing

* cleanup

* cleanup

* Cleanup
2025-11-02 09:02:07 -08:00
Ishaan Jaffer cb57455172 test_foward_litellm_user_info_to_backend_llm_call 2025-10-27 13:48:23 -07:00
Krish Dholakia 2bd41dc034 Guardrails - Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, Anthropic Messages API support via the unified apply_guardrails function (#15706)
* fix(presidio.py): handle content as a list of texts

covers openai + anthropic messages api

* fix(presidio.py): safe get messages

* test: add unit testing for presidio guardrails

* fix(unified_guardrail.py): initial commit

* fix(enkryptai.py): implement apply_guardrail to enkrypt guardrail

* fix(unified_guardrail.py): support unified guardrail on input

* feat(unified_guardrail.py): add post call success hook implementation

allows us to just have 1 place to handle llm translation to guardrail api spec

* refactor: refactor initial unified guardrail component

* refactor: more refactoring

* feat(responses/): add guardrails to responses api

allows existing guardrails to work for new llm endpoints

* docs(adding_guardrail_support.md): document new guardrail endpoint support

* test: add unit tests

* feat(image_generation/): add guardrail support for image generation endpoint

* feat(openai/text_completion): support guardrails on `/v1/completions` API

* docs: document guardrails support on new endpoints

* docs: clarify when guardrails run

* feat(openai/speech): add guardrail support for input

* docs(rerank/): add guardrail support on input query

* fix: fix ruff check
2025-10-25 13:38:57 -07:00
Ishaan Jaffer 0bedf1c0a7 fix tests 2025-10-25 10:19:24 -07:00
Carlo Alberto Ferraris 8b1424166b attempt to avoid/minimize deadlocks (#15281)
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
2025-10-24 12:22:38 -07:00
Ishaan Jaff f55745fc5e [Fix] Forward anthropic-beta headers to Bedrock, VertexAI (#15700)
* [Fix] Forward anthropic-beta headers to Bedrock and other cross-provider scenarios (#15623)

* add_provider_specific_headers_to_request

* fix add_provider_specific_headers_to_request

* test_provider_specific_header_multi_provider

* test_provider_specific_header_in_request

---------

Co-authored-by: Jack Venberg <jack.venberg@rover.com>
2025-10-18 16:26:32 -07:00
Nagailic Sergiu (Nikro) 6842d705d5 fix(token-counter): extract model_info from deployment for custom_tokenizer (#15657) (#15680) 2025-10-17 19:38:45 -07:00
Achintya Rajan 264f1cded1 Merge branch 'main' into litellm_view_key_pagination_calls_fix 2025-10-06 18:10:57 -07:00
Krrish Dholakia 63cb2764fe test: fix raise 2025-10-04 16:11:22 -07:00
= 6ba077593f Update test_key_generate_prisma.py 2025-10-04 14:36:19 -07:00
= 5e03ef7382 fixes bloated key alias network calls with lean endpoint 2025-10-04 14:32:15 -07:00
Ishaan Jaffer 9c29f35c4b test_end_user_jwt_auth 2025-10-02 18:48:11 -07:00
Ishaan Jaffer ce57f59531 test_gemini_pass_through_endpoint 2025-09-27 17:17:12 -07:00
Ishaan Jaffer 0ec7dace79 test_embedding 2025-09-27 16:57:27 -07:00
Ishaan Jaffer 3c5e0abaf2 async_log_success_event 2025-09-27 14:17:13 -07:00
Ishaan Jaffer 6aa35ec999 test text-embedding-ada-002 2025-09-27 12:41:35 -07:00
Ishaan Jaffer c27beb74b9 test fix 2025-09-27 12:40:34 -07:00
Ishaan Jaffer 284a8549a1 test_chat_completion 2025-09-27 11:43:20 -07:00
Ishaan Jaffer 3baa3aff1b test fix 2025-09-27 10:38:35 -07:00
Mubashir Osmani 625ed3f8cf fix: prisma client state retries (#14925)
* added qwen models and gpt-5-codex

* fix flaky test

* fix failing test

* Added retries to prisma client state

* fix: prisma client state retries in pods

* Revert "fix failing test"

This reverts commit dbec4988a2627257fd05b905e216225664517f32.

* Revert "fix flaky test"

This reverts commit b0ac2f2dc35ca433af0c82f3cda770d6981caff4.

* Revert "added qwen models and gpt-5-codex"

This reverts commit 9a8a8f2d47ab4dc8aecb0cd9a6a4f82ed81bb056.

* Revert "fix: prisma client state retries in pods"

This reverts commit 04e58e5ca1a489916e3b49e9b674f5c6713fd7cd.

* fix lint

* Revert "fix lint"

This reverts commit 5303d52a5e3bee7e131dcabd098e94f0613a7bb9.

* fixed lint
2025-09-25 21:54:00 -07:00
Alexsander Hamir eaa04cd8ce fix: use fastuuid helper (#14903)
* fix: use fastuuid helper across the codebase

First batch of changes, simple drop in replacement.

* second batch of changes

* fixed: script mistake on helper file
2025-09-25 15:47:01 -07:00
Mubashir Osmani a7a6381926 fix: flaky passthrough tests (#14692)
* fix: flaky passthrough tests

* Revert "fix: flaky passthrough tests"

This reverts commit ffe692e017600a8853ab7c31f95485958ab74c5f.

* fix: serialize prisma objects
2025-09-18 15:35:14 -07:00
Krish Dholakia bfaab8ad7e Merge pull request #14557 from timelfrink/fix/issue-14478-bedrock-count-tokens-endpoint
Implement AWS Bedrock CountTokens API support
2025-09-17 23:51:06 -07:00
Tim Elfrink c234b13275 Apply code formatting and linting fixes
- Apply Black formatting to all Bedrock CountTokens files
- Clean up imports and remove unused variables in tests
- Fix indentation and simplify test structure
- Fix pyright type error with type ignore annotation
- All tests continue to pass after cleanup
2025-09-18 08:28:17 +02:00
Tim Elfrink e74ac35b5d Add comprehensive tests for Bedrock CountTokens functionality
- Add endpoint integration test in test_proxy_token_counter.py
- Add unit tests for transformation logic in bedrock/count_tokens/
- Test model extraction from request body vs endpoint path
- Test input format detection (converse vs invokeModel)
- Test request transformation from Anthropic to Bedrock format
- All tests follow existing codebase patterns and pass successfully
2025-09-18 08:16:56 +02:00
Mubashir Osmani 8b804303ed fix: ci/cd tests + lint errors (#14646)
* fix: lint errors + tests

* fixed ci tests

* fixed tests

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-09-17 17:06:43 -07:00
Sameer Kankute 69c01488bd remove not needed names (#14641) 2025-09-17 14:26:48 -07:00