- Add scope and url attributes to WebSocket mock in test_user_api_key_auth_websocket
- Add shared_realtime_ssl_context initialization in realtime handler test
* Cache realtime websocket request body
Move the realtime request payload builder out of the websocket handler and wrap it with an LRU cache so repeated connections reuse the same bytes object. This keeps the JSON formatting cost down while bounding memory usage.
* Optimize realtime websocket caching
Refactored /v1/realtime to use cached helpers for both the JSON body and query params, introduced a reusable request-scope template, and optimized header handling to avoid redundant work.
* Refine realtime websocket header handling
* Reuse websocket scope headers in auth
* Refactor realtime request body helper
Move the realtime request body formatter into proxy common utils so it can be reused across modules. Reuse it in the websocket auth flow to share LRU caching and avoid ad hoc byte builders.
* fix: revert to old pattern
The old pattern was necessary, we can just return the optimized function instead.
* Reuse SSL context for realtime
Create a shared SSLContext for OpenAI realtime websocket dials and pass it into websockets.connect so we stop re-reading verify paths on every session.
* feat: reuse shared TLS context for realtime websockets
- add `SHARED_REALTIME_SSL_CONTEXT` helper so all realtime websocket clients share the same TLS settings
- wire the shared context into OpenAI, Azure, custom HTTPX handlers, and realtime health checks
- update realtime tests to assert that the expected SSL context is passed to `websockets.connect`
This keeps TLS configuration consistent and avoids recreating SSL contexts per connection.
* Reuse HTTP SSL context for realtime
Remove the standalone realtime SSL helper, expose a shared context directly from the HTTP handler, and point all realtime websocket clients and tests to it. Add the websocket header comparison tool.
* Lazy-load shared realtime SSL context
Fix circular imports introduced by eagerly instantiating the shared TLS context. Make the HTTP handler lazily create the context and have realtime clients/tests fetch it on demand, keeping configuration consistent without breaking startup.
* add: unit test for realtime LRU caches
* fix: merge conflict with imports
* litellm_proxy_unit_testing_part1
* test proxy unit test
* litellm_proxy_unit_testing_key_generation
* test_async_call_with_key_over_model_budget
* test_aasync_call_with_key_over_model_budget
* Refactor proxy embeddings to use shared processor
- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks
- route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses
- tighten token array decoding logic by using router deployment lookups and the unified error handler
* Fix: Correctly process embedding requests with token arrays
The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected.
This was caused by a combination of three distinct issues:
1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.
2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers.
3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.
Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
* test: align proxy embedding assertions
Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.
* Update proxy exception test
The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.
* testing: unsure of this change
I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.
* fix: remove unrelated change
This change was not related to the embeddings refactor and actually belonged to a different branch.
* Fix embeddings endpoint call_type to use valid CallTypes enum value
Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"`
to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum.
Changed to use `call_type="aembedding"` (async embedding) which is the correct
CallTypes enum value and matches the route_type used in the same function.
Added unit tests to verify:
- "embeddings" is not a valid CallTypes enum value
- "aembedding" is the correct valid value
- The fix prevents ValueError when guardrails are enabled
Fixes#16240
* Inline embeddings call type regression check
* Ensure embedding test preserves proxy metadata
* Add gemini api key in the custom api url
* Update tests
* Use api key n the header
* Use api key n the header
* fix mypy error
* fix mypy error
* fix test gemini auth
* fix(presidio.py): handle content as a list of texts
covers openai + anthropic messages api
* fix(presidio.py): safe get messages
* test: add unit testing for presidio guardrails
* fix(unified_guardrail.py): initial commit
* fix(enkryptai.py): implement apply_guardrail to enkrypt guardrail
* fix(unified_guardrail.py): support unified guardrail on input
* feat(unified_guardrail.py): add post call success hook implementation
allows us to just have 1 place to handle llm translation to guardrail api spec
* refactor: refactor initial unified guardrail component
* refactor: more refactoring
* feat(responses/): add guardrails to responses api
allows existing guardrails to work for new llm endpoints
* docs(adding_guardrail_support.md): document new guardrail endpoint support
* test: add unit tests
* feat(image_generation/): add guardrail support for image generation endpoint
* feat(openai/text_completion): support guardrails on `/v1/completions` API
* docs: document guardrails support on new endpoints
* docs: clarify when guardrails run
* feat(openai/speech): add guardrail support for input
* docs(rerank/): add guardrail support on input query
* fix: fix ruff check
* fix: use fastuuid helper across the codebase
First batch of changes, simple drop in replacement.
* second batch of changes
* fixed: script mistake on helper file
- Apply Black formatting to all Bedrock CountTokens files
- Clean up imports and remove unused variables in tests
- Fix indentation and simplify test structure
- Fix pyright type error with type ignore annotation
- All tests continue to pass after cleanup
- Add endpoint integration test in test_proxy_token_counter.py
- Add unit tests for transformation logic in bedrock/count_tokens/
- Test model extraction from request body vs endpoint path
- Test input format detection (converse vs invokeModel)
- Test request transformation from Anthropic to Bedrock format
- All tests follow existing codebase patterns and pass successfully