243 Commits

Author SHA1 Message Date
Ishaan Jaff 41566722af [Feat] UI - Prompt Management - Allow testing prompts with Chat UI (#16898)
* TestPromptRequest

* add prompts/test endpoint for testing prompt

* TestPromptTestEndpoint

* feat: working v1 of this ui

* working prompt endpoints

* add chat ui for prompts

* add conversation panel

* add init chat ui
2025-11-21 08:53:18 -08:00
Sameer Kankute 34cc532d8d Make sure that user inherits team permissions (#16639) 2025-11-18 20:14:42 -08:00
Ishaan Jaff 06eeb28c8f Litellm ci cd fixes 2 (#16693)
* litellm_proxy_unit_testing_part1

* test proxy unit test

* litellm_proxy_unit_testing_key_generation

* test_async_call_with_key_over_model_budget

* test_aasync_call_with_key_over_model_budget
2025-11-15 14:12:44 -08:00
Ishaan Jaffer 666913f76d test_async_call_with_key_over_model_budget 2025-11-15 09:26:29 -08:00
Ishaan Jaffer 63994e302e test_call_with_key_over_model_budget 2025-11-14 19:05:00 -08:00
Ishaan Jaffer 9e8653ad3c fix prisma client 2025-11-14 18:25:27 -08:00
Alexsander Hamir c7847125c2 [Perf] Embeddings: Use router's O(1) lookup and shared sessions (#16344)
* Refactor proxy embeddings to use shared processor

- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks

- route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses

- tighten token array decoding logic by using router deployment lookups and the unified error handler

* Fix: Correctly process embedding requests with token arrays

The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected.

This was caused by a combination of three distinct issues:

1.  In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.

2.  In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers.

3.  In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.

Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
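
The token-array check and the relaxed assertion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proxy's actual code: `is_token_array` is a hypothetical helper name, the mock stands in for the patched `aembedding`, and the extra kwarg values are made up.

```python
from unittest.mock import MagicMock


def is_token_array(value):
    """Return True if value is a non-empty list of non-empty lists of ints.

    Hypothetical helper for illustration; not the proxy's actual function.
    """
    if not isinstance(value, list) or not value:
        return False
    return all(
        isinstance(sub, list)
        and len(sub) > 0
        # bool is a subclass of int, so exclude it explicitly
        and all(isinstance(tok, int) and not isinstance(tok, bool) for tok in sub)
        for sub in value
    )


# Stand-in for the patched `aembedding` function.
mock_aembedding = MagicMock()
mock_aembedding(
    model="text-embedding-ada-002",
    input=["hello"],
    request_timeout=600,     # illustrative value
    litellm_call_id="abc",   # unrelated internal kwarg, illustrative value
)

# Relaxed assertion: confirm the call happened, then inspect only the
# arguments the test cares about, ignoring unrelated internal kwargs.
mock_aembedding.assert_called()
called_kwargs = mock_aembedding.call_args.kwargs
assert called_kwargs["model"] == "text-embedding-ada-002"
assert called_kwargs["input"] == ["hello"]
```

Checking `model` and `input` separately keeps the test stable when new internal kwargs are forwarded, which `assert_called_once_with` would reject.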

* test: align proxy embedding assertions

Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.

* Update proxy exception test

The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.

* testing: unsure of this change

I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.

* fix: remove unrelated change

This change was not related to the embeddings refactor and actually belonged to a different branch.
2025-11-14 09:21:45 -08:00
Ishaan Jaffer ee8b1cfabc test_call_with_end_user_over_budget 2025-11-13 16:26:02 -08:00
yuneng-jiang cb27d6c456 [Fix] UI - Delete Callbacks Failing (#16473)
* Temp commit for branch switching

* Created normalize callback name util function and tests
2025-11-12 18:43:37 -08:00
Ishaan Jaff 5c9f50d584 [AI Gateway] - End User Budgets - Allow pointing max_end_user budget to an id, so the default ID applies to all end users (#16456)
* add _apply_budget_limits_to_end_user_params

* add _apply_budget_limits_to_end_user_params

* add _apply_budget_limits_to_end_user_params

* test_default_budget_applied_to_end_user_without_budget

* docs fix

* fix config
2025-11-11 08:20:13 -08:00
Cesar Garcia 16325024df fix: Use valid CallTypes enum value in embeddings endpoint (#16328)
* Fix embeddings endpoint call_type to use valid CallTypes enum value

Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"`
to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum.

Changed to use `call_type="aembedding"` (async embedding) which is the correct
CallTypes enum value and matches the route_type used in the same function.

Added unit tests to verify:
- "embeddings" is not a valid CallTypes enum value
- "aembedding" is the correct valid value
- The fix prevents ValueError when guardrails are enabled

Fixes #16240
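
The failure mode described above can be reproduced with a reduced stand-in enum. This is a sketch only: the `CallTypes` class below is a two-member placeholder (the real enum has more members), and `is_valid_call_type` is a hypothetical helper introduced for illustration.

```python
from enum import Enum


class CallTypes(str, Enum):
    # Reduced stand-in for illustration; the real enum has more members.
    embedding = "embedding"
    aembedding = "aembedding"


def is_valid_call_type(value: str) -> bool:
    """Return True if value maps to a CallTypes member (hypothetical helper)."""
    try:
        CallTypes(value)
        return True
    except ValueError:
        # Looking up an unknown value on an Enum raises ValueError,
        # which is the error the guardrail hooks would hit.
        return False


assert not is_valid_call_type("embeddings")
assert is_valid_call_type("aembedding")
```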

* Inline embeddings call type regression check

* Ensure embedding test preserves proxy metadata
2025-11-06 19:25:00 -08:00
Ishaan Jaffer b5d81a5d9c test_completion_text_003_prompt_array, test_key_generate_with_secret_manager_call 2025-11-06 17:18:01 -08:00
Petre Alexandru 911e802969 feat: add parallel execution handling in during_call_hook (#16279) 2025-11-05 18:35:25 -08:00
yuneng-jiang 5d158775b1 [Fix] Litellm non root docker Model Hub Table fix (#16282)
* Fix model hub table 404 on non-root docker

* Adding test
2025-11-05 18:30:20 -08:00
Sameer Kankute c45fad3855 Fix: Send Gemini API key via x-goog-api-key header with custom api_base (#16085)
* Add gemini api key in the custom api url

* Update tests

* Use api key in the header

* Use api key in the header

* fix mypy error

* fix mypy error

* fix test gemini auth
2025-11-05 07:12:13 -08:00
Bowen Liang 4e12e3f90d fix typo of orginal (#16255) 2025-11-04 18:55:44 -08:00
steve-gore-snapdocs 88240c4cba Fix Anthropic token counting for VertexAI (#16171)
* transform anthropic messages in gemini handler

* initial

* linting

* remove extra test

* maintain consistency

* more tests

* Revert "transform anthropic messages in gemini handler"

This reverts commit 805e60fd2887991bb4b4554b9394437b874835f9.

* don't lint file we aren't changing

* cleanup

* cleanup

* Cleanup
2025-11-02 09:02:07 -08:00
Ishaan Jaffer cb57455172 test_foward_litellm_user_info_to_backend_llm_call 2025-10-27 13:48:23 -07:00
Krish Dholakia 2bd41dc034 Guardrails - Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, Anthropic Messages API support via the unified apply_guardrails function (#15706)
* fix(presidio.py): handle content as a list of texts

covers openai + anthropic messages api

* fix(presidio.py): safe get messages

* test: add unit testing for presidio guardrails

* fix(unified_guardrail.py): initial commit

* fix(enkryptai.py): implement apply_guardrail to enkrypt guardrail

* fix(unified_guardrail.py): support unified guardrail on input

* feat(unified_guardrail.py): add post call success hook implementation

allows us to just have 1 place to handle llm translation to guardrail api spec

* refactor: refactor initial unified guardrail component

* refactor: more refactoring

* feat(responses/): add guardrails to responses api

allows existing guardrails to work for new llm endpoints

* docs(adding_guardrail_support.md): document new guardrail endpoint support

* test: add unit tests

* feat(image_generation/): add guardrail support for image generation endpoint

* feat(openai/text_completion): support guardrails on `/v1/completions` API

* docs: document guardrails support on new endpoints

* docs: clarify when guardrails run

* feat(openai/speech): add guardrail support for input

* docs(rerank/): add guardrail support on input query

* fix: fix ruff check
2025-10-25 13:38:57 -07:00
Ishaan Jaffer 0bedf1c0a7 fix tests 2025-10-25 10:19:24 -07:00
Carlo Alberto Ferraris 8b1424166b attempt to avoid/minimize deadlocks (#15281)
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
2025-10-24 12:22:38 -07:00
Ishaan Jaff f55745fc5e [Fix] Forward anthropic-beta headers to Bedrock, VertexAI (#15700)
* [Fix] Forward anthropic-beta headers to Bedrock and other cross-provider scenarios (#15623)

* add_provider_specific_headers_to_request

* fix add_provider_specific_headers_to_request

* test_provider_specific_header_multi_provider

* test_provider_specific_header_in_request

---------

Co-authored-by: Jack Venberg <jack.venberg@rover.com>
2025-10-18 16:26:32 -07:00
Nagailic Sergiu (Nikro) 6842d705d5 fix(token-counter): extract model_info from deployment for custom_tokenizer (#15657) (#15680) 2025-10-17 19:38:45 -07:00
Achintya Rajan 264f1cded1 Merge branch 'main' into litellm_view_key_pagination_calls_fix 2025-10-06 18:10:57 -07:00
Krrish Dholakia 63cb2764fe test: fix raise 2025-10-04 16:11:22 -07:00
= 6ba077593f Update test_key_generate_prisma.py 2025-10-04 14:36:19 -07:00
= 5e03ef7382 fixes bloated key alias network calls with lean endpoint 2025-10-04 14:32:15 -07:00
Ishaan Jaffer 9c29f35c4b test_end_user_jwt_auth 2025-10-02 18:48:11 -07:00
Ishaan Jaffer ce57f59531 test_gemini_pass_through_endpoint 2025-09-27 17:17:12 -07:00
Ishaan Jaffer 0ec7dace79 test_embedding 2025-09-27 16:57:27 -07:00
Ishaan Jaffer 3c5e0abaf2 async_log_success_event 2025-09-27 14:17:13 -07:00
Ishaan Jaffer 6aa35ec999 test text-embedding-ada-002 2025-09-27 12:41:35 -07:00
Ishaan Jaffer c27beb74b9 test fix 2025-09-27 12:40:34 -07:00
Ishaan Jaffer 284a8549a1 test_chat_completion 2025-09-27 11:43:20 -07:00
Ishaan Jaffer 3baa3aff1b test fix 2025-09-27 10:38:35 -07:00
Mubashir Osmani 625ed3f8cf fix: prisma client state retries (#14925)
* added qwen models and gpt-5-codex

* fix flaky test

* fix failing test

* Added retries to prisma client state

* fix: prisma client state retries in pods

* Revert "fix failing test"

This reverts commit dbec4988a2627257fd05b905e216225664517f32.

* Revert "fix flaky test"

This reverts commit b0ac2f2dc35ca433af0c82f3cda770d6981caff4.

* Revert "added qwen models and gpt-5-codex"

This reverts commit 9a8a8f2d47ab4dc8aecb0cd9a6a4f82ed81bb056.

* Revert "fix: prisma client state retries in pods"

This reverts commit 04e58e5ca1a489916e3b49e9b674f5c6713fd7cd.

* fix lint

* Revert "fix lint"

This reverts commit 5303d52a5e3bee7e131dcabd098e94f0613a7bb9.

* fixed lint
2025-09-25 21:54:00 -07:00
Alexsander Hamir eaa04cd8ce fix: use fastuuid helper (#14903)
* fix: use fastuuid helper across the codebase

First batch of changes, simple drop in replacement.

* second batch of changes

* fixed: script mistake on helper file
2025-09-25 15:47:01 -07:00
Mubashir Osmani a7a6381926 fix: flaky passthrough tests (#14692)
* fix: flaky passthrough tests

* Revert "fix: flaky passthrough tests"

This reverts commit ffe692e017600a8853ab7c31f95485958ab74c5f.

* fix: serialize prisma objects
2025-09-18 15:35:14 -07:00
Krish Dholakia bfaab8ad7e Merge pull request #14557 from timelfrink/fix/issue-14478-bedrock-count-tokens-endpoint
Implement AWS Bedrock CountTokens API support
2025-09-17 23:51:06 -07:00
Tim Elfrink c234b13275 Apply code formatting and linting fixes
- Apply Black formatting to all Bedrock CountTokens files
- Clean up imports and remove unused variables in tests
- Fix indentation and simplify test structure
- Fix pyright type error with type ignore annotation
- All tests continue to pass after cleanup
2025-09-18 08:28:17 +02:00
Tim Elfrink e74ac35b5d Add comprehensive tests for Bedrock CountTokens functionality
- Add endpoint integration test in test_proxy_token_counter.py
- Add unit tests for transformation logic in bedrock/count_tokens/
- Test model extraction from request body vs endpoint path
- Test input format detection (converse vs invokeModel)
- Test request transformation from Anthropic to Bedrock format
- All tests follow existing codebase patterns and pass successfully
2025-09-18 08:16:56 +02:00
Mubashir Osmani 8b804303ed fix: ci/cd tests + lint errors (#14646)
* fix: lint errors + tests

* fixed ci tests

* fixed tests

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-09-17 17:06:43 -07:00
Sameer Kankute 69c01488bd remove not needed names (#14641) 2025-09-17 14:26:48 -07:00
Krish Dholakia 635dc72211 Merge pull request #14604 from Sameerlite/litellm_gemini_api_base_update
Litellm gemini api base update
2025-09-16 22:38:44 -07:00
Alexsander Hamir 02db2e8ae8 [Performance] RPS Improvement +500 RPS when sending the user field (#14616)
* perf tool

* fix: cache type issue

* fix: exception hanging & cache setting

1. Removed unhandled exceptions
2. Set cache value to dict
2025-09-16 16:18:23 -07:00
Sameerlite f08fc45a0f add base url support for gemini 2025-09-16 15:15:24 +05:30
Sameer Kankute 1a123b2cd5 Litellm gemini cli bug fix (#14451)
* Fix gemini cli error

* Add reasoning request support

* Added better handling

* remove other PR code

* refactored code for better structure following

---------

Co-authored-by: sameer@berri.ai <sameer@berri.ai>
2025-09-12 11:55:26 -07:00
Krrish Dholakia c45ede7187 test: update test 2025-09-09 21:31:34 -07:00
Ishaan Jaff 2cc85936ed Revert "Security fix - prevent proxy_admin_viewer from modifying other user's credentials + remove hardcoded sensitive keys from test repo" (#14362) 2025-09-08 18:40:54 -07:00
Krrish Dholakia 06d472f205 test: fix tests 2025-09-06 21:59:02 -07:00