Commit Graph

89 Commits

Author SHA1 Message Date
Ishaan Jaff 4e195d639e [Feat] New API - Claude Skills API (Anthropic) (#17042)
* init readme

* init BaseSkillsAPIConfig

* init types for Skills APIs

* add feat: add create, list, retrieve skills

* add base skills config

* add BaseSkillsAPIConfig

* add get_provider_skills_api_config

* init skills

* add ANTHROPIC_SKILLS_API_BETA_VERSION

* init skills APIs

* working list, get skills

* working e2e skills API anthropic API

* add _prepare_skill_multipart_request

* add skills routes to llm api routes

* router _initialize_skills_endpoints

* add fix skills endpoints

* add convert_upload_files_to_file_data

* fix routing skills endpoints

* fix route llm request

* Potential fix for code scanning alert no. 3806: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 3809: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix ruff checks

* test_initialize_skills_endpoints

* fix claude skills mypy linting errors

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-11-24 15:01:40 -08:00
Alexsander Hamir 6e70c279f8 [Fix] - Router's Cache: Fix routing for requests with same cacheable prefix but different user messages (#16951)
* fix(router): use cacheable prefix for prompt caching cache keys

Fix issue where requests with same cacheable prefix but different user
messages were routing to different deployments, preventing cached token
reuse. The cache key now correctly includes only the cacheable prefix
(up to and including the last cache_control block) instead of the
entire messages array.

## New Functions

### extract_cacheable_prefix()
Static method that extracts the cacheable prefix from messages for
prompt caching. The cacheable prefix is defined as everything UP TO
AND INCLUDING the LAST content block (across all messages) that has
cache_control with type "ephemeral". This includes ALL blocks
before the last cacheable block (even if they don't have cache_control
themselves).

- Finds the last content block with cache_control across all messages
- Returns all messages and content blocks up to and including that
  last cacheable block
- Excludes everything after the last cacheable block (including user
  messages that come after)
- Returns empty list if no cacheable blocks are found
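
The prefix rule described above can be sketched as follows. This is a minimal illustration, not the router's actual code: the function name `extract_cacheable_prefix_sketch` and the helper `_is_ephemeral` are hypothetical, and the sketch assumes that cacheable content is always given as a list of content-block dicts.

```python
from typing import Any, Dict, List, Optional, Tuple


def _is_ephemeral(block: Dict[str, Any]) -> bool:
    """True if the block carries cache_control with type "ephemeral"."""
    cache_control = block.get("cache_control")
    return isinstance(cache_control, dict) and cache_control.get("type") == "ephemeral"


def extract_cacheable_prefix_sketch(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep everything up to and including the last cacheable block."""
    last: Optional[Tuple[int, int]] = None
    for m_idx, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # assumption: plain-string content has no cache_control marker
        for b_idx, block in enumerate(content):
            if isinstance(block, dict) and _is_ephemeral(block):
                last = (m_idx, b_idx)

    if last is None:
        return []  # no cacheable block anywhere

    m_idx, b_idx = last
    prefix = [dict(m) for m in messages[:m_idx]]  # all earlier messages, untouched
    final = dict(messages[m_idx])
    final["content"] = list(messages[m_idx]["content"][: b_idx + 1])  # cut after the marker
    prefix.append(final)
    return prefix
```

Two requests that share the same system blocks but end with different user messages produce the same prefix under this sketch, which is the property the fix relies on.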

## Changed Functions

### get_prompt_caching_cache_key()
Modified to use the cacheable prefix instead of the full messages array
when generating cache keys. This ensures that requests with the same
cacheable prefix but different user messages generate the same cache
key, enabling proper routing to the same deployment.

- Now calls extract_cacheable_prefix() to get only cacheable content
- Returns None if no cacheable prefix is found (can't generate key)
- Cache key is now based on cacheable prefix only, not full messages
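
One way to derive such a key is sketched below. The hashing scheme, the `prompt_cache:` prefix, and the function name `prompt_cache_key_sketch` are assumptions made for illustration; the commit message does not say how the real key is built.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def prompt_cache_key_sketch(cacheable_prefix: List[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical sketch: hash only the cacheable prefix, never the full messages."""
    if not cacheable_prefix:
        return None  # nothing cacheable, so there is no meaningful key
    # sort_keys makes the serialization independent of dict insertion order
    serialized = json.dumps(cacheable_prefix, sort_keys=True)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return "prompt_cache:" + digest
```

Because the trailing user message is never part of the input, requests that differ only in that message map to the same key.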

### async_get_model_id()
Completely refactored to use the cacheable prefix directly instead of
the previous workaround that checked progressively shorter message
slices. The previous implementation was inefficient and unreliable.

- Removed progressive message slicing logic (messages[:-1], messages[:-2], etc.)
- Now uses single direct cache lookup with cacheable prefix-based key
- More efficient (1 lookup instead of up to 4)
- More reliable (uses correct cache key based on cacheable prefix)
- Returns None if no cacheable prefix found

### add_model_id()
Added None check for cache_key to prevent caching when no cacheable
prefix is found. This ensures we don't attempt to cache when there's
no meaningful cache key to use.

- Added guard: returns early if cache_key is None
- Prevents attempting to cache when no cacheable prefix exists

### async_add_model_id()
Added None check for cache_key to prevent caching when no cacheable
prefix is found. Matches the behavior of add_model_id() for consistency.

- Added guard: returns early if cache_key is None
- Prevents attempting to cache when no cacheable prefix exists

### get_model_id()
Added None check for cache_key to handle cases where no cacheable
prefix is found. Ensures consistent behavior across all cache methods.

- Added guard: returns None if cache_key is None
- Prevents calling get_cache() with None key

## Test

### test_router_prompt_caching_same_cacheable_prefix_routes_to_same_deployment()
New end-to-end test that validates the fix. Tests that requests with
the same cacheable prefix (system blocks with cache_control) but
different user messages:
1. Generate the same cache key
2. Successfully perform cache lookup
3. Route to the same deployment

This test reproduces the exact scenario from the user's bug report
where three requests with different user messages should route to the
same deployment but were previously routing to different ones.

Fixes issue where cached tokens couldn't be reused because requests
were routed to different providers due to different cache keys.

* fix(router): use cast() for proper type handling in extract_cacheable_prefix

Replace type annotation with type: ignore comment with proper cast()
from typing module, matching the pattern used throughout the
codebase for creating modified AllMessageValues dictionaries.
2025-11-21 19:13:40 -08:00
yuneng-jiang 4b25398afe [Infra] CI/CD Fixes (#16937)
* Attempt CI/CD Fix

* Adding test for coverage

* Adding max depth to copilot and vertex

* Fixing mypy lint and docker database

* Fixing UI build issues

* Update playwright test
2025-11-21 13:58:19 -08:00
Ishaan Jaff b30439257b [Feat] Add RunwayML Img Gen API support (#16557)
* TestRunwaymlImageGeneration

* fix RUNWAYML

* rename

* fix rename

* get_runwayml_image_generation_config

* get_runwayml_image_generation_config

* TestRunwaymlImageGeneration

* add RUNWAYML_POLLING_TIMEOUT

* fix runwayml transform img gen

* runwayml_image_cost_calculator

* runwayml_image_cost_calculator

* docs runwayml

* fix runwayML polling

* test_get_first_default_fallback
2025-11-12 18:20:14 -08:00
Ishaan Jaffer 7ec2d103f6 test router endpoints 2025-11-06 16:17:14 -08:00
Krrish Dholakia 7ad6abeb1c fix(batch_utils.py): improve batch utils to handle newlines within batch content
Fixes LIT-1376
2025-11-05 16:15:09 -08:00
Sameer Kankute eed3ad0bdb Fix: Moderations endpoint now respects api_base configuration parameter (#16087)
* Update moderation to use api base

* Update moderation to use api base

* Fix mypy error
2025-10-30 11:01:26 -07:00
Ishaan Jaff f5a80110c1 [Feat] Add /search endpoint on LiteLLM Gateway (#15780)
* add SearchProvider

* add SearchToolTypedDict

* add search

* add SearchAPIRouter

* working router level search

* add search to allowed llm / ocr routes

* feat: add search_router

* add routing + proxy for search APIs

* /v1/search/{search_tool_name}

* fix search routing

* feat: parse_search_tools

* clean up sidebar

* docs fix

* router tests for search tools

* docs fix
2025-10-21 19:05:20 -07:00
Ishaan Jaff b1b96ff3cf [Perf] Alexsander fixes round 2 - Oct 18th (#15695)
* perf(router): Optimize prompt management model check with early exit

Add early return for models without '/' to avoid expensive get_model_list()
calls for 99% of standard model requests (gpt-4, claude-3, etc).

- Refactor _is_prompt_management_model() with "/" check before model lookup
- Add unit tests to verify optimization doesn't break detection

* perf(caching): optimize Redis batch cache operations and reduce unnecessary queries

This commit introduces several performance optimizations to the Redis caching layer:

**DualCache Improvements (dual_cache.py):**

1. Increase batch cache size limit from 100 to 1000
   - Allows for larger batch operations, reducing Redis round-trips

2. Throttle repeated Redis queries for cache misses
   - Update last_redis_batch_access_time for ALL queried keys, including those
     with None values
   - Prevents excessive Redis queries for frequently-accessed non-existent keys

3. Add early exit optimization
   - Short-circuit when redis_result is None or contains only None values
   - Avoids unnecessary processing when no cache hits are found

4. Optimize key lookup performance
   - Replace O(n) keys.index() calls with O(1) dict lookup via key_to_index mapping
   - Reduces algorithmic complexity in batch operations

5. Streamline cache updates
   - Combine result updates and in-memory cache updates in single loop
   - Only cache non-None values to avoid polluting in-memory cache
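
Items 4 and 5 above can be illustrated with a small sketch. The function name `merge_batch_results` and its arguments are hypothetical and do not come from dual_cache.py; the point is only the precomputed `key_to_index` mapping and the single loop that skips `None` values.

```python
from typing import Any, Dict, List, Optional


def merge_batch_results(
    keys: List[str],
    local_results: List[Optional[Any]],
    redis_results: Dict[str, Optional[Any]],
    in_memory: Dict[str, Any],
) -> List[Optional[Any]]:
    """Hypothetical sketch: fold Redis hits into the result list in one pass."""
    # Built once: O(1) position lookups instead of O(n) keys.index(key) per hit
    key_to_index = {key: i for i, key in enumerate(keys)}
    merged = list(local_results)
    for key, value in redis_results.items():
        if value is None:
            continue  # misses are not written to the in-memory cache
        merged[key_to_index[key]] = value
        in_memory[key] = value  # result update and cache update share the loop
    return merged
```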

**CooldownCache Improvements (cooldown_cache.py):**

1. Enhanced early return logic
   - Check if all values in results are None, not just if results is None
   - Prevents unnecessary iteration when no valid cooldown data exists

These changes significantly improve Redis caching performance, especially for:
- High-throughput batch operations
- Scenarios with frequent cache misses
- Large-scale deployments with many concurrent requests

* fix: remove unnecessary test

* refactor: move default_max_redis_batch_cache_size to constants

- Add DEFAULT_MAX_REDIS_BATCH_CACHE_SIZE constant (default: 1000)
- Update DualCache to use constant from constants.py
- Document new environment variable in config_settings.md

* fix: only use in memory cache when set

* fix(router): improve prompt management model detection with smart early return

The previous early return optimization in _is_prompt_management_model() was
checking if the model name parameter contained '/' and returning False if it
didn't. This broke detection for model aliases (e.g., 'chatbot_actions') that
don't have '/' in their name but map to prompt management models
(e.g., 'langfuse/openai-gpt-3.5-turbo').

Changed the early return logic to only exit early when:
- Model name contains '/' AND
- The prefix is NOT a known prompt management provider

This maintains the performance optimization for 99% of direct model calls
(avoiding expensive get_model_list lookups) while correctly handling:
- Direct prompt management calls (e.g., 'langfuse/model')
- Model aliases without '/' (e.g., 'chatbot_actions')
- Regular models with/without '/' (e.g., 'gpt-3.5-turbo', 'openai/gpt-4')

Fixes test: test_router_prompt_management_factory

* perf(router): optimize _pre_call_checks with shallow copy (1400x faster)

Replace deepcopy with list() in _pre_call_checks - runs on every request.
Only pops from list, never modifies deployment dicts, so shallow copy is safe.

Performance: 1400x faster on hot path
Impact: 2-5x overall throughput improvement for routing workloads
Tests: Added regression test to ensure no mutation + filtering works
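
A minimal sketch of why the shallow copy is safe when the code only pops from the list. The function name `candidates_without_first` is hypothetical; it stands in for any filtering step that removes entries but never edits a deployment dict.

```python
from typing import Any, Dict, List


def candidates_without_first(deployments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: pop from a shallow copy, leaving the caller's list intact."""
    candidates = list(deployments)  # new list object, same dict objects inside
    if candidates:
        candidates.pop(0)  # only the copy shrinks
    return candidates
```

The dicts are shared between both lists, so this is only safe as long as nothing downstream mutates them, which is the condition the commit message states.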

* perf(router): replace deepcopy with shallow copy for default deployment

Replace expensive copy.deepcopy() with shallow copy for default_deployment
in _common_checks_available_deployment() hot path.

Changes:
- Use dict.copy() for top-level deployment dict
- Use dict.copy() for nested litellm_params dict
- Only the 'model' field is modified, so deep recursion is unnecessary

Impact:
- 100x+ faster for default deployment path (every request when used)
- deepcopy recursively traverses entire object tree
- Shallow copy only copies two dict levels (exactly what's needed)

Test coverage:
- Added regression test to verify deployment isolation
- Ensures returned deployments don't mutate original default_deployment
- Validates multiple concurrent requests get independent copies
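
The two-level copy can be sketched like this. The function name `copy_default_deployment` is hypothetical, and the sketch assumes the deployment is a plain dict with a nested `litellm_params` dict, as the commit message describes.

```python
from typing import Any, Dict


def copy_default_deployment(default_deployment: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Hypothetical sketch: copy exactly the two dict levels that get modified."""
    updated = default_deployment.copy()  # level 1: top-level dict
    updated["litellm_params"] = default_deployment["litellm_params"].copy()  # level 2
    updated["litellm_params"]["model"] = model  # the only field that changes
    return updated
```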

* perf(router): remove unnecessary dict copy in completion hot paths

Remove unnecessary deployment['litellm_params'].copy() in _completion
and _acompletion functions. The dict is only read and spread into a new
dict, never modified, making the defensive copy wasteful.

Changes:
- Remove .copy() in _completion (sync hot path)
- Remove .copy() in _acompletion (async hot path)

Impact:
- Every completion request (highest traffic endpoints)
- Eliminates unnecessary dict allocation and copy on every call
- Dict spreading already creates new dict, so no mutation possible

Test coverage:
- Added tests verifying deployment params unchanged after calls
- Tests both sync and async completion paths
- Validates optimization doesn't introduce mutations

* perf(router): optimize deployment filtering in pre-call checks

Replace O(n²) list pop pattern with O(n) set-based filtering in
_pre_call_checks() to improve routing performance under high load.

Changes:
- Use set() instead of list for invalid_model_indices tracking
- Replace reversed list.pop() loop with single-pass list comprehension
- Eliminate redundant list→set conversion overhead

Impact:
- Hot path optimization: runs on every request through the router
- ~2-5x faster filtering when many deployments fail validation
- Most beneficial with 50+ deployments per model group or high
  invalidation rates (rate limits, context window exceeded)

Technical details:
Old: O(n·k) worst case, where k = invalid deployments (each pop shifts the remaining elements)
New: O(n) single pass with O(1) set membership checks
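
The single-pass pattern can be sketched as below. The function name `filter_deployments` is hypothetical; only the set-plus-comprehension technique comes from the commit message.

```python
from typing import Any, Dict, List, Set


def filter_deployments(
    deployments: List[Dict[str, Any]], invalid_indices: Set[int]
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: drop invalid entries in one pass with O(1) membership checks."""
    return [d for i, d in enumerate(deployments) if i not in invalid_indices]
```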

* add: memory profiler

feat(proxy): Add configurable GC thresholds and enhance memory debugging endpoints

- Add PYTHON_GC_THRESHOLD env var to configure garbage collection thresholds
- Add POST /debug/memory/gc/configure endpoint for runtime GC tuning
- Enhance memory debugging endpoints with better structure and explanations
- Add comprehensive router and cache memory tracking
- Include worker PID in all debug responses for multi-worker debugging
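
A sketch of how an environment variable could drive the standard library's `gc.set_threshold`. The comma-separated value format (for example `700,10,10`) and both function names are assumptions; the commit message only names the `PYTHON_GC_THRESHOLD` variable.

```python
import gc
import os
from typing import Optional, Tuple


def parse_gc_threshold(raw: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Hypothetical sketch: parse "700,10,10" into (700, 10, 10)."""
    if not raw:
        return None
    parts = tuple(int(part.strip()) for part in raw.split(","))
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected between one and three comma-separated integers")
    return parts


def apply_gc_threshold_from_env(env_var: str = "PYTHON_GC_THRESHOLD") -> Tuple[int, ...]:
    """Apply the thresholds if the variable is set, then report the active values."""
    thresholds = parse_gc_threshold(os.environ.get(env_var))
    if thresholds is not None:
        gc.set_threshold(*thresholds)  # accepts one to three generation thresholds
    return gc.get_threshold()
```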

* refactor: reduce complexity in get_memory_details endpoint

Extract 6 helper functions from get_memory_details to fix linter
error PLR0915 (too many statements). Improves maintainability
while preserving functionality.

* fix(router): remove incorrect early exit in _is_prompt_management_model

Removes early exit optimization that checked model_name prefix instead
of the actual litellm_params model. This incorrectly returned False for
custom model aliases that map to prompt management providers.

Example: "my-langfuse-prompt/test_id" -> "langfuse_prompt/actual_id"

The method now correctly checks the underlying model's prefix.

Fixes test_is_prompt_management_model_optimization

* fix(proxy): add explicit type annotations to debug_utils dictionaries

Resolved 6 mypy type errors in proxy/common_utils/debug_utils.py by adding
explicit Dict[str, Any] annotations to dictionary variables where mypy was
incorrectly inferring narrow types. This allows the dictionaries to accept
different value types (strings, nested dicts) for error handling and various
return structures.

Fixed:
- Line 246: caches dictionary in get_memory_summary()
- Line 371: cache_stats dictionary in _get_cache_memory_stats()
- Line 439: litellm_router_memory dictionary in _get_router_memory_stats()

* fix(proxy): fix Python 3.8 compatibility in debug_utils type annotations

- Replace tuple[...], list[...] with Tuple[...], List[...] from typing
- Replace Dict | None with Optional[Dict] for Python 3.8 compatibility
- Add missing imports: List, Optional, Tuple to typing imports

Fixes TypeError: 'type' object is not subscriptable in Python 3.8
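
For reference, a signature written in the Python 3.8-compatible style described above might look like the following. The function `largest_caches` is a made-up example, not code from debug_utils.py.

```python
from typing import Dict, List, Optional, Tuple


def largest_caches(
    sizes: Optional[Dict[str, int]] = None, limit: int = 2
) -> Tuple[List[str], int]:
    """Hypothetical example using typing generics instead of tuple[...] or Dict | None."""
    if sizes is None:
        return [], 0
    names = sorted(sizes, key=sizes.__getitem__, reverse=True)[:limit]
    return names, sum(sizes.values())
```

On Python 3.8, writing the return type as `tuple[list[str], int]` would fail when the function is defined, because built-in types only became subscriptable in 3.9.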

---------

Co-authored-by: AlexsanderHamir <alexsanderhamirgomesbaptista@gmail.com>
2025-10-18 11:12:00 -07:00
Ishaan Jaff 3852fc96c1 [Oct Staging Branch] (#15460)
* Implement fix for thinking_blocks and converse API calls

This fixes Claude's models via the Converse API, which should also fix
Claude Code.

* Add thinking literal

* Fix mypy issues

* Type fix for redacted thinking

* Add voyage model integration in sagemaker

* Add config file logic

* Use already existing voyage transformation

* refactor code as per comments

* fix merge error

* refactor code as per comments

* refactor code as per comments

* UI new build

* [Fix] router - regression when adding/removing models  (#15451)

* fix(router): update model_name_to_deployment_indices on deployment removal

When a deployment is deleted, the model_name_to_deployment_indices map
was not being updated, causing stale index references. This could lead
to incorrect routing behavior when deployments with the same model_name
were dynamically removed.

Changes:
- Update _update_deployment_indices_after_removal to maintain
  model_name_to_deployment_indices mapping
- Remove deleted indices and decrement indices greater than removed index
- Clean up empty entries when no deployments remain for a model name
- Update test to verify proper index shifting and cleanup behavior
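
The index maintenance listed above can be sketched as follows. The function name `shift_indices_after_removal` is hypothetical and the sketch operates on a bare dict rather than on the router object.

```python
from typing import Dict, List


def shift_indices_after_removal(
    model_name_to_deployment_indices: Dict[str, List[int]], removed_idx: int
) -> None:
    """Hypothetical sketch: drop the removed index and shift later ones down by one."""
    for name in list(model_name_to_deployment_indices):
        updated = [
            idx - 1 if idx > removed_idx else idx
            for idx in model_name_to_deployment_indices[name]
            if idx != removed_idx
        ]
        if updated:
            model_name_to_deployment_indices[name] = updated
        else:
            del model_name_to_deployment_indices[name]  # no deployments left for this name
```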

* fix(router): remove redundant index building during initialization

Remove duplicate index building operations that were causing unnecessary
work during router initialization:

1. Removed redundant `_build_model_id_to_deployment_index_map` call in
   __init__ - `set_model_list` already builds all indices from scratch

2. Removed redundant `_build_model_name_index` call at end of
   `set_model_list` - the index is already built incrementally via
   `_create_deployment` -> `_add_model_to_list_and_index_map`

Both indices (model_id_to_deployment_index_map and
model_name_to_deployment_indices) are properly maintained as lookup
indexes through existing helper methods. This change eliminates O(N)
duplicate work during initialization without any behavioral changes.

The indices continue to be correctly synchronized with model_list on
all operations (add/remove/upsert).

* fix(prometheus): Fix Prometheus metric collection in a multi-workers environment (#14929)

Co-authored-by: sotazhang <sotazhang@tencent.com>

* Add tiered pricing and cost calculation for xai

* Use generic cost calculator

* Resolve conflicts in generated HTML files

* Remove penalty params as supported params for gemini preview model (#15503)

* fix conversion of thinking block

* add application level encryption in SQS (#15512)

* docs: fix doc

* docs(index.md): bump rc

* [Fix] GEMINI - CLI -  add google_routes to llm_api_routes (#15500)

* fix: add google_routes to llm_api_routes

* test: test_virtual_key_llm_api_routes_allows_google_routes

* build: bump version

* bump: version 1.78.0 → 1.78.1

* add application level encryption in SQS

* add application level encryption in SQS

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>

* [Feat] Bedrock Knowledgebase - return search_response when using /chat/completions API with LiteLLM (#15509)

* docs: fix doc

* docs(index.md): bump rc

* [Fix] GEMINI - CLI -  add google_routes to llm_api_routes (#15500)

* fix: add google_routes to llm_api_routes

* test: test_virtual_key_llm_api_routes_allows_google_routes

* add AnthropicCitation

* fix async_post_call_success_deployment_hook

* fix add vector_store_custom_logger to global callbacks

* test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call

* async_post_call_success_deployment_hook

* add async_post_call_streaming_deployment_hook

* async def test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call_streaming(setup_vector_store_registry):

* fix _call_post_streaming_deployment_hook

* fix async_post_call_streaming_deployment_hook

* test update

* docs: Accessing Search Results

* docs KB

* fix chatUI

* fix searchResults

* fix onSearchResults

* fix kb

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>

* [Feat] Add dynamic rate limits on LiteLLM Gateway  (#15518)

* docs: fix doc

* docs(index.md): bump rc

* [Fix] GEMINI - CLI -  add google_routes to llm_api_routes (#15500)

* fix: add google_routes to llm_api_routes

* test: test_virtual_key_llm_api_routes_allows_google_routes

* build: bump version

* bump: version 1.78.0 → 1.78.1

* fix: KeyRequestBase

* fix rpm_limit_type

* fix dynamic rate limits

* fix use dynamic limits here

* fix _should_enforce_rate_limit

* fix _should_enforce_rate_limit

* fix counter

* test_dynamic_rate_limiting_v3

* use _create_rate_limit_descriptors

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>

* Add google rerank endpoint

* Add docs

* fix mypy error

* fix mypy and lint errors

* Add haiku 4.5 integration

* Add haiku 4.5 integration for other regions as well

* Handle citation field correctly

* Fix filtering headers for signature calcs

* Add haiku 4.5 integration (#15650)

---------

Co-authored-by: Leslie Cheng <leslie.cheng5@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Lucas <10226902+LoadingZhang@users.noreply.github.com>
Co-authored-by: sotazhang <sotazhang@tencent.com>
Co-authored-by: Deepanshu Lulla <deepanshu.lulla@gmail.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>
2025-10-17 17:52:25 -07:00
AlexsanderHamir bdca4db941 Merge branch 'litellm_october_alexsander_stanging' of https://github.com/BerriAI/litellm into litellm_october_alexsander_stanging 2025-10-16 14:43:30 -07:00
AlexsanderHamir e154ab6a66 test: ensure model_names is an O(1) data structure 2025-10-16 14:42:24 -07:00
AlexsanderHamir 76b272dffa test(router): update error message assertion after string concat optimization
Update test_generate_model_id_with_deployment_model_name to accept the new
error message format that results from the list+join optimization.

The function still correctly rejects None values with a TypeError, but the
error message changed from 'unsupported operand type(s) for +=' to
'expected str instance, NoneType found' due to the implementation change
from string concatenation to list joining.
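
The change in error text can be reproduced in isolation. The two helper functions below are illustrative only and are not taken from the router; they show the standard Python behaviour behind each message.

```python
def concat_error_message() -> str:
    """Augmented assignment on None, as in the old string-concatenation code path."""
    model_id = None
    try:
        model_id += "gpt-4"
    except TypeError as exc:
        return str(exc)
    return ""


def join_error_message() -> str:
    """Joining a list that contains None, as in the new list+join code path."""
    try:
        "".join(["gpt-4", None])
    except TypeError as exc:
        return str(exc)
    return ""
```

Both paths still raise `TypeError`; only the wording differs, which is why the test assertion had to change.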
2025-10-16 14:41:59 -07:00
AlexsanderHamir 35de05dc8b test: add static analysis to prevent O(n) linear scans in router
Add AST-based test to detect 'for ... in self.model_list' anti-pattern.
Enforces use of index maps (model_id_to_deployment_index_map and
model_name_to_deployment_indices) for O(1) lookups instead of O(n) iteration.
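
A check of this kind can be built on the standard `ast` module. The sketch below is an assumption about how such a test might work, not the repository's test: the function name `find_model_list_loops` is hypothetical, and it also flags comprehensions, which the commit message does not mention.

```python
import ast
from typing import List


def find_model_list_loops(source: str) -> List[int]:
    """Hypothetical sketch: line numbers of loops that iterate over self.model_list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # For/AsyncFor statements and comprehension clauses all expose `.iter`
        if not isinstance(node, (ast.For, ast.AsyncFor, ast.comprehension)):
            continue
        target = node.iter
        if (
            isinstance(target, ast.Attribute)
            and target.attr == "model_list"
            and isinstance(target.value, ast.Name)
            and target.value.id == "self"
        ):
            hits.append(target.lineno)
    return sorted(hits)
```

A test could read the router's source file, call this function, and fail if the returned list is not empty.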
2025-10-16 14:41:58 -07:00
Alexsander Hamir 1846363b94 Merge branch 'litellm_october_alexsander_stanging' into litellm_router_index_change 2025-10-16 09:07:15 -07:00
AlexsanderHamir 5e929dad2d test: add static analysis to prevent O(n) linear scans in router
Add AST-based test to detect 'for ... in self.model_list' anti-pattern.
Enforces use of index maps (model_id_to_deployment_index_map and
model_name_to_deployment_indices) for O(1) lookups instead of O(n) iteration.
2025-10-16 09:04:57 -07:00
Alexsander Hamir a17f3a7aa8 Merge pull request #15578 from BerriAI/litellm_remove_list_lookup
perf(router): optimize model lookups with O(1) data structures
2025-10-15 17:59:55 -07:00
AlexsanderHamir 56838e2388 test: ensure model_names is an O(1) data structure 2025-10-15 17:52:44 -07:00
AlexsanderHamir 97ed3d01ec test(router): update error message assertion after string concat optimization
Update test_generate_model_id_with_deployment_model_name to accept the new
error message format that results from the list+join optimization.

The function still correctly rejects None values with a TypeError, but the
error message changed from 'unsupported operand type(s) for +=' to
'expected str instance, NoneType found' due to the implementation change
from string concatenation to list joining.
2025-10-15 15:58:08 -07:00
Krish Dholakia 9c3e29b9f0 Merge pull request #15330 from jlan-nl/litellm-fix-erroneous-gpt5-cooldown-trigger
Minimal fix: gpt5 models should not go on cooldown when called with temperature!=1
2025-10-09 22:34:30 -07:00
IQHL (Hans Jacob Landelius) 6633b33085 formatting 2025-10-08 14:15:15 +02:00
IQHL (Hans Jacob Landelius) 5b9d516e43 unit test 2025-10-08 13:58:25 +02:00
Tim Elfrink f8af72bbf1 fix: redact AWS credentials when redact_user_api_key_info enabled
Use SensitiveDataMasker in print_deployment() to mask all sensitive
credentials including AWS keys when redact_user_api_key_info is True.

Fixes #14839
2025-10-08 10:31:35 +02:00
Alexsander Hamir ddb90c9ad7 [Fix] - Router: add model_name index for O(1) deployment lookups (#15113)
* perf(router): add model_name index for O(1) deployment lookups

Add model_name_to_deployment_indices mapping to optimize _get_all_deployments()
from O(n) to O(1) + O(k) lookups.

- Add model_name_to_deployment_indices: Dict[str, List[int]]
- Add _build_model_name_index() to build/maintain the index
- Update _add_model_to_list_and_index_map() to maintain both indices
- Refactor to use idx = len(self.model_list) before append (cleaner)
- Optimize _get_all_deployments() to use index instead of linear scan
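
The index described above can be sketched with a small stand-alone class. `DeploymentIndexSketch` and its method names are hypothetical; only the `model_name_to_deployment_indices` mapping and the `idx = len(self.model_list)` pattern come from the commit message.

```python
from typing import Any, Dict, List


class DeploymentIndexSketch:
    """Hypothetical sketch of a model_name -> list-position index."""

    def __init__(self) -> None:
        self.model_list: List[Dict[str, Any]] = []
        self.model_name_to_deployment_indices: Dict[str, List[int]] = {}

    def add(self, deployment: Dict[str, Any]) -> None:
        idx = len(self.model_list)  # position the new entry will occupy
        self.model_list.append(deployment)
        self.model_name_to_deployment_indices.setdefault(
            deployment["model_name"], []
        ).append(idx)

    def get_all(self, model_name: str) -> List[Dict[str, Any]]:
        # O(1) dict lookup, then O(k) to collect the k matching deployments
        indices = self.model_name_to_deployment_indices.get(model_name, [])
        return [self.model_list[i] for i in indices]
```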

* test(router): add test coverage for _build_model_name_index

Add single comprehensive test for _build_model_name_index() function to fix
code coverage CI failure.

The test verifies:
- Index correctly maps model_name to deployment indices
- Handles multiple deployments per model_name
- Clears and rebuilds index correctly

Fixes: CI code coverage error for _build_model_name_index
2025-10-06 08:14:11 -07:00
Alexsander Hamir d4830e34e5 fix: remove router inefficiencies (from O(M*N) to O(1)) - 62.5% faster P99 latency (#15046)
* fix: remove redundant deep copy

set_model_list already does the deep copy at the beginning of the call.

* fix: remove unused model_list arguments

The `model_list` parameter was being passed to classes that did not use it.

* fix: reduce per-request memory and time from O(N×M) to O(N)

No need to create a whole array for a simple lookup.

* add: missing test

* fix: remove unused parameter
2025-09-29 15:49:46 -07:00
Alexsander Hamir a4eec173bc fix: reduce get_deployment cost to O(1) (#14967)
* fix: reduce get_deployment cost to O(1)

* fix: add unit test

* fix: cleaner

* fix: reference errors

* fix: add missing unit tests
2025-09-27 13:44:10 -07:00
Ishaan Jaff 982800069c [Bug Fix] x-litellm-tags not routing with Responses API (#14289)
* fix: get_deployments_for_tag

* fix get_deployments_for_tag

* test_router_tag_routing.py

* test_get_metadata_variable_name_from_kwargs

* fix mapped tests

* docs fix
2025-09-05 09:40:37 -07:00
Ishaan Jaff b9132968b2 [Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905)
* [Performance] Reduce Significant CPU overhead from litellm_logging.py (#13895)

* fix: litellm.configured_cold_storage_logger

* fix Session Management - Non-OpenAI Models docs

* ruff fix

* test fix

* create LoggingWorker

* add GLOBAL_LOGGING_WORKER for async task handling

* fix logging tests

* add conftest

* fix conftest

* test fix location of encode bedrock runtime modelid arn

* fix conftest.py

* tuning LoggingWorker

* conftest.py

* fix conftest batches/

* test_async_chat_azure

* event_loop

* test_bedrock_streaming_passthrough_test2

* fix GLOBAL_LOGGING_WORKER

* logging worker

* add flush for global logging worker

* Revert "fix GLOBAL_LOGGING_WORKER"

This reverts commit d254f508f48935652f054777652938ad71976cce.

* fix conftest clear_queue

* fix conftest clear_queue

* setup_and_teardown for llm translation

* docs AWS_REGION

* test_async_chat_azure

* change test DIR

* run ci/cd again

* use 1 job for litellm_router_unit_testing

* fix space

* fix litellm_router_unit_testing

* test_aaarouter_dynamic_cooldown_message_retry_time

* litellm_router_unit_testing

* conftest.py clearing qu

* fixes litellm_router_unit_testing

* fixes clear_queue

* fix router_unit_tests

* remove conftest

* add back conftest for router

* fix event loop test

* test fix

* fixes for LoggingWorker

* ruff fix
2025-08-23 13:13:23 -07:00
Ishaan Jaff e69a895884 test_apply_default_settings 2025-08-20 08:53:38 -07:00
Krish Dholakia 3b52545db3 Merge pull request #13529 from BerriAI/litellm_dev_08_11_2025_p1
[Fix] Cooldowns - don't return raw Azure Exceptions to client
2025-08-18 18:54:19 -07:00
Thiago Salvatore 169a17400f fix(vertexai-batch): fix vertexai batch file format (#13576)
* fix(access group): allow access group on mcp tool retrieval

* fix(test): fix broken tests and add test case for access group

* fix(mypy): fix typing issues

* fix(memory file): add content type to in memory file
2025-08-18 10:19:23 -07:00
Jugal D. Bhatt aea0605eed [LLM Translation] Fix Realtime API endpoint for no intent (#13476)
* fix intent params

* Add responses

* fix unrelated test

* test fix - fireworks API endpoint is down

* test fix fireworks ai is having an active outage

* test_completion_cost_databricks

* dbrx fix test API currently not responding

* Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter.

* Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency.

* Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code.

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-08-14 16:24:14 -07:00
Krrish Dholakia dab3acdd26 fix(handle_error.py): add unit tests 2025-08-11 18:25:30 -07:00
Krrish Dholakia 92ebf5b918 fix(router.py): fix print statement 2025-08-11 17:46:14 -07:00
Jugal D. Bhatt b6fcda2f8a [LLM Translation] Fix model group on clientside auth with API calls (#13314)
* fix unsupported operand type(s) for +=: 'NoneType' and 'str' on clientside auth creds for responses

* fix the client side auth to use correct metadata

* add more tests

* fix tests
2025-08-05 17:46:47 -07:00
Jugal D. Bhatt 32501c85f5 fix unsupported operand type(s) for +=: 'NoneType' and 'str' on clientside auth creds for responses (#13293) 2025-08-05 13:16:16 -07:00
Ishaan Jaff 39b0fe0bf7 [Feat] Edit Auto Router Settings on UI (#12966)
* EditAutoRouterTabProps

* Revert "EditAutoRouterTabProps"

This reverts commit 2835d3a3743e6411b9914a0b01381050e2273ad7.

* add EditAutoRouterTab

* delete edit

* fixes for edit auto-router

* fix accessing model edit

* working edit auto router

* fix - edit remove custom model name

* fixes for edit auto router settings

* qa for adding a model router

* test fix
2025-07-24 21:25:48 -07:00
Ishaan Jaff 106a298f0a [Feat] UI - Allow Adding LiteLLM Auto Router on UI (#12960)
* add router.json

* test_router_auto_router

* async_pre_routing_hook

* fixes for auto router

* add async_pre_routing_hook

* add LiteLLMRouterEncoder

* update test auto_router_embedding_model

* add auto_router_embedding_model

* add AutoRouter

* fix async_pre_routing_hook

* update async_pre_routing_hook

* fix auto router

* fix router.json

* working router init

* working embedding encoder

* working auto router

* test_router_auto_router

* test auto router

* add semantic-router as optional for litellm

* add extras

* semantic_router==0.1.10

* ruff fix

* use aiohttp==3.10.11

* python-dotenv==1.0.1

* test auto router

* test_router_auto_router

* semantic_router

* test_is_auto_router_deployment

* fix check

* fix docker build step

* add semantic_router

* UI  - Add auto router on litellm

* working utterances config

* fix route config builder

* kind of working add automodel router

* move loc of add deployment

* fixes for AutoRouter

* add auto_router_config in types.py

* fixes for init_auto_router_deployment

* fix adding auto router models

* working auto-router with dB

* Revert "add semantic_router"

This reverts commit 537b67288798731a119d811f643b682086377ee9.

* TestAutoRouter

* fix linting

* add semantic router to docker

* test fix

* fix router config builder

* remove export button
2025-07-24 19:58:49 -07:00
Ishaan Jaff b8e404dd95 [Feat] Backend Router - Add Auto-Router powered by semantic-router (#12955)
* add router.json

* test_router_auto_router

* async_pre_routing_hook

* fixes for auto router

* add async_pre_routing_hook

* add LiteLLMRouterEncoder

* update test auto_router_embedding_model

* add auto_router_embedding_model

* add AutoRouter

* fix async_pre_routing_hook

* update async_pre_routing_hook

* fix auto router

* fix router.json

* working router init

* working embedding encoder

* working auto router

* test_router_auto_router

* test auto router

* add semantic-router as optional for litellm

* add extras

* semantic_router==0.1.10

* ruff fix

* use aiohttp==3.10.11

* python-dotenv==1.0.1

* test auto router

* test_router_auto_router

* semantic_router

* test_is_auto_router_deployment

* fix check

* fix docker build step

* add semantic_router

* Revert "add semantic_router"

This reverts commit 537b67288798731a119d811f643b682086377ee9.
2025-07-24 18:32:56 -07:00
Ishaan Jaff 5802a5bbe3 [Feat] LLM API Endpoint - Expose OpenAI Compatible /vector_stores/{vector_store_id}/search endpoint (#12749)
* fix _pass_through_endpoint_without_required_model

* add get_litellm_managed_vector_store_from_registry

* undo router change

* fix for using router + vector search methods

* add simple helper for _update_request_data_with_litellm_managed_vector_store_registry

* add vector_stores routes

* test_router_avector_store_search_passes_correct_args

* [Feat] UI - Allow clicking into Vector Stores (#12741)

* Add View Vector Store

* add /info for vector store

* fix updated_at

* allow easily testing the KB on litellm

* fix

* rename test

* test_init_vector_store_api_endpoints

* test_update_request_data_with_litellm_managed_vector_store_registry
2025-07-18 18:18:53 -07:00
Jugal D. Bhatt a46b9d376f [Prometheus] Move Prometheus to enterprise folder (#12659)
* fix tools fetch for keys

* add prometheus to enterprise

* remove old prom

* remove old prom

* fix tests

* safe imports

* add if

* fix enterprise test

* rename imports

* added label import

* added label import

* move tests to enterprise

* fix tests

* add log

* build: update versions

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
2025-07-18 11:54:47 -07:00
Ishaan Jaff 080372399f test_mock_router_testing_params_str_to_bool_conversion 2025-06-27 18:07:33 -07:00
Ishaan Jaff d8dc84ee96 test fix - router utils 2025-06-27 18:05:37 -07:00
Ishaan Jaff a42a058b0c [Bug Fix] Fix handling str, bool types for mock_testing_fallbacks on router using /audio endpoints (#12117)
* use a dataclass for managing mock params

* fixes for mock testing
2025-06-27 15:11:09 -07:00
Ishaan Jaff d98a9ae424 [Fix] Router - cooldown time, allow using dynamic cooldown time for a specific deployment (#12037)
* fixes header_cooldown

* test_deployment_callback_respects_cooldown_time
2025-06-25 08:46:27 -07:00
Krish Dholakia 2654d3b0b1 Support env var vertex credentials for passthrough + ignore space id on watsonx deployment (throws Json validation errors) (#11527)
* fix(router.py): support vertex credentials set in env var for passthrough

Closes https://github.com/BerriAI/litellm/issues/11245

* fix(watsonx/common_utils.py): do not pass space_id on watsonx deployment requests - raises Json validation error

Fixes https://github.com/BerriAI/litellm/issues/10941

* test: update unit test
2025-06-07 20:31:05 -07:00
Krish Dholakia b8fe0e057f complete unified batch id support - replace model in jsonl to be deployment model name (#10719)
* feat(router.py): translate the model in jsonl for create file deployment to use the deployment model name

* test: add unit test for replace model in jsonl

* test(test_router.py): add unit tests

* test: add unit tests
2025-05-10 12:04:01 -07:00
Krish Dholakia 9bfd3e4819 fix(router.py): write file to all deployments (#10708)
* fix(router.py): write file to all deployments

allows unified file id to work across multiple deployments

* fix(view_logs/index.tsx): show call type in request logs

* fix(router.py): pass a deep copy of kwargs to avoid conflict across multiple runs

* fix(batch_utils.py): broaden check

* fix(router_utils.py): handle null type for function name

* fix(proxy_track_cost_callback.py): fix ruff check error

* fix(router.py): handle healthy_deployments as a dict

* feat(managed_files.py): support encoding / decoding unified batch id … (#10711)

* feat(managed_files.py): support encoding / decoding unified batch id when using managed files

allows routing retrieve batch to the right model id

* fix: fix linting error

* test: add unit tests

* fix: fix ruff check
2025-05-10 00:08:30 -07:00
Krish Dholakia a1964eab18 Realtime API - Set 'headers' in scope for websocket auth requests + reliability fix infinite loop when model_name not found for realtime models (#10679)
* fix(user_api_key_auth.py): add 'headers' to constructed request for websocket

Fix issue on some datastructure versions which require a headers field in scope

* test(test_user_api_key_auth.py): add unit testing for headers in scope change

* fix(router.py): migrate `_arealtime` to generic router endpoint

Fix infinite loop on model name missing for realtime api calls

* test(test_router_helper_utils.py): cleanup test post refactor
2025-05-08 22:50:09 -07:00
Ishaan Jaff 8ed3557ce7 test_init_responses_api_endpoints 2025-04-24 21:43:59 -07:00