Commit Graph

855 Commits

Author SHA1 Message Date
Ishaan Jaff bd7d653bae Revert "Update perplexity cost tracking (#15743)" (#16345)
This reverts commit ad6a0f4d44.
2025-11-06 19:00:45 -08:00
Aliaksandr Kuzmik d14a637b16 OpikLogger: fix the bug with incorrect attachment to existing trace & refactor (#15529)
* Fix bug, add new unit test

* Extract payload builder code to a separate namespace

* Update opik.py to use logic from the new namespace

* Code cleanup, type hints improvements

* Run linter

* Log model name as span field

* Reformat arguments in payload builders

* Use dataclasses for payloads, use opik native client if it's available

* Add cost and provider

* Add provider mapping
2025-11-05 16:29:50 -08:00
Sameer Kankute ad6a0f4d44 Update perplexity cost tracking (#15743)
* Update perplexity cost tracking

* fix lint errors

* fix code

* fix tests in perplexity

* fix test related to api call

* fix exception test
2025-11-03 08:45:34 -08:00
Krish Dholakia 74ae7aed44 build: Squashed commit of the following: (#16176)
commit bb0b050fb01633d83c1c2932f8e9c11432911847
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date:   Sat Nov 1 20:00:01 2025 -0700

    test: update tests

commit b2da4bdac23868e69a9452805b231f8830e49912
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date:   Wed Oct 22 14:58:01 2025 -0700

    fix(langfuse_otel_attributes.py): log tools and other optional params

commit 75bee1f2748f32b230467de0b085c55bf1d687a9
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date:   Wed Oct 22 14:42:05 2025 -0700

    feat(langfuse_otel/): working request/response logging on spans

    Closes https://github.com/BerriAI/litellm/issues/13764

commit a3e4fa5b81e82f71c74fb9e7dc859c6cb40495f5
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date:   Wed Oct 22 14:20:39 2025 -0700

    fix: initial commit fixing langfuse request/response logging with OTEL

commit 09fc9deac844004104822810e42975cd9c68f0e3
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date:   Wed Oct 22 13:33:52 2025 -0700

    fix(litellm_logging.py): for responses api - return a unified usage object for logging

    ensures logging integrations all pull the right usage information
2025-11-02 09:46:40 -08:00
Ishaan Jaffer 3c0d530197 test_async_vertexai_response_basic 2025-11-01 10:58:48 -07:00
Ishaan Jaffer a188e5f8e4 test_litellm_anthropic_prompt_caching_system 2025-11-01 10:51:15 -07:00
Ishaan Jaffer 2608d37e8e test prompt caching ant 2025-11-01 10:36:22 -07:00
Ishaan Jaffer b41ad66e38 ant test prompt caching 2025-11-01 10:09:01 -07:00
Ishaan Jaffer bf934c0799 test_anthropic_api_prompt_caching_basic_with_cache_creation 2025-11-01 09:21:42 -07:00
Ishaan Jaffer 2f1e947c44 test_anthropic_api_prompt_caching_basic 2025-10-31 19:32:46 -07:00
Ishaan Jaffer c918dafb32 test_router_fallbacks_with_custom_model_costs 2025-10-31 19:23:08 -07:00
Ishaan Jaffer 6e46824939 test_streaming_response 2025-10-31 19:10:38 -07:00
Ishaan Jaffer 7f79abb552 test_aastreaming_tool_calls_valid_json_str 2025-10-31 19:05:31 -07:00
Ishaan Jaffer 22eb2f8033 Revert "Python entry-point for CustomLLM subclasses (#15881)"
This reverts commit 559ae96e38.
2025-10-31 18:24:39 -07:00
Ishaan Jaffer 94c2c28f3d claude-sonnet-4-5-20250929 fix 2025-10-31 18:20:52 -07:00
Ishaan Jaffer 159db27d5c fix test claude-sonnet-4-5-20250929 2025-10-31 18:13:29 -07:00
Ishaan Jaff 99feefd614 [Feat] Add FAL AI Image Generations on LiteLLM (#16067)
* add fal-ai provider

* fix image_generation_handler

* init FalAIImageGenerationConfig

* init cost_calculator

* init FAL AI

* TestFAL_AI_ImageGeneration

* fix load_custom_provider_entrypoints

* TestFAL_AI_ImageGeneration

* add imagen4 transform FAL AI

* add FAL AI imagen 4 transform

* BaseImageGenTest

* test_fal_ai_image_generation_basic

* add BRIA + Recraft img gen

* add recraft + BRIA

* test_fal_ai_image_generation_basic

* tests for flux PRO v11

* Add FAL AI SD

* test FAL AI SD

* docs FAL AI

* docs fal ai

* Using Model-Specific Parameters

* add fal ai model prices

* add fall_ai JPG logo

* ui fixes FAL AI

* fix linting

* fix linting

* fix bedrock test_get_request_body_stability3

* test_custom_llm_provider_entrypoint
2025-10-29 13:10:51 -07:00
Albert DeFusco 559ae96e38 Python entry-point for CustomLLM subclasses (#15881)
* load entrypoints

* mock loading entry-point in pyproject.toml

* simpler group name

* create CustomLLM subclass instance after load
2025-10-28 19:39:14 -07:00
Ishaan Jaffer 33371d18f4 test fix claude-sonnet-4-5-20250929 2025-10-28 19:05:13 -07:00
Ishaan Jaffer 1b49dba1dd fix claude-sonnet-4-5 2025-10-28 17:37:08 -07:00
Ishaan Jaffer a3d64fb843 fix omni-moderation-latest 2025-10-27 17:48:35 -07:00
Ishaan Jaffer 1acc321eb3 test_router_amoderation 2025-10-27 13:50:32 -07:00
Ishaan Jaffer bbfddd00e4 test fix 2025-10-25 16:46:29 -07:00
Ishaan Jaffer a9c7fbbb60 test_router_init 2025-10-25 15:14:18 -07:00
Ishaan Jaffer 679374fe79 test_img_gen_on_router 2025-10-25 15:11:52 -07:00
Ishaan Jaffer ab0fc0a30d test_aimg_gen_on_router 2025-10-25 15:11:21 -07:00
Ishaan Jaffer 20d8345a7c test: fixes because azure deactivated our account 2025-10-25 15:10:45 -07:00
Ishaan Jaffer e878f2b1ef test_router_get_available_deployments 2025-10-25 15:09:08 -07:00
Ishaan Jaffer 5cca4c8b4f test_image_generation_openai 2025-10-25 14:57:40 -07:00
Ishaan Jaffer 2c52791b83 test_model_function_invoke 2025-10-25 14:56:12 -07:00
Ishaan Jaffer 964e683c85 test_databricks_anthropic_function_call_with_no_schema 2025-10-25 12:29:15 -07:00
Ishaan Jaffer 0bf3d1f226 test_aaaaazure_tenant_id_auth 2025-10-25 12:26:06 -07:00
Ishaan Jaffer e6b61213ca test_completion_azure_deployment_id 2025-10-25 12:26:06 -07:00
Ishaan Jaffer 762053a8e9 test_model_function_invoke 2025-10-25 12:26:06 -07:00
Ishaan Jaffer 6350c20d9f test_azure_streaming_and_function_calling 2025-10-25 12:19:26 -07:00
Ishaan Jaffer cff70ece5a test_azure_astreaming_and_function_calling 2025-10-25 12:18:53 -07:00
Ishaan Jaffer 214c10f6ef test_completion_cost_databricks_embedding 2025-10-25 11:47:03 -07:00
Ishaan Jaffer e227e8c8a0 mv test_whisper 2025-10-25 11:31:55 -07:00
Ishaan Jaffer 74106589d0 test_completion_azure_ai_gpt_4o_with_flexible_api_base 2025-10-25 11:30:51 -07:00
Ishaan Jaffer ec6c166548 _add_azure_related_dynamic_params 2025-10-25 11:11:36 -07:00
Ishaan Jaffer 0bedf1c0a7 fix tests 2025-10-25 10:19:24 -07:00
Alexsander Hamir 8c5118195d fix: replace deprecated gemini-1.5-pro-preview-0514 with gemini-2.5-flash-lite in function calling test (#15852) 2025-10-23 11:48:58 -07:00
Ishaan Jaff 73a23a6c78 [Feat] Add Azure AVA TTS integration (#15749)
* add AzureBaseIssueTokenHandler

* add BaseTextToSpeechConfig

* async_text_to_speech_handler

* add AzureAVATextToSpeechConfig

* add get_provider_text_to_speech_config

* add AzureAVATextToSpeechConfig

* fixes for base_llm_http_handler

* fix transform_text_to_speech_request

* test_azure_ava_tts_async

* test_azure_ava_tts_async

* fix TextToSpeechRequestData

* fix transform_text_to_speech_request

* add text_to_speech_handler in LLMHttpHandler

* remove old file

* fix transform_text_to_speech_request

* fix dispatch_text_to_speech

* fix azure TTS

* fix AVA TTS

* fix transform

* fix linting

* ci/cd - use one job for audio testing

* fix tests

* fix llm http handler debugging

* unit tests azure tts

* docs Azure speech

* docs fix

* docs azure AVA

* docs azure AVA

* fix handlers

* test_async_realtime_uses_max_size_parameter
2025-10-20 16:52:23 -07:00
Sameer Kankute 1fb798f81d (Bug) Fix JSON serialization error in Helicone logging by removing OpenTelemetry span from metadata (#15728)
* remove span object from Helicone metadata

* Add test
2025-10-20 08:53:22 -07:00
Ishaan Jaff f55745fc5e [Fix] Forward anthropic-beta headers to Bedrock, VertexAI (#15700)
* [Fix] Forward anthropic-beta headers to Bedrock and other cross-provider scenarios (#15623)

* add_provider_specific_headers_to_request

* fix add_provider_specific_headers_to_request

* test_provider_specific_header_multi_provider

* test_provider_specific_header_in_request

---------

Co-authored-by: Jack Venberg <jack.venberg@rover.com>
2025-10-18 16:26:32 -07:00
Ishaan Jaff b1b96ff3cf [Perf] Alexsander fixes round 2 - Oct 18th (#15695)
* perf(router): Optimize prompt management model check with early exit

Add early return for models without '/' to avoid expensive get_model_list()
calls for 99% of standard model requests (gpt-4, claude-3, etc).

- Refactor _is_prompt_management_model() with "/" check before model lookup
- Add unit tests to verify optimization doesn't break detection

* perf(caching): optimize Redis batch cache operations and reduce unnecessary queries

This commit introduces several performance optimizations to the Redis caching layer:

**DualCache Improvements (dual_cache.py):**

1. Increase batch cache size limit from 100 to 1000
   - Allows for larger batch operations, reducing Redis round-trips

2. Throttle repeated Redis queries for cache misses
   - Update last_redis_batch_access_time for ALL queried keys, including those
     with None values
   - Prevents excessive Redis queries for frequently-accessed non-existent keys

3. Add early exit optimization
   - Short-circuit when redis_result is None or contains only None values
   - Avoids unnecessary processing when no cache hits are found

4. Optimize key lookup performance
   - Replace O(n) keys.index() calls with O(1) dict lookup via key_to_index mapping
   - Reduces algorithmic complexity in batch operations

5. Streamline cache updates
   - Combine result updates and in-memory cache updates in single loop
   - Only cache non-None values to avoid polluting in-memory cache
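
The key lookup change in items 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the code in dual_cache.py: the function name `merge_batch_results` and its parameters are hypothetical, and it assumes the batch keys are unique.

```python
from typing import Any, Dict, List, Optional


def merge_batch_results(
    keys: List[str],
    local_results: List[Optional[Any]],
    redis_results: Dict[str, Optional[Any]],
) -> List[Optional[Any]]:
    """Fill gaps in local_results from redis_results (hypothetical helper)."""
    # Build the position map once. Each later lookup is O(1), whereas
    # keys.index(key) would rescan the list on every call.
    key_to_index = {key: i for i, key in enumerate(keys)}
    merged = list(local_results)
    for key, value in redis_results.items():
        if value is None:
            continue  # a miss: leave the existing slot untouched
        merged[key_to_index[key]] = value
    return merged


print(merge_batch_results(["a", "b", "c"], [1, None, None], {"b": 2, "c": None}))
```

Skipping `None` values in the same loop mirrors the "only cache non-None values" point: misses never overwrite a slot.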

**CooldownCache Improvements (cooldown_cache.py):**

1. Enhanced early return logic
   - Check if all values in results are None, not just if results is None
   - Prevents unnecessary iteration when no valid cooldown data exists

These changes significantly improve Redis caching performance, especially for:
- High-throughput batch operations
- Scenarios with frequent cache misses
- Large-scale deployments with many concurrent requests

* fix: remove unnecessary test

* refactor: move default_max_redis_batch_cache_size to constants

- Add DEFAULT_MAX_REDIS_BATCH_CACHE_SIZE constant (default: 1000)
- Update DualCache to use constant from constants.py
- Document new environment variable in config_settings.md

* fix: only use in memory cache when set

* fix(router): improve prompt management model detection with smart early return

The previous early return optimization in _is_prompt_management_model() was
checking if the model name parameter contained '/' and returning False if it
didn't. This broke detection for model aliases (e.g., 'chatbot_actions') that
don't have '/' in their name but map to prompt management models
(e.g., 'langfuse/openai-gpt-3.5-turbo').

Changed the early return logic to only exit early when:
- Model name contains '/' AND
- The prefix is NOT a known prompt management provider

This maintains the performance optimization for 99% of direct model calls
(avoiding expensive get_model_list lookups) while correctly handling:
- Direct prompt management calls (e.g., 'langfuse/model')
- Model aliases without '/' (e.g., 'chatbot_actions')
- Regular models with/without '/' (e.g., 'gpt-3.5-turbo', 'openai/gpt-4')

Fixes test: test_router_prompt_management_factory

* perf(router): optimize _pre_call_checks with shallow copy (1400x faster)

Replace deepcopy with list() in _pre_call_checks - runs on every request.
Only pops from list, never modifies deployment dicts, so shallow copy is safe.

Performance: 1400x faster on hot path
Impact: 2-5x overall throughput improvement for routing workloads
Tests: Added regression test to ensure no mutation + filtering works
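
A minimal sketch of why a shallow copy is enough when the code only pops entries from the list. The function `drop_deployment_shallow` is a hypothetical stand-in, not the router's `_pre_call_checks`.

```python
import copy
from typing import Any, Dict, List


def drop_deployment_shallow(
    deployments: List[Dict[str, Any]], index: int
) -> List[Dict[str, Any]]:
    """Return the deployments without the entry at index (hypothetical helper)."""
    candidates = list(deployments)  # new list object, same dict objects inside
    candidates.pop(index)           # shrinks only the copy
    return candidates


deployments = [{"model_name": "a"}, {"model_name": "b"}]
remaining = drop_deployment_shallow(deployments, 0)

# The caller's list is unchanged, and no dict was duplicated.
print(len(deployments), remaining[0] is deployments[1])

# deepcopy, by contrast, rebuilds every nested dict on each call.
print(copy.deepcopy(deployments)[1] is deployments[1])
```

The shallow copy stays safe only as long as the deployment dicts themselves are never modified, which is the condition the commit message states.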

* perf(router): replace deepcopy with shallow copy for default deployment

Replace expensive copy.deepcopy() with shallow copy for default_deployment
in _common_checks_available_deployment() hot path.

Changes:
- Use dict.copy() for top-level deployment dict
- Use dict.copy() for nested litellm_params dict
- Only the 'model' field is modified, so deep recursion is unnecessary

Impact:
- 100x+ faster for default deployment path (every request when used)
- deepcopy recursively traverses entire object tree
- Shallow copy only copies two dict levels (exactly what's needed)

Test coverage:
- Added regression test to verify deployment isolation
- Ensures returned deployments don't mutate original default_deployment
- Validates multiple concurrent requests get independent copies
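
The two-level copy described above might look like the sketch below. The function name `copy_default_deployment` and the sample dict are assumptions for illustration; only the `litellm_params` and `model` keys come from the commit message.

```python
from typing import Any, Dict


def copy_default_deployment(
    default_deployment: Dict[str, Any], model: str
) -> Dict[str, Any]:
    """Copy exactly the two dict levels that get written to (hypothetical helper)."""
    updated = default_deployment.copy()  # level 1: the deployment dict
    # level 2: the nested params dict, so the write below stays local
    updated["litellm_params"] = default_deployment["litellm_params"].copy()
    updated["litellm_params"]["model"] = model
    return updated


default = {
    "model_name": "default",
    "litellm_params": {"model": "placeholder", "api_key": "example-key"},
}
result = copy_default_deployment(default, "gpt-4")
print(result["litellm_params"]["model"], default["litellm_params"]["model"])
```

Anything nested deeper than `litellm_params` is still shared between the copy and the original, so this pattern holds only while `model` is the sole field being written.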

* perf(router): remove unnecessary dict copy in completion hot paths

Remove unnecessary deployment['litellm_params'].copy() in _completion
and _acompletion functions. The dict is only read and spread into a new
dict, never modified, making the defensive copy wasteful.

Changes:
- Remove .copy() in _completion (sync hot path)
- Remove .copy() in _acompletion (async hot path)

Impact:
- Every completion request (highest traffic endpoints)
- Eliminates unnecessary dict allocation and copy on every call
- Dict spreading already creates new dict, so no mutation possible

Test coverage:
- Added tests verifying deployment params unchanged after calls
- Tests both sync and async completion paths
- Validates optimization doesn't introduce mutations
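
A small sketch of the dict-spreading point, with a hypothetical `build_call_kwargs` helper standing in for the merge done inside `_completion` and `_acompletion`:

```python
from typing import Any, Dict


def build_call_kwargs(
    litellm_params: Dict[str, Any], request_kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge deployment params with per-request kwargs (hypothetical helper)."""
    # {**a, **b} always allocates a fresh dict, so a prior a.copy() adds
    # nothing. Keys from request_kwargs win on conflict.
    return {**litellm_params, **request_kwargs}


params = {"model": "gpt-4", "timeout": 30}
call_kwargs = build_call_kwargs(params, {"timeout": 5, "stream": True})
call_kwargs["model"] = "changed-on-the-copy"
print(params)
```

The spread protects only the top level: nested values such as lists or dicts inside `litellm_params` are still shared with the merged dict.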

* perf(router): optimize deployment filtering in pre-call checks

Replace O(n²) list pop pattern with O(n) set-based filtering in
_pre_call_checks() to improve routing performance under high load.

Changes:
- Use set() instead of list for invalid_model_indices tracking
- Replace reversed list.pop() loop with single-pass list comprehension
- Eliminate redundant list→set conversion overhead

Impact:
- Hot path optimization: runs on every request through the router
- ~2-5x faster filtering when many deployments fail validation
- Most beneficial with 50+ deployments per model group or high
  invalidation rates (rate limits, context window exceeded)

Technical details:
Old: O(k²) where k = invalid deployments (pop shifts remaining elements)
New: O(n) single pass with O(1) set membership checks
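
The set-based filtering can be sketched like this. The name `invalid_model_indices` comes from the commit message; the function `filter_valid_deployments` and the `is_valid` callback are hypothetical placeholders for the router's actual checks.

```python
from typing import Any, Callable, Dict, List, Set


def filter_valid_deployments(
    deployments: List[Dict[str, Any]],
    is_valid: Callable[[Dict[str, Any]], bool],
) -> List[Dict[str, Any]]:
    """Drop invalid deployments in a single pass (hypothetical helper)."""
    # Record the positions that fail validation; set membership is O(1).
    invalid_model_indices: Set[int] = set()
    for index, deployment in enumerate(deployments):
        if not is_valid(deployment):
            invalid_model_indices.add(index)
    # Build the result once instead of calling list.pop() per invalid entry,
    # where every pop shifts all later elements.
    return [
        deployment
        for index, deployment in enumerate(deployments)
        if index not in invalid_model_indices
    ]


deployments = [
    {"id": 1, "within_limit": True},
    {"id": 2, "within_limit": False},
    {"id": 3, "within_limit": True},
]
kept = filter_valid_deployments(deployments, lambda d: d["within_limit"])
print([d["id"] for d in kept])
```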

* add: memory profiler

feat(proxy): Add configurable GC thresholds and enhance memory debugging endpoints

- Add PYTHON_GC_THRESHOLD env var to configure garbage collection thresholds
- Add POST /debug/memory/gc/configure endpoint for runtime GC tuning
- Enhance memory debugging endpoints with better structure and explanations
- Add comprehensive router and cache memory tracking
- Include worker PID in all debug responses for multi-worker debugging
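
One way an environment-driven GC threshold could be applied is sketched below. The commit message names only the PYTHON_GC_THRESHOLD variable, so the comma-separated value format and both helper functions here are assumptions, not the proxy's actual parsing logic. `gc.set_threshold` and `gc.get_threshold` are standard library calls.

```python
import gc
import os
from typing import Optional, Tuple


def parse_gc_threshold(raw: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Parse a value such as "1000,50,50" (assumed format) into 1 to 3 ints."""
    if not raw:
        return None
    try:
        values = tuple(int(part.strip()) for part in raw.split(","))
    except ValueError:
        return None  # ignore malformed input rather than crash at startup
    if not 1 <= len(values) <= 3 or any(value < 0 for value in values):
        return None
    return values


def configure_gc_from_env() -> Tuple[int, ...]:
    """Apply the thresholds if the variable is set, then report what is active."""
    thresholds = parse_gc_threshold(os.environ.get("PYTHON_GC_THRESHOLD"))
    if thresholds is not None:
        gc.set_threshold(*thresholds)  # accepts one to three integers
    return gc.get_threshold()


print(parse_gc_threshold("1000,50,50"))
```

Returning `None` for malformed input keeps the interpreter defaults in place, which seems a safer fallback for a debugging knob than raising at import time.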

* refactor: reduce complexity in get_memory_details endpoint

Extract 6 helper functions from get_memory_details to fix linter
error PLR0915 (too many statements). Improves maintainability
while preserving functionality.

* fix(router): remove incorrect early exit in _is_prompt_management_model

Removes early exit optimization that checked model_name prefix instead
of the actual litellm_params model. This incorrectly returned False for
custom model aliases that map to prompt management providers.

Example: "my-langfuse-prompt/test_id" -> "langfuse_prompt/actual_id"

The method now correctly checks the underlying model's prefix.

Fixes test_is_prompt_management_model_optimization

* fix(proxy): add explicit type annotations to debug_utils dictionaries

Resolved 6 mypy type errors in proxy/common_utils/debug_utils.py by adding
explicit Dict[str, Any] annotations to dictionary variables where mypy was
incorrectly inferring narrow types. This allows the dictionaries to accept
different value types (strings, nested dicts) for error handling and various
return structures.

Fixed:
- Line 246: caches dictionary in get_memory_summary()
- Line 371: cache_stats dictionary in _get_cache_memory_stats()
- Line 439: litellm_router_memory dictionary in _get_router_memory_stats()

* fix(proxy): fix Python 3.8 compatibility in debug_utils type annotations

- Replace tuple[...], list[...] with Tuple[...], List[...] from typing
- Replace Dict | None with Optional[Dict] for Python 3.8 compatibility
- Add missing imports: List, Optional, Tuple to typing imports

Fixes TypeError: 'type' object is not subscriptable in Python 3.8
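
For illustration, annotations written in the Python 3.8 compatible style look like the sketch below. The function `summarize_sizes` is hypothetical and is not taken from debug_utils.py.

```python
from typing import Dict, List, Optional, Tuple


# On Python 3.8, writing tuple[int, list[str]] in a signature is evaluated
# when the function is defined and raises
# "TypeError: 'type' object is not subscriptable".
# The typing aliases below work on 3.8 and on later versions alike.
def summarize_sizes(sizes: Optional[Dict[str, int]]) -> Tuple[int, List[str]]:
    """Return the total size and the sorted key names (hypothetical helper)."""
    if sizes is None:
        return 0, []
    return sum(sizes.values()), sorted(sizes)


print(summarize_sizes({"router": 2, "cache": 1}))
```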

---------

Co-authored-by: AlexsanderHamir <alexsanderhamirgomesbaptista@gmail.com>
2025-10-18 11:12:00 -07:00
Ishaan Jaffer 732618f55f test_together_ai_embedding 2025-10-11 09:33:19 -07:00
Krish Dholakia 12a1d081ee Merge branch 'main' into litellm_dev_09_11_2025_p1 2025-10-08 19:02:58 -07:00
Copilot 4226314096 Add native Responses API support for litellm_proxy provider (#15347)
* Initial plan

* Add native Responses API support for litellm_proxy provider

Co-authored-by: ishaan-jaff <29436595+ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ishaan-jaff <29436595+ishaan-jaff@users.noreply.github.com>
2025-10-08 18:31:26 -07:00
Ishaan Jaff 2f42c806cb [Fix] x-litellm-cache-key header not being returned on cache hit (#15348)
* fix: x-cache-key

* test_cache_key_in_hidden_params_acompletion

* fix: remove_cache_control_flag_from_messages_and_tools
2025-10-08 18:10:43 -07:00