Commit Graph

41 Commits

Author SHA1 Message Date
Cesar Garcia 5e70c78b94 fix(cost-tracking): support base_model lookup in litellm_metadata for Responses API (#16778)
Cost tracking was failing for Responses API when using custom deployment names
with base_model configuration. The issue occurred because:

- Chat Completions API stores model_info in 'metadata'
- Responses API stores model_info in 'litellm_metadata'
- Cost calculator only checked 'metadata', missing Responses API costs

Changes:
- Updated _get_base_model_from_metadata() to check both metadata locations
- Added comprehensive unit tests covering all scenarios
- Maintains backward compatibility (metadata takes precedence)

Fixes #16772
2025-11-18 19:53:18 -08:00
Ishaan Jaffer 95b1608970 test_get_valid_models_with_custom_llm_provider 2025-11-15 09:43:10 -08:00
Ishaan Jaffer 94c2c28f3d claude-sonnet-4-5-20250929 fix 2025-10-31 18:20:52 -07:00
Ishaan Jaffer ae7b13550e test_models_by_provider 2025-10-23 09:10:41 -07:00
Ishaan Jaffer 8cb66168bc test fix 2025-10-10 19:57:17 -07:00
Georg Wölflein dbfa8ec921 Fix end user cost tracking in the responses API (#15124)
#13860
2025-10-02 15:13:57 -07:00
Krish Dholakia d4540d31c1 Merge branch 'main' into fix/streaming-tool-call-indices 2025-09-21 21:24:22 -07:00
Ishaan Jaffer c6afa904bb fix: test_completion_with_no_model 2025-09-18 10:17:09 -07:00
Ishaan Jaffer 1e1d174733 fix: test_completion_with_no_model 2025-09-18 10:13:32 -07:00
Tim Elfrink c5ca2afec3 Add test for tool call sequential index assignment
- Test multiple tool calls without explicit indices receive sequential indices
- Verify Delta class assigns indices 0, 1, 2... instead of defaulting all to 0
- Add comprehensive assertions for tool call details preservation
- Cover provider-agnostic streaming response scenarios
2025-09-15 21:11:13 +02:00
Krrish Dholakia d05f58721e test: remove end of life model from tests 2025-09-09 21:01:45 -07:00
Ishaan Jaff d37be48a80 test: llama-3.3-70b-versatile 2025-09-01 20:14:12 -07:00
Krish Dholakia 3e764ec268 Merge pull request #13808 from mainred/validate_api_version
feat(utils.py): accept 'api_version' as param for validate_environment
2025-08-22 23:59:38 -07:00
Ishaan Jaff e93e266f84 [Performance] Use O(1) Set lookups for model routing (#13879)
* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df.

* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f.

* o(1) lookups

* register_model fix

* test_aget_valid_models

* lambda ai models fix

* test_utils.py

* test fix vertex ai
2025-08-21 22:56:46 -07:00
Qingchuan Hao f2a6be390b feat(utils.py): accept 'api_version' as param for validate_environment 2025-08-20 14:29:58 +00:00
Krrish Dholakia f544a4e238 test: update test 2025-07-29 21:08:36 -07:00
Robert Gambee 52b2984792 [Bug Fix] Always include tool calls in output of trim_messages (#11517)
* Check content and order of trimmed messages

* Assert tool calls are preserved if below max_tokens

* Unreverse order of tool calls

* Return tool calls alongside other messages

* Write test for trimming untokenizable field

* Return original messages in case of exception
2025-07-17 16:01:59 -07:00
Ishaan Jaff c31a7d3ab7 fix new utils tests 2025-07-04 18:30:50 -07:00
Ishaan Jaff 6b623f9c98 test whitelisted models 2025-06-28 14:46:16 -07:00
Bougou Nisou 58dda44fda feat: enhance redaction functionality for EmbeddingResponse (#12088) 2025-06-27 21:30:26 -07:00
Ishaan Jaff 7d47417906 test: fixes 2025-05-31 12:42:56 -07:00
Ishaan Jaff 0590b1eb3a [Fix] Prometheus Metrics - Do not track end_user by default + expose flag to enable tracking end_user on prometheus (#11192)
* fix: testing for disabling end user on metrics

* fix: fixes for test_prometheus_factory

* Delete litellm/model_prices_and_context_window_backup.json

* fix: issues with merge conflicts

* fix: test_get_end_user_id_for_cost_tracking_prometheus_only

* Update tests/test_litellm/integrations/test_prometheus.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 17:06:58 -07:00
Ishaan Jaff 86cdb8382b [Feat] Use aiohttp transport by default - 97% lower median latency (#11097)
* fix: add flag for disabling use_aiohttp_transport

* feat: add _create_async_transport

* feat: fixes for transport

* add httpx-aiohttp

* feat: fixes for transport

* refactor: fixes for transport

* build: fix deps

* fixes: test fixes

* fix: ensure aiohttp does not auto set content type

* test: test fixes

* feat: add LiteLLMAiohttpTransport

* fix: fixes for responses API handling

* test: fixes for responses API handling

* test: fixes for responses API handling

* feat: fixes for transport

* fix: base embedding handler

* test: test_async_http_handler_force_ipv4

* test: fix failing deepeval test

* fix: add YARL for bedrock urls

* fix: issues with transport

* fix: comment out linting issues

* test fix

* test: XAI is unstable

* test: fixes for using respx

* test: XAI fixes

* test: XAI fixes

* test: infinity testing fixes

* docs(config_settings.md): document param

* test: test_openai_image_edit_litellm_sdk

* test: remove deprecated test

* bump respx==0.22.0

* test: test_xai_message_name_filtering

* test: fix anthropic test after bumping httpx

* use n 4 for mapped tests (#11109)

* fix: use 1 session per event loop

* test: test_client_session_helper

* fix: linting error

* fix: resolving GET requests on httpx 0.28.1

* test fixes proxy unit tests

* fix: add ssl verify settings

* fix: proxy unit tests

* fix: refactor

* tests: basic unit tests for aiohttp transports

* tests: fixes xai

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
2025-05-23 22:55:35 -07:00
Krrish Dholakia 33814917fe fix(token_counter.py): handle empty lists 2025-05-02 08:07:08 -07:00
Carlos Freund cb177dbd7a Fix and rewrite of token_counter (#10409)
* added tests

messages_with_counts: Made tolerance explicit for each test. But they match the new implementation(which beats the old)

* new token counter impl

* compare old and new implementation in test

* delete old token counter

* moved tests to /tests/litellm/litellm_core_utils

* use existing types

* docstrings

* warn about using default params on unknown model.

* created type for the token_counter_function

* check key == "content"

* throw error on invalid detail-type, ignore type-warning.

* fix imports
2025-05-01 23:34:37 -07:00
Ruperto A. Martinez 298a3574f4 Add supports_pdf_input: true to Claude 3.7 bedrock models (#9917)
* Add supports_pdf_input: true to Claude 3.7 bedrock models

* update unit test

---------

Co-authored-by: RupertoXTI <rmartinez@xtillion.com>
2025-05-01 14:56:54 -07:00
Krish Dholakia 33ead69c0a Support checking provider /models endpoints on proxy /v1/models endpoint (#9958)
* feat(utils.py): support global flag for 'check_provider_endpoints'

enables setting this for `/models` on proxy

* feat(utils.py): add caching to 'get_valid_models'

Prevents checking endpoint repeatedly

* fix(utils.py): ensure mutations don't impact cached results

* test(test_utils.py): add unit test to confirm cache invalidation logic

* feat(utils.py): get_valid_models - support passing litellm params dynamically

Allows for checking endpoints based on received credentials

* test: update test

* feat(model_checks.py): pass router credentials to get_valid_models - ensures it checks correct credentials

* refactor(utils.py): refactor for simpler functions

* fix: fix linting errors

* fix(utils.py): fix test

* fix(utils.py): set valid providers to custom_llm_provider, if given

* test: update test

* fix: fix ruff check error
2025-04-14 23:23:20 -07:00
Ishaan Jaff f9ce754817 [Feat] Add litellm.supports_reasoning() util to track if an llm supports reasoning (#9923)
* add supports_reasoning for xai models

* add "supports_reasoning": true for o1 series models

* add supports_reasoning util

* add litellm.supports_reasoning

* add supports reasoning for claude 3-7 models

* add deepseek as supports reasoning

* test_supports_reasoning

* add supports reasoning to model group info

* add supports_reasoning

* docs supports reasoning

* fix supports_reasoning test

* "supports_reasoning": false,

* fix test

* supports_reasoning
2025-04-11 17:56:04 -07:00
Krish Dholakia ccbac691e5 Support discovering gemini, anthropic, xai models by calling their /v1/model endpoint (#9530)
* fix: initial commit for adding provider model discovery to gemini

* feat(gemini/): add model discovery for gemini/ route

* docs(set_keys.md): update docs to show you can check available gemini models as well

* feat(anthropic/): add model discovery for anthropic api key

* feat(xai/): add model discovery for XAI

enables checking what models an xai key can call

* ci: bump ci config yml

* fix(topaz/common_utils.py): fix linting error

* fix: fix linting error for python38
2025-03-27 22:50:48 -07:00
Krish Dholakia c0845fec1f Add OpenAI gpt-4o-transcribe support (#9517)
* refactor: introduce new transformation config for gpt-4o-transcribe models

* refactor: expose new transformation configs for audio transcription

* ci: fix config yml

* feat(openai/transcriptions): support provider config transformation on openai audio transcriptions

allows gpt-4o and whisper audio transformation to work as expected

* refactor: migrate fireworks ai + deepgram to new transform request pattern

* feat(openai/): working support for gpt-4o-audio-transcribe

* build(model_prices_and_context_window.json): add gpt-4o-transcribe to model cost map

* build(model_prices_and_context_window.json): specify what endpoints are supported for `/audio/transcriptions`

* fix(get_supported_openai_params.py): fix return

* refactor(deepgram/): migrate unit test to deepgram handler

* refactor: cleanup unused imports

* fix(get_supported_openai_params.py): fix linting error

* test: update test
2025-03-26 23:10:25 -07:00
Ishaan Jaff 1d7accce9e test_supports_web_search 2025-03-22 13:49:35 -07:00
Krrish Dholakia 8ef9129556 fix(types/utils.py): support openai 'file' message type
Closes https://github.com/BerriAI/litellm/issues/9365
2025-03-19 23:13:51 -07:00
Krish Dholakia ab7c4d1a0e Litellm dev bedrock anthropic 3 7 v2 (#8843)
* feat(bedrock/converse/transformation.py): support claude-3-7-sonnet reasoning_Content transformation

Closes https://github.com/BerriAI/litellm/issues/8777

* fix(bedrock/): support returning `reasoning_content` on streaming for claude-3-7

Resolves https://github.com/BerriAI/litellm/issues/8777

* feat(bedrock/): unify converse reasoning content blocks for consistency across anthropic and bedrock

* fix(anthropic/chat/transformation.py): handle deepseek-style 'reasoning_content' extraction within transformation.py

simpler logic

* feat(bedrock/): fix streaming to return blocks in consistent format

* fix: fix linting error

* test: fix test

* feat(factory.py): fix bedrock thinking block translation on tool calling

allows passing the thinking blocks back to bedrock for tool calling

* fix(types/utils.py): don't exclude provider_specific_fields on model dump

ensures consistent responses

* fix: fix linting errors

* fix(convert_dict_to_response.py): pass reasoning_content on root

* fix: test

* fix(streaming_handler.py): add helper util for setting model id

* fix(streaming_handler.py): fix setting model id on model response stream chunk

* fix(streaming_handler.py): fix linting error

* fix(streaming_handler.py): fix linting error

* fix(types/utils.py): add provider_specific_fields to model stream response

* fix(streaming_handler.py): copy provider specific fields and add them to the root of the streaming response

* fix(streaming_handler.py): fix check

* fix: fix test

* fix(types/utils.py): ensure messages content is always openai compatible

* fix(types/utils.py): fix delta object to always be openai compatible

only introduce new params if variable exists

* test: fix bedrock nova tests

* test: skip flaky test

* test: skip flaky test in ci/cd
2025-02-26 16:05:33 -08:00
Krish Dholakia 09462ba80c Add cohere v2/rerank support (#8421) (#8605)
* Add cohere v2/rerank support (#8421)

* Support v2 endpoint cohere rerank

* Add tests and docs

* Make v1 default if old params used

* Update docs

* Update docs pt 2

* Update tests

* Add e2e test

* Clean up code

* Use inheritence for new config

* Fix linting issues (#8608)

* Fix cohere v2 failing test + linting (#8672)

* Fix test and unused imports

* Fix tests

* fix: fix linting errors

* test: handle tgai instability

* fix: skip service unavailable err

* test: print logs for unstable test

* test: skip unreliable tests

---------

Co-authored-by: vibhavbhat <vibhavb00@gmail.com>
2025-02-22 22:25:29 -08:00
Krish Dholakia 251467a525 add bedrock llama vision support + cohere / infinity rerank - 'return_documents' support (#8684)
* build(model_prices_and_context_window.json): mark bedrock llama as supporting vision based on docs

* Add price for Cerebras llama3.3-70b (#8676)

* docs(readme.md): fix contributing docs

point people to new mock directory testing structure s/o @vibhavbhat

* build: update contributing readme

* docs(readme.md): improve docs

* docs(readme.md): cleanup readme on tests/

* docs(README.md): cleanup doc

* feat(infinity/): support returning documents when return_documents=True

* test(test_rerank.py): add e2e testing for cohere rerank

* fix: fix linting errors

* fix(together_ai/): fix together ai transformation

* fix: fix linting error

* fix: fix linting errors

* fix: fix linting errors

* test: mark cohere as flaky

* build: fix model supports check

* test: fix test

* test: mark flaky test

* fix: fix test

* test: fix test

---------

Co-authored-by: Yury Koleda <fut.wrk@gmail.com>
2025-02-20 21:23:54 -08:00
Krish Dholakia 2340f1b31f Pass router tags in request headers - x-litellm-tags (#8609)
* feat(litellm_pre_call_utils.py): support `x-litellm-tags` request header

allow tag based routing + spend tracking via request headers

* docs(request_headers.md): document new `x-litellm-tags` for tag based routing and spend tracking

* docs(tag_routing.md): add to docs

* fix(utils.py): only pass str values for openai metadata param

* fix(utils.py): drop non-str values for metadata param to openai

preview-feature, otel span was being sent in
2025-02-18 08:26:22 -08:00
Ishaan Jaff 2753de1458 (Bug Fix + Better Observability) - BudgetResetJob: (#8562)
* use class ResetBudgetJob

* refactor reset budget job

* update reset_budget job

* refactor reset budget job

* fix LiteLLM_UserTable

* refactor reset budget job

* add telemetry for reset budget job

* dd - log service success/failure on DD

* add detailed reset budget reset info on DD

* initialize_scheduled_background_jobs

* refactor reset budget job

* trigger service failure hook when fails to reset a budget for team, key, user

* fix resetBudgetJob

* unit testing for ResetBudgetJob

* test_duration_in_seconds_basic

* testing for triggering service logging

* fix logs on test teams fail

* remove unused imports

* fix import duration in s

* duration_in_seconds
2025-02-15 16:13:08 -08:00
Krish Dholakia ce3ead6f91 Log applied guardrails on LLM API call (#8452)
* fix(litellm_logging.py): support saving applied guardrails in logging object

allows list of applied guardrails to be logged for proxy admin's knowledge

* feat(spend_tracking_utils.py): log applied guardrails to spend logs

makes it easy for admin to know what guardrails were applied on a request

* ci(config.yml): uninstall posthog from ci/cd

* test: fix tests

* test: update test
2025-02-10 22:57:30 -08:00
Krish Dholakia b5850b6b65 Handle azure deepseek reasoning response (#8288) (#8366)
* Handle azure deepseek reasoning response (#8288)

* Handle deepseek reasoning response

* Add helper method + unit test

* Fix: Follow infinity api url format (#8346)

* Follow infinity api url format

* Update test_infinity.py

* fix(infinity/transformation.py): fix linting error

---------

Co-authored-by: vibhavbhat <vibhavb00@gmail.com>
Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>
2025-02-07 17:45:51 -08:00
Krish Dholakia f031926b82 fix(utils.py): handle key error in msg validation (#8325)
* fix(utils.py): handle key error in msg validation

* Support running Aim Guard during LLM call (#7918)

* support running Aim Guard during LLM call

* Rename header

* adjust docs and fix type annotations

* fix(timeout.md): doc fix for openai example on dynamic timeouts

---------

Co-authored-by: Tomer Bin <117278227+hxtomer@users.noreply.github.com>
2025-02-06 18:13:46 -08:00
Ishaan Jaff b812286534 (fix) - proxy reliability, ensure duplicate callbacks are not added to proxy (#8067)
* refactor _add_callbacks_from_db_config

* fix check for _custom_logger_exists_in_litellm_callbacks

* move loc of test utils

* run ci/cd again

* test_add_custom_logger_callback_to_specific_event_with_duplicates_callbacks

* fix _custom_logger_class_exists_in_success_callbacks

* unit testing for test_add_callbacks_from_db_config

* test_custom_logger_exists_in_callbacks_individual_functions

* fix config.yml

* fix test test_stream_chunk_builder_openai_audio_output_usage - use direct dict comparison
2025-01-28 21:01:56 -08:00