63 Commits

Author SHA1 Message Date
yuneng-jiang b08f464ee8 fix(tests): replace deprecated model refs in cost and model_info tests
Models removed from pricing JSON:
- gemini-1.5-pro-002, gemini-1.5-flash, gemini-1.5-flash-latest -> gemini-2.0-flash
- gpt-4o-audio-preview-2024-10-01 -> gpt-4o-audio-preview
- Tests using per-character pricing updated to per-token (no gemini models have per-character pricing now)
- Removed above_128k parametrization (no gemini models have tiered 128k pricing now)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:39:35 -07:00
yuneng-jiang 002d64b321 fix(tests): increase MAX_CALLS and reduce sleep in flaky e2e budget test
The test_chat_completion_low_budget test was flaky because async spend
tracking couldn't reliably catch up within 50 calls with 0.5s sleeps.
Increased to 200 calls with 0.1s sleeps (same total time budget) to
give more opportunities for budget enforcement to trigger.
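
The retry pattern described above can be sketched as follows. This is a minimal illustration and not the actual test: `BudgetExceededError`, `LaggingBudget`, `calls_until_rejected`, and the lag of 120 calls are all hypothetical names and values chosen to show why a larger call cap gives delayed enforcement more chances to trigger.

```python
import time
from typing import Callable, Optional


class BudgetExceededError(Exception):
    """Hypothetical stand-in for the rejection returned once a key is over budget."""


def calls_until_rejected(
    make_call: Callable[[], None],
    max_calls: int,
    sleep_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call repeatedly; return the attempt number that was rejected, or None."""
    for attempt in range(1, max_calls + 1):
        try:
            make_call()
        except BudgetExceededError:
            return attempt
        sleep(sleep_s)  # give asynchronous spend tracking time to catch up
    return None


class LaggingBudget:
    """Fake endpoint whose enforcement only kicks in after `reject_after` calls."""

    def __init__(self, reject_after: int) -> None:
        self.reject_after = reject_after
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1
        if self.calls > self.reject_after:
            raise BudgetExceededError("budget exceeded")
```

With the made-up lag of 120 calls, a cap of 50 never observes the rejection, while a cap of 200 observes it on call 121.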

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:04:31 -07:00
yuneng-jiang 124b44ec22 fix(tests): update PKCE SSO tests to mock get_async_httpx_client
The recent commit 2a997993d4 replaced httpx.AsyncClient() with
get_async_httpx_client() in ui_sso.py, but the PKCE tests still
patched the old httpx.AsyncClient path. Updated all 10 affected
tests to mock get_async_httpx_client and removed unnecessary
context manager setup since AsyncHTTPHandler is returned directly.
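
A rough sketch of the mocking approach, using only the standard library. The `sso` namespace, `exchange_code`, the token URL, and the response shape are hypothetical stand-ins, not the contents of ui_sso.py; the point is only that the factory function is patched and the returned client is used directly, with no `async with` setup.

```python
import asyncio
import types
from unittest.mock import AsyncMock, MagicMock, patch

# Hypothetical stand-in for the module under test.
sso = types.SimpleNamespace()


def _real_get_async_httpx_client():
    raise RuntimeError("a real HTTP client must not be built in unit tests")


sso.get_async_httpx_client = _real_get_async_httpx_client


async def exchange_code(code: str, verifier: str) -> str:
    """Hypothetical PKCE token exchange that obtains its client from the factory."""
    client = sso.get_async_httpx_client()
    response = await client.post(
        "https://idp.example/token",
        data={"code": code, "code_verifier": verifier},
    )
    return response.json()["access_token"]


def run_with_mocked_client():
    # The factory returns the handler directly, so the mock needs no
    # __aenter__/__aexit__ configuration.
    fake_response = MagicMock()
    fake_response.json.return_value = {"access_token": "tok-123"}
    fake_client = MagicMock()
    fake_client.post = AsyncMock(return_value=fake_response)
    with patch.object(sso, "get_async_httpx_client", return_value=fake_client):
        token = asyncio.run(exchange_code("abc", "verifier"))
    return token, fake_client
```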

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:02:12 -07:00
yuneng-jiang 5dab326d0c fix(tests): update deprecated model refs in test_completion_cost
Replace models removed from pricing JSON during deprecation cleanup:
- textembedding-gecko -> text-embedding-004
- gemini-1.5-flash -> gemini-2.0-flash

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:01:00 -07:00
yuneng-jiang 82de82f1b6 Fix test_completion_cost_prompt_caching gemini parametrization
gemini/gemini-2.5-flash lacks cache_creation_input_token_cost in the
model cost map, causing a TypeError when the test multiplies
cache_creation_input_tokens by None. Use claude-haiku-4-5 instead,
which has the required prompt caching cost fields.
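
The failure mode can be reproduced in isolation. The sketch below assumes a simplified cost map: the model names and prices are hypothetical, and only the field name `cache_creation_input_token_cost` is taken from the message above.

```python
from typing import Optional

# Hypothetical cost-map entries; the prices are made up for illustration.
MODEL_COST = {
    "model-with-caching": {
        "input_cost_per_token": 1e-06,
        "cache_creation_input_token_cost": 1.25e-06,
    },
    "model-without-caching": {
        "input_cost_per_token": 3e-07,
    },
}


def naive_cache_creation_cost(model: str, cache_creation_input_tokens: int) -> float:
    """Multiplies by whatever .get() returns; raises TypeError if that is None."""
    price = MODEL_COST[model].get("cache_creation_input_token_cost")
    return cache_creation_input_tokens * price


def cache_creation_cost(model: str, cache_creation_input_tokens: int) -> Optional[float]:
    """Return the cache-write cost, or None when the model has no such price."""
    price = MODEL_COST[model].get("cache_creation_input_token_cost")
    if price is None:
        return None
    return cache_creation_input_tokens * price
```

A test parametrized over models therefore has to pick entries that define the field, or check for its presence before multiplying.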

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:12:15 -07:00
yuneng-jiang c9f7075690 Replace additional deprecated models across test files
- tests/local_testing/test_completion_cost.py:
  - claude-3-5-sonnet-20240620 -> claude-sonnet-4-6
  - gemini/gemini-1.5-flash-001 -> gemini/gemini-2.5-flash

- tests/test_litellm/test_utils.py:
  - claude-3-5-sonnet-20240620 -> claude-sonnet-4-6 (VertexAI config test, proxy tests)
  - gemini-1.5-pro -> gemini-2.5-pro (pre_process_non_default_params)
  - gemini/gemini-1.5-pro -> gemini/gemini-2.5-pro (proxy tests)

- tests/litellm_utils_tests/test_utils.py:
  - claude-3-opus-20240229 -> claude-sonnet-4-6 (trimming, vision tests)
  - gemini-pro -> gemini-2.5-pro (function calling test)
  - gemini-pro-vision -> gemini-2.5-flash (vision test)
  - gemini-1.5-pro -> gemini-2.5-pro (response schema test)
  - gemini/gemini-1.5-flash -> gemini/gemini-2.5-flash (function calling test)
  - gemini-1.5-pro -> gemini-2.5-pro (vision gemini test)
  - gpt-4-vision-preview -> gpt-4o (vision test)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:03:54 -07:00
Alexsander Hamir 5534038e93 Fix CI: Revert security scan changes and add GitGuardian ignore rules (#18358) 2025-12-22 17:03:53 -08:00
Ishaan Jaffer 6112160a16 Revert "[Fix] Security - Remove example API keys with high entropy (#18255)"
This reverts commit 24edbccf5c.
2025-12-20 20:48:11 +05:30
Alexsander Hamir 24edbccf5c [Fix] Security - Remove example API keys with high entropy (#18255) 2025-12-19 10:09:50 -08:00
Ishaan Jaffer 01fd4d7cef fix fireworks test 2025-11-26 18:58:32 -08:00
Krish Dholakia 06906534b3 feat(audio_transcriptions/): calculate duration of audio file for cost calculation + feat (image_generations): cost tracking accuracy improved with output_format, quality, size values fixed per openai model
* feat(audio_transcriptions/): calculate duration of audio file for cost calculation

Fixes https://github.com/BerriAI/litellm/issues/11846

Closes https://github.com/BerriAI/litellm/issues/14605

* fix(cost_calculator.py): correctly use base model, when set

Fixes issue where azure base model was being ignored

* feat(cost_calculator.py): fix default cost tracking quality param for image generation

* feat(image_generations/): return output_format, quality, size

aligns response to openai spec and improves cost tracking accuracy

* fix(cost_calculator.py): refactor cost calculation for image generation to use image response instead of hidden params

* build: update build

* fix: fix cost calculation

* build: update poetry lock

* fix: fix ruff checks

* fix: fix aembedding

* fix: fix ruff errors

* fix: modify to catch errors

* fix: test

* fix: loosen test to handle openai lib out of sync

* fix: fix base models

* fix: fix usage object
2025-11-08 16:24:31 -08:00
Krish Dholakia 202eaeb1a2 Revert "(feat) Audio transcription - cost tracking + (feat) image generation …" (#16409)
This reverts commit c96da44265.
2025-11-08 15:38:16 -08:00
Krish Dholakia c96da44265 (feat) Audio transcription - cost tracking + (feat) image generation - accurate cost tracking based on output_format/quality/size
* feat(audio_transcriptions/): calculate duration of audio file for cost calculation

Fixes https://github.com/BerriAI/litellm/issues/11846

Closes https://github.com/BerriAI/litellm/issues/14605

* fix(cost_calculator.py): correctly use base model, when set

Fixes issue where azure base model was being ignored

* feat(cost_calculator.py): fix default cost tracking quality param for image generation

* feat(image_generations/): return output_format, quality, size

aligns response to openai spec and improves cost tracking accuracy

* fix(cost_calculator.py): refactor cost calculation for image generation to use image response instead of hidden params

* build: update build

* fix: fix cost calculation

* build: update poetry lock

* fix: fix ruff checks

* fix: fix aembedding

* fix: fix ruff errors

* fix: modify to catch errors

* fix: test

* fix: loosen test to handle openai lib out of sync
2025-11-08 15:30:46 -08:00
Ishaan Jaffer 214c10f6ef test_completion_cost_databricks_embedding 2025-10-25 11:47:03 -07:00
Ishaan Jaffer 6aa35ec999 test text-embedding-ada-002 2025-09-27 12:41:35 -07:00
Krrish Dholakia 0e747aaaf1 test: fix test 2025-09-16 19:20:12 -07:00
Krrish Dholakia d05f58721e test: remove end of life model from tests 2025-09-09 21:01:45 -07:00
Ishaan Jaff d37be48a80 test: llama-3.3-70b-versatile 2025-09-01 20:14:12 -07:00
Jugal D. Bhatt aea0605eed [LLM Translation] Fix Realtime API endpoint for no intent (#13476)
* fix intent params

* Add responses

* fix unrelated test

* test fix - fireworks API endpoint is down

* test fix fireworks ai is having an active outage

* test_completion_cost_databricks

* dbrx fix test API currently not responding

* Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter.

* Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency.

* Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code.

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-08-14 16:24:14 -07:00
Ishaan Jaff 461cd0c30a test_completion_cost_deepseek 2025-07-23 13:16:12 -07:00
Krrish Dholakia 4ab0ee0b65 test: more testing fixes 2025-05-01 15:36:13 -07:00
Krish Dholakia 6ad483dde7 Litellm dev 04 30 2025 p1 (#10462)
* fix(exception_mapping_utils.py): correctly pass through 504 status code

openai also raises a 504 status code

* build(model_prices_and_context_window.json): add gpt-4o-mini-tts to model cost map

Fixes https://github.com/BerriAI/litellm/issues/9591

* fix(cost_calculator.py): fix input cost calculation for gpt-4o-mini-tts

Fixes https://github.com/BerriAI/litellm/issues/9591

* test: testing updates
2025-04-30 22:11:12 -07:00
Krish Dholakia d783190e04 Update fireworks ai pricing (#10425)
* build(model_prices_and_context_window.json): add fireworks ai new 0-4b pricing tier

* build(model_prices_and_context_window.json): add more fireworks ai models

* test: update testing

* test: testing updates

* test: update test

* test: update test
2025-04-29 20:58:05 -07:00
Krrish Dholakia 652e1b7f0f test: update test 2025-04-18 20:36:15 -07:00
Krrish Dholakia 3e87ec4f16 test: replace removed fireworks ai models 2025-04-18 14:23:16 -07:00
Ishaan Jaff 65f8015221 test fix - azure deprecated azure ai mistral 2025-04-15 21:08:55 -07:00
Krish Dholakia 0dbd663877 fix(cost_calculator.py): handle custom pricing at deployment level fo… (#9855)
* fix(cost_calculator.py): handle custom pricing at deployment level for router

* test: add unit tests

* fix(router.py): show custom pricing on UI

check correct model str

* fix: fix linting error

* docs(custom_pricing.md): clarify custom pricing for proxy

Fixes https://github.com/BerriAI/litellm/issues/8573#issuecomment-2790420740

* test: update code qa test

* fix: cleanup traceback

* fix: handle litellm param custom pricing

* test: update test

* fix(cost_calculator.py): add router model id to list of potential model names

* fix(cost_calculator.py): fix router model id check

* fix: router.py - maintain older model registry approach

* fix: fix ruff check

* fix(router.py): router get deployment info

add custom values to mapped dict

* test: update test

* fix(utils.py): update only if value is non-null

* test: add unit test
2025-04-09 22:13:10 -07:00
Ishaan Jaff d8f47fc9e5 databricks/databricks-meta-llama-3-3-70b-instruct 2025-04-07 20:16:24 -07:00
Ishaan Jaff 7262606411 test_completion_cost_databricks 2025-04-05 13:30:17 -07:00
Ishaan Jaff d87bb9bb6e test_completion_cost_databricks 2025-04-05 13:13:25 -07:00
Ishaan Jaff 1638872762 databricks/databricks-meta-llama-3.3-70b-instruct 2025-04-05 13:12:21 -07:00
Krish Dholakia 722f3ff0e6 fix(cost_calculator.py): allows checking received + sent model name when checking for cost calculation (#9669)
Fixes issue introduced by https://github.com/BerriAI/litellm/commit/dfb838eaff82301d4101d09982fbbb251bbc1ce1#r154667517
2025-03-31 21:29:48 -07:00
Krish Dholakia 4351c77253 Support Gemini audio token cost tracking + fix openai audio input token cost tracking (#9535)
* fix(vertex_and_google_ai_studio_gemini.py): log gemini audio tokens in usage object

enables accurate cost tracking

* refactor(vertex_ai/cost_calculator.py): refactor 128k+ token cost calculation to only run if model info has it

Google has moved away from this for gemini-2.0 models

* refactor(vertex_ai/cost_calculator.py): migrate to usage object for more flexible data passthrough

* fix(llm_cost_calc/utils.py): support audio token cost tracking in generic cost per token

enables vertex ai cost tracking to work with audio tokens

* fix(llm_cost_calc/utils.py): default to total prompt tokens if text tokens field not set

* refactor(llm_cost_calc/utils.py): move openai cost tracking to generic cost per token

more consistent behaviour across providers

* test: add unit test for gemini audio token cost calculation

* ci: bump ci config

* test: fix test
2025-03-26 17:26:25 -07:00
Krrish Dholakia e2ae504a81 test: skip flaky tests 2025-03-11 19:43:04 -07:00
Ishaan Jaff 6e3b21775f test_cost_azure_openai_prompt_caching 2025-03-08 16:19:28 -08:00
Krish Dholakia 5e386c28b2 Litellm dev 03 04 2025 p3 (#8997)
* fix(core_helpers.py): handle litellm_metadata instead of 'metadata'

* feat(batches/): ensure batches logs are written to db

makes batches response dict compatible

* fix(cost_calculator.py): handle batch response being a dictionary

* fix(batches/main.py): modify retrieve endpoints to use @client decorator

enables logging to work on retrieve call

* fix(batches/main.py): fix retrieve batch response type to be 'dict' compatible

* fix(spend_tracking_utils.py): send unique uuid for retrieve batch call type

create batch and retrieve batch share the same id

* fix(spend_tracking_utils.py): prevent duplicate retrieve batch calls from being double counted

* refactor(batches/): refactor cost tracking for batches - do it on retrieve, and within the established litellm_logging pipeline

ensures cost is always logged to db

* fix: fix linting errors

* fix: fix linting error
2025-03-04 21:58:03 -08:00
Krish Dholakia 251467a525 add bedrock llama vision support + cohere / infinity rerank - 'return_documents' support (#8684)
* build(model_prices_and_context_window.json): mark bedrock llama as supporting vision based on docs

* Add price for Cerebras llama3.3-70b (#8676)

* docs(readme.md): fix contributing docs

point people to new mock directory testing structure s/o @vibhavbhat

* build: update contributing readme

* docs(readme.md): improve docs

* docs(readme.md): cleanup readme on tests/

* docs(README.md): cleanup doc

* feat(infinity/): support returning documents when return_documents=True

* test(test_rerank.py): add e2e testing for cohere rerank

* fix: fix linting errors

* fix(together_ai/): fix together ai transformation

* fix: fix linting error

* fix: fix linting errors

* fix: fix linting errors

* test: mark cohere as flaky

* build: fix model supports check

* test: fix test

* test: mark flaky test

* fix: fix test

* test: fix test

---------

Co-authored-by: Yury Koleda <fut.wrk@gmail.com>
2025-02-20 21:23:54 -08:00
Krish Dholakia b682dc4ec8 Add cost tracking for rerank via bedrock (#8691)
* feat(bedrock/rerank): infer model region if model given as arn

* test: add unit testing to ensure bedrock region name inferred from arn on rerank

* feat(bedrock/rerank/transformation.py): include search units for bedrock rerank result

Resolves https://github.com/BerriAI/litellm/issues/7258#issuecomment-2671557137

* test(test_bedrock_completion.py): add testing for bedrock cohere rerank

* feat(cost_calculator.py): refactor rerank cost tracking to support bedrock cost tracking

* build(model_prices_and_context_window.json): add amazon.rerank model to model cost map

* fix(cost_calculator.py): bedrock/common_utils.py

get base model from model w/ arn -> handles rerank model

* build(model_prices_and_context_window.json): add bedrock cohere rerank pricing

* feat(bedrock/rerank): migrate bedrock config to basererank config

* Revert "feat(bedrock/rerank): migrate bedrock config to basererank config"

This reverts commit 84fae1f1679a209a3e9cdcea593ed683fdb96acc.

* test: add testing to ensure large doc / queries are correctly counted

* Revert "test: add testing to ensure large doc / queries are correctly counted"

This reverts commit 4337f1657e13a6d35527a400e3be17c11d4b662b.

* fix(migrate-jina-ai-to-rerank-config): enables cost tracking

* refactor(jina_ai/): finish migrating jina ai to base rerank config

enables cost tracking

* fix(jina_ai/rerank): e2e jina ai rerank cost tracking

* fix: cleanup dead code

* fix: fix python3.8 compatibility error

* test: fix test

* test: add e2e testing for azure ai rerank

* fix: fix linting error

* test: mark cohere as flaky
2025-02-20 21:00:18 -08:00
Krish Dholakia ea985dda0b fix(model_cost_map): fix json parse error on model cost map + add unit test (#8629)
Fixes https://github.com/BerriAI/litellm/pull/8619#issuecomment-2666693045
2025-02-18 11:18:16 -08:00
Krish Dholakia 9c20c69915 Fix bedrock model pricing + add unit test using bedrock pricing api (#7978)
* test(test_completion_cost.py): add unit testing to ensure all bedrock models with region name have cost tracked

* feat: initial script to get bedrock pricing from amazon api

ensures bedrock pricing is accurate

* build(model_prices_and_context_window.json): correct bedrock model prices based on api check

ensures accurate bedrock pricing

* ci(config.yml): add bedrock pricing check to ci/cd

ensures litellm always maintains up-to-date pricing for bedrock models

* ci(config.yml): add beautiful soup to ci/cd

* test: bump groq model

* test: fix test
2025-01-28 17:57:49 -08:00
Krish Dholakia 03eef5a2a0 Fix custom pricing - separate provider info from model info (#7990)
* fix(utils.py): initial commit fixing custom cost tracking

refactors out provider specific model info from `get_model_info` - this was causing custom costs to be registered incorrectly

* fix(utils.py): cleanup `_supports_factory` to check provider info, if model info is None

some providers support features like vision across all models

* fix(utils.py): refactor to use _supports_factory

* test: update testing

* fix: fix linting errors

* test: fix testing
2025-01-25 21:49:28 -08:00
Krish Dholakia 8ca3229b26 Ensure base_model cost tracking works across all endpoints (#7989)
* test(test_completion_cost.py): add sdk test to ensure base model is used for cost tracking

* test(test_completion_cost.py): add sdk test to ensure custom pricing works

* fix(main.py): add base model cost tracking support for embedding calls

Enables base model cost tracking for embedding calls when base model set as a litellm_param

* fix(litellm_logging.py): update logging object with litellm params - including base model, if given

ensures base model param is always tracked

* fix(main.py): fix linting errors
2025-01-24 21:05:26 -08:00
Krish Dholakia c6e9240405 Add datadog health check support + fix bedrock converse cost tracking w/ region name specified (#7958)
* fix(bedrock/converse_handler.py): fix bedrock region name on async calls

* fix(utils.py): fix split model handling

Fixes bedrock cost calculation when region name is given

* feat(_health_endpoints.py): support health checking datadog integration

Closes https://github.com/BerriAI/litellm/issues/7921
2025-01-23 22:17:09 -08:00
Krish Dholakia becd4bc748 Litellm dev 01 11 2025 p3 (#7702)
* fix(__init__.py): fix init to exclude pricing-only model cost values from real model names

prevents bad health checks on wildcard routes

* fix(get_llm_provider.py): fix to handle calling bedrock_converse models
2025-01-11 20:06:54 -08:00
Krish Dholakia 4af23353d6 Allow assigning teams to org on UI + OpenAI omni-moderation cost model tracking (#7566)
* feat(cost_calculator.py): add cost tracking ($0) for openai moderations endpoint

removes sentry cost tracking errors caused by this

* build(teams.tsx): allow assigning teams to orgs
2025-01-08 16:58:21 -08:00
Krish Dholakia 4e69711411 Litellm dev 01 07 2025 p1 (#7618)
* fix(main.py): pass custom llm provider on litellm logging provider update

* fix(cost_calculator.py): don't append provider name to return model if existing llm provider

Fixes https://github.com/BerriAI/litellm/issues/7607

* fix(prometheus_services.py): fix prometheus system health error logging

Fixes https://github.com/BerriAI/litellm/issues/7611
2025-01-07 21:22:31 -08:00
Krish Dholakia c3edfc2c92 LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394)
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching

* fix(context_caching/transformation.py): just use last identified cache point

Fixes https://github.com/BerriAI/litellm/issues/6738

* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google

Fixes https://github.com/BerriAI/litellm/issues/6738

* fix(vertex_ai/gemini/): track context caching tokens

* refactor(gemini/): place transformation.py inside `chat/` folder

make it easy for user to know we support the equivalent endpoint

* fix: fix import

* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder

make it easier to see cost calculation logic

* fix: fix linting errors

* fix: fix circular import

* feat(gemini/cost_calculator.py): support gemini context caching cost calculation

generifies anthropic's cost calculation function and uses it across anthropic + gemini

* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching

Closes https://github.com/BerriAI/litellm/issues/6891

* docs(gemini.md): add gemini context caching architecture diagram

make it easier for user to understand how context caching works

* docs(gemini.md): link to relevant gemini context caching code

* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code

* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario

* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation

* test: fix test
2024-12-23 22:02:52 -08:00
Krish Dholakia 179d2f56b7 LiteLLM Minor Fixes & Improvements (12/16/2024) - p1 (#7263)
* fix(factory.py): skip empty text blocks for bedrock user messages

Fixes https://github.com/BerriAI/litellm/issues/7169

* Add support for Gemini 2.0 GoogleSearch tool (#7257)

* Add support for google_search tool in gemini 2.0

* Add/modify tests

* Fix grounding check

* Remove 2.0 grounding test; exclude experimental model in VERTEX_MODELS_TO_NOT_TEST

* Swap order of tools

* Fix formatting

* fix(get_api_base.py): return api base in streaming response

Fixes https://github.com/BerriAI/litellm/issues/7249

Closes https://github.com/BerriAI/litellm/pull/7250

* fix(cost_calculator.py): only set base model to model if not none

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix(cost_calculator.py): enforce stricter order when picking model for cost calculation

* fix(cost_calculator.py): fix '_select_model_name_for_cost_calc' to return model name with region name prefix if provided

* fix(utils.py): fix 'get_model_info()' to handle edge case where model name starts with custom llm provider AND custom llm provider is given

* fix(cost_calculator.py): handle `custom_llm_provider-` scenario

* fix(cost_calculator.py): e2e working tts cost tracking

ensures initial message is passed in, to cost calculator

* fix(factory.py): suppress linting errors

* fix(cost_calculator.py): strip llm provider from model name after selecting cost calc model

* fix(litellm_logging.py): store initial request in 'input' field + accept base_model to be passed in litellm_params directly

* test: handle none env var value in flaky test

* fix(litellm_logging.py): fix linting errors

---------

Co-authored-by: Sam B <samlingx@gmail.com>
2024-12-17 15:33:36 -08:00
Krish Dholakia 224ead1531 fix(utils.py): fix openai-like api response format parsing (#7273)
* fix(utils.py): fix openai-like api response format parsing

Fixes issue passing structured output to litellm_proxy/ route

* fix(cost_calculator.py): fix whisper transcription cost calc to use file duration, not response time

* test: skip test if credentials not found
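
As an illustration of billing transcription by file duration and not by response time, the sketch below reads the duration from a WAV file with the standard-library `wave` module. The per-second price and the helper names are hypothetical, and this is not the cost_calculator.py implementation.

```python
import io
import wave

PRICE_PER_SECOND = 0.0001  # hypothetical price, for illustration only


def make_silent_wav(seconds: int, framerate: int = 8000) -> bytes:
    """Build an in-memory mono 16-bit WAV of silence with the given length."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(framerate)
        w.writeframes(b"\x00\x00" * (seconds * framerate))
    return buf.getvalue()


def wav_duration_seconds(data: bytes) -> float:
    """Duration comes from the file itself: frame count divided by frame rate."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnframes() / r.getframerate()


def transcription_cost(data: bytes, price_per_second: float = PRICE_PER_SECOND) -> float:
    """Cost depends only on audio length, not on how long the API call took."""
    return wav_duration_seconds(data) * price_per_second
```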
2024-12-17 12:49:09 -08:00
Krish Dholakia ec36353b41 fix(main.py): fix retries being multiplied when using openai sdk (#7221)
* fix(main.py): fix retries being multiplied when using openai sdk

Closes https://github.com/BerriAI/litellm/pull/7130

* docs(prompt_management.md): add langfuse prompt management doc

* feat(team_endpoints.py): allow teams to add their own models

Enables teams to call their own finetuned models via the proxy

* test: add better enforcement check testing for `/model/new` now that teams can add their own models

* docs(team_model_add.md): tutorial for allowing teams to add their own models

* test: fix test
2024-12-14 11:56:55 -08:00