Commit Graph

40 Commits

Author SHA1 Message Date
yuneng-jiang 9a338e1b6b [Test] Tests: Stop parametrizing API keys into pytest test IDs (#27249)
Several tests parametrized over (model, api_key, ...) tuples or raw
token strings, causing pytest to embed those values in the test ID
and print them in CI logs. Refactored each affected test to keep the
same coverage without putting key material into parametrize.

- audio_tests/test_audio_speech.py: split env-var keys into separate
  azure/openai test functions sharing a helper; sync_mode parametrize
  preserved.
- audio_tests/test_whisper.py: split into openai_whisper /
  azure_whisper functions sharing a helper; response_format parametrize
  preserved.
- local_testing/test_embedding.py: single-case parametrize inlined.
- proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize
  cases split into 5 named tests sharing an _assert helper.
- proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split
  into 4 named tests.
- test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix
  cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split
  into 5 named tests.

Verified: black clean; 14 refactored unit tests pass; pytest collects
audio/embedding tests with safe IDs (no key material in test IDs).
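A minimal sketch of the named-test pattern described above. `strip_auth_scheme` is a hypothetical stand-in for the code under test, not LiteLLM's header parsing. Under `@pytest.mark.parametrize("header", [...])`, pytest derives test IDs from string arguments, so a key can surface in names like `test_auth[Bearer sk-...]`. Here each case is its own function that calls a shared `_assert` helper, so the collected IDs are just the function names.

```python
# Hypothetical function under test; NOT LiteLLM's actual header parsing.
def strip_auth_scheme(header_value: str) -> str:
    """Return the credential portion of an Authorization header value."""
    scheme, sep, credential = header_value.partition(" ")
    if sep and scheme.lower() in ("bearer", "basic"):
        return credential
    return header_value  # raw key with no scheme prefix


# Shared helper: every named test funnels through here, so no key string
# is ever a parametrize argument that pytest could embed in a test ID.
def _assert_strips_to(header_value: str, expected: str) -> None:
    assert strip_auth_scheme(header_value) == expected


def test_bearer_prefix() -> None:
    _assert_strips_to("Bearer sk-placeholder", "sk-placeholder")


def test_lowercase_bearer_prefix() -> None:
    _assert_strips_to("bearer sk-placeholder", "sk-placeholder")


def test_basic_prefix() -> None:
    _assert_strips_to("Basic c2stcGxhY2Vob2xkZXI=", "c2stcGxhY2Vob2xkZXI=")


def test_raw_key() -> None:
    _assert_strips_to("sk-placeholder", "sk-placeholder")
```

The trade-off is a few more function definitions in exchange for CI output that only ever shows `test_bearer_prefix`, `test_raw_key`, and so on.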
2026-05-05 17:21:18 -07:00
Sameer Kankute a1f0823393 test(embedding): align local_testing OpenAI encoding_format default
Made-with: Cursor
2026-05-01 16:27:13 +05:30
ishaan-berri 7a9a9f0c79 fix: batch-limit stale managed object cleanup to prevent 300K row UPD… (#25258)
* fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

1. Replace unbounded update_many with a single execute_raw using a
   subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
   rows. Zero rows loaded into Python memory — everything stays in Postgres.
   Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
   (the proxy requires PostgreSQL per schema.prisma).

2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.

* docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference

* test: remove deprecated embed-english-v2.0 cohere embedding tests
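A sketch of the bounded cleanup, assuming a hypothetical `managed_objects` table with `status`, `file_purpose`, and `created_at` columns. The commit targets PostgreSQL through `execute_raw`; this version uses the standard-library `sqlite3` module so it runs standalone, and the `UPDATE ... WHERE id IN (SELECT id ... LIMIT ?)` shape is the same in both databases. `expire_stale_rows` and `make_demo_db` are illustrative names, not the project's `_expire_stale_rows`.

```python
import sqlite3

# Default batch size named in the commit message.
STALE_OBJECT_CLEANUP_BATCH_SIZE = 1000


def expire_stale_rows(conn: sqlite3.Connection, cutoff: int,
                      batch_size: int = STALE_OBJECT_CLEANUP_BATCH_SIZE) -> int:
    """Expire at most `batch_size` stale rows in one statement; return the count.

    The LIMIT lives in the subquery, so the database selects the ids and no
    rows are loaded into Python. Table and column names are hypothetical.
    """
    cur = conn.execute(
        """
        UPDATE managed_objects
           SET status = 'expired'
         WHERE id IN (
                SELECT id
                  FROM managed_objects
                 WHERE status = 'in_progress'
                   AND file_purpose = 'response'  -- leave batch/fine-tune jobs alone
                   AND created_at < ?
                 LIMIT ?
               )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with a mix of stale, non-response, and recent rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE managed_objects ("
        "id INTEGER PRIMARY KEY, status TEXT, file_purpose TEXT, created_at INTEGER)"
    )
    rows = [("in_progress", "response", 10)] * 5  # stale response objects
    rows += [("in_progress", "batch", 10)]        # stale, but a batch job
    rows += [("in_progress", "response", 500)]    # recent, not stale
    conn.executemany(
        "INSERT INTO managed_objects (status, file_purpose, created_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn
```

Each call expires at most `batch_size` rows, so a large backlog drains over successive poll cycles instead of in one unbounded UPDATE.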
2026-04-06 19:11:55 -07:00
Ishaan Jaffer 8c6a67dae1 test_bedrock_embedding_cohere 2026-03-30 21:08:51 -07:00
Krrish Dholakia 9e070143fb test: update key names 2026-03-28 21:13:16 -07:00
Krrish Dholakia bc829d51f2 test: test 2026-03-28 19:17:38 -07:00
Krrish Dholakia bee1607248 test: update test apis 2026-03-28 18:57:27 -07:00
Krrish Dholakia 564ad3195b test: update testing 2026-03-28 18:42:55 -07:00
shin-bot-litellm 537f7af583 fix(test): update deprecated gemini embedding model (#20621)
Replace text-embedding-004 with gemini-embedding-001.

The old model was deprecated and returns 404:
'models/text-embedding-004 is not found for API version v1beta'

Co-authored-by: Shin <shin@openclaw.ai>
2026-02-06 18:35:40 -08:00
Yuta Saito a57f1e2e08 test: remove flaky azure oidc embedding test 2026-01-13 10:34:01 +09:00
Sameer Kankute dbcae4aca5 fix: Add none to encoding_format instead of omitting it 2025-12-16 13:23:15 +05:30
Ishaan Jaffer badbadba0d fix img URL for tests 2025-11-22 09:41:15 -08:00
Ishaan Jaffer 732618f55f test_together_ai_embedding 2025-10-11 09:33:19 -07:00
Ishaan Jaffer b97e56252d test_openai_azure_embedding_optional_arg 2025-09-27 14:03:23 -07:00
Ishaan Jaffer 6aa35ec999 test text-embedding-ada-002 2025-09-27 12:41:35 -07:00
zjx20 92c525ddfe feat(JinaAI): support multimodal embedding models (#13181)
* feat(JinaAI): support multimodal embedding models

* add test case

* add test

* fix test
2025-08-05 19:21:56 -07:00
Ishaan Jaff d9943f9812 fix cohere InternalServerError error mapping 2025-07-16 16:13:34 -07:00
Krish Dholakia 7f8b2579a2 Minor Fixes (#11868)
* fix(litellm_pre_call_utils.py): add user agent tags to spend logs in standard logging payload logic

avoid clash when tag based routing is enabled

* test: remove redundant test

* test: rename oidc test to run earlier

quicker debugging

* fix(azure.py): return more detailed error message

* fix(azure/common_utils.py): use default scope, if scope is none

fixes oidc test

* fix: always default to cognitiveservices.azure.com

* test: update test
2025-06-18 14:12:59 -07:00
Ishaan Jaff 5b451bf483 test_openai_azure_embedding_simple 2025-06-13 19:00:25 -07:00
Krrish Dholakia ec52600f98 test: handle fireworks ai instability 2025-06-11 10:09:28 -07:00
Akim Tsvigun acaa80294c Integration with Nebius AI Studio added (#11143)
* integration with Nebius AI Studio added

* Merged with main

* Reviewer's comments resolved

* spelling error fixed

* accidental change reverted
2025-05-27 11:05:22 -07:00
Krish Dholakia 7210b713dc Add target model name validation (#10722)
* fix(auth_checks.py): enforce auth checks on target model names

ensures user has access to models they are trying to call

* test(test_auth_utils.py): add unit tests for auth check

* fix(exception_mapping_utils.py): handle mistral 429 exception

* fix: fix linting error

* fix(auth_checks.py): add max fallback depth
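A sketch of a target-model access check with a bounded fallback walk. It assumes, without confirmation from this log, that the check also covers the fallback models a request can be routed to; `check_target_models`, `ModelAccessError`, and `MAX_FALLBACK_DEPTH = 3` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical limit; the commit only says a maximum fallback depth was added.
MAX_FALLBACK_DEPTH = 3


class ModelAccessError(Exception):
    """Raised when the caller may not use a requested or fallback model."""


def check_target_models(
    model: str,
    allowed: Set[str],
    fallbacks: Dict[str, List[str]],
    depth: int = 0,
    max_depth: int = MAX_FALLBACK_DEPTH,
) -> None:
    """Raise if `model`, or any fallback reachable within `max_depth`, is not allowed."""
    if model not in allowed:
        raise ModelAccessError(f"access denied for model {model!r}")
    if depth >= max_depth:
        return  # stop walking the chain; this also bounds fallback cycles
    for fallback in fallbacks.get(model, []):
        check_target_models(fallback, allowed, fallbacks, depth + 1, max_depth)
```

The depth cap keeps a misconfigured fallback cycle (for example `a -> b -> a`) from recursing forever.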
2025-05-10 14:27:06 -07:00
Ishaan Jaff de7870cb54 Add llamafile as a provider (#10203) (#10482)
* Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar

* Add Llamafile as an LlmProviders enum

* Add llamafile as an OpenAI-compatible provider (in the list of compatible providers)

* Add Llamafile chat config and tests

* Wire up Llamafile

Co-authored-by: Peter Wilson <peter@mozilla.ai>
2025-05-01 18:36:55 -07:00
Ishaan Jaff 311c70698f test_embedding_response_ratelimit_headers 2025-04-11 17:54:54 -07:00
Krish Dholakia 34bdf36eab Add inference providers support for Hugging Face (#8258) (#9738) (#9773)
* Add inference providers support for Hugging Face (#8258)

* add first version of inference providers for huggingface

* temporarily skipping tests

* Add documentation

* Fix titles

* remove max_retries from params and clean up

* add suggestions

* use llm http handler

* update doc

* add suggestions

* run formatters

* add tests

* revert

* revert

* rename file

* set maxsize for lru cache

* fix embeddings

* fix inference url

* fix tests following breaking change in main

* use ChatCompletionRequest

* fix tests and lint

* [Hugging Face] Remove outdated chat completion tests and fix embedding tests (#9749)

* remove or fix tests

* fix link in doc

* fix(config_settings.md): document hf api key

---------

Co-authored-by: célina <hanouticelina@gmail.com>
2025-04-05 10:50:15 -07:00
Krish Dholakia 09462ba80c Add cohere v2/rerank support (#8421) (#8605)
* Add cohere v2/rerank support (#8421)

* Support v2 endpoint cohere rerank

* Add tests and docs

* Make v1 default if old params used

* Update docs

* Update docs pt 2

* Update tests

* Add e2e test

* Clean up code

* Use inheritance for new config

* Fix linting issues (#8608)

* Fix cohere v2 failing test + linting (#8672)

* Fix test and unused imports

* Fix tests

* fix: fix linting errors

* test: handle tgai instability

* fix: skip service unavailable err

* test: print logs for unstable test

* test: skip unreliable tests

---------

Co-authored-by: vibhavbhat <vibhavb00@gmail.com>
2025-02-22 22:25:29 -08:00
Krish Dholakia 251467a525 add bedrock llama vision support + cohere / infinity rerank - 'return_documents' support (#8684)
* build(model_prices_and_context_window.json): mark bedrock llama as supporting vision based on docs

* Add price for Cerebras llama3.3-70b (#8676)

* docs(readme.md): fix contributing docs

point people to new mock directory testing structure s/o @vibhavbhat

* build: update contributing readme

* docs(readme.md): improve docs

* docs(readme.md): cleanup readme on tests/

* docs(README.md): cleanup doc

* feat(infinity/): support returning documents when return_documents=True

* test(test_rerank.py): add e2e testing for cohere rerank

* fix: fix linting errors

* fix(together_ai/): fix together ai transformation

* fix: fix linting error

* fix: fix linting errors

* fix: fix linting errors

* test: mark cohere as flaky

* build: fix model supports check

* test: fix test

* test: mark flaky test

* fix: fix test

* test: fix test

---------

Co-authored-by: Yury Koleda <fut.wrk@gmail.com>
2025-02-20 21:23:54 -08:00
Krish Dholakia c8aa876785 fix(proxy_server.py): fix get model info when litellm_model_id is set + move model analytics to free (#7886)
* fix(proxy_server.py): fix get model info when litellm_model_id is set

Fixes https://github.com/BerriAI/litellm/issues/7873

* test(test_models.py): add test to ensure get model info on specific deployment has same value as all model info

Fixes https://github.com/BerriAI/litellm/issues/7873

* fix(usage.tsx): make model analytics free

Fixes @iqballx's feedback

* fix(invoke_handler.py): fix bedrock error chunk parsing; return correct bedrock status code and error message if chunk in stream

Improves bedrock stream error handling

* fix(proxy_server.py): fix linting errors

* test(test_auth_checks.py): remove redundant test

* fix(proxy_server.py): fix linting errors

* test: fix flaky test

* test: fix test
2025-01-21 08:19:07 -08:00
Krish Dholakia 27892acdfc Litellm dev 01 10 2025 p3 (#7682)
* feat(langfuse.py): log the used prompt when prompt management used

* test: fix test

* docs(self_serve.md): add doc on restricting personal key creation on ui

* feat(s3.py): support s3 logging with team alias prefixes (if available)

New preview feature

* fix(main.py): remove old if block - simplify to just await if coroutine returned

fixes lm_studio async embedding error

* fix(langfuse.py): handle get prompt check
2025-01-10 21:56:42 -08:00
Krish Dholakia 865e6d5bda fix(main.py): fix lm_studio/ embedding routing (#7658)
* fix(main.py): fix lm_studio/ embedding routing

adds the mapping + updates docs with example

* docs(self_serve.md): update doc to show how to auto-add sso users to teams

* fix(streaming_handler.py): simplify async iterator check, to just check if streaming response is an async iterable
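A sketch of the async-iterable check, using `collections.abc.AsyncIterable`, whose `isinstance` test looks for an `__aiter__` method. `is_async_stream` is a hypothetical helper name.

```python
from collections.abc import AsyncIterable
from typing import Any


def is_async_stream(response: Any) -> bool:
    """True if `response` supports `async for`, i.e. defines `__aiter__`."""
    return isinstance(response, AsyncIterable)
```

Because the ABC check is structural, async generators and any custom stream class that implements `__aiter__` qualify without registering a type list.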
2025-01-09 23:03:24 -08:00
Krish Dholakia 3671829e39 Complete 'requests' library removal (#7350)
* refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure

* refactor(watsonx/completion/handler.py): move to using base llm http handler

removes 'requests' library usage

* fix(watsonx_text/transformation.py): fix result transformation

migrates to transformation.py, for usage with base llm http handler

* fix(streaming_handler.py): migrate watsonx streaming to transformation.py

ensures streaming works with base llm http handler

* fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic

* fix(watsonx/): fix chat route post completion route refactor

* refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well

* refactor(base.py): remove requests library usage from litellm

* build(pyproject.toml): remove requests library usage

* fix: fix linting errors

* fix: fix linting errors

* fix(types/utils.py): fix validation errors for modelresponsestream

* fix(replicate/handler.py): fix linting errors

* fix(litellm_logging.py): handle modelresponsestream object

* fix(streaming_handler.py): fix modelresponsestream args

* fix: remove unused imports

* test: fix test

* fix: fix test

* test: fix test

* test: fix tests

* test: fix test

* test: fix patch target

* test: fix test
2024-12-22 07:21:25 -08:00
Ishaan Jaff 6107f9f3f3 [Bug fix ]: Triton /infer handler incompatible with batch responses (#7337)
* migrate triton to base llm http handler

* clean up triton handler.py

* use transform functions for triton

* add TritonConfig

* get openai params for triton

* use triton embedding config

* test_completion_triton_generate_api

* test_completion_triton_infer_api

* fix TritonConfig doc string

* use TritonResponseIterator

* fix triton embeddings

* docs triton chat usage
2024-12-20 20:59:40 -08:00
Krish Dholakia 6a45ee1ef7 fix(hosted_vllm/transformation.py): return fake api key, if none give… (#7301)
* fix(hosted_vllm/transformation.py): return fake api key, if none given. Prevents httpx error

Fixes https://github.com/BerriAI/litellm/issues/7291

* test: fix test

* fix(main.py): add hosted_vllm/ support for embeddings endpoint

Closes https://github.com/BerriAI/litellm/issues/7290

* docs(vllm.md): add docs on vllm embeddings usage

* fix(__init__.py): fix sambanova model test

* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
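A sketch of the fake-key fallback, under the assumption that the key is only used to build a bearer header; `PLACEHOLDER_API_KEY` and `build_auth_headers` are hypothetical names, and the actual placeholder in `hosted_vllm/transformation.py` may differ. The point is that the HTTP client always receives a string rather than `None`.

```python
from typing import Dict, Optional

# Hypothetical placeholder value; the commit only says "fake api key".
PLACEHOLDER_API_KEY = "fake-api-key"


def build_auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Return headers whose Authorization value is always a string."""
    key = api_key or PLACEHOLDER_API_KEY  # None or "" falls back to the placeholder
    return {"Authorization": f"Bearer {key}"}
```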
2024-12-18 18:41:53 -08:00
Krrish Dholakia 0caf804f4c feat(databricks/chat): support structured outputs on databricks
Closes https://github.com/BerriAI/litellm/pull/6978

- handles content as list for dbrx
- handles streaming+response_format for dbrx
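A sketch of normalizing message content that arrives as a list, assuming OpenAI-style parts of the form `{"type": "text", "text": ...}`; the exact shape DBRX returns is not shown in this log, and `content_to_text` is a hypothetical helper. Non-text parts are dropped here.

```python
from typing import Any, Dict, List, Optional, Union

# Assumed shape of a content part: {"type": "text", "text": "..."}.
ContentPart = Dict[str, Any]


def content_to_text(content: Union[str, List[ContentPart], None]) -> Optional[str]:
    """Collapse list-style message content into a single string."""
    if content is None or isinstance(content, str):
        return content
    # Keep only text parts, in order; other part types are ignored.
    return "".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )
```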
2024-12-02 23:08:19 -08:00
Krish Dholakia 7e9d8b58f6 LiteLLM Minor Fixes & Improvements (11/23/2024) (#6870)
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.

* fix(utils.py): allow disabling end user cost tracking with new param

Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small

* docs(configs.md): add disable_end_user_cost_tracking reference to docs

* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role

Enables admin to restrict key creation, and assign team admins to handle distributing keys

* test(test_key_management.py): add unit testing for personal / team key restriction checks

* docs: add docs on restricting key creation

* docs(finetuned_models.md): add new guide on calling finetuned models

* docs(input.md): cleanup anthropic supported params

Closes https://github.com/BerriAI/litellm/issues/6856

* test(test_embedding.py): add test for passing extra headers via embedding

* feat(cohere/embed): pass client to async embedding

* feat(rerank.py): add `/v1/rerank` if missing for cohere base url

Closes https://github.com/BerriAI/litellm/issues/6844

* fix(main.py): pass extra_headers param to openai

Fixes https://github.com/BerriAI/litellm/issues/6836

* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set

Fixes issue where global callbacks (e.g. prometheus) were overridden when langfuse was set dynamically

* fix(handler.py): fix linting error

* fix: fix typing

* build: add conftest to proxy_admin_ui_tests/

* test: fix test

* fix: fix linting errors

* test: fix test

* fix: fix pass through testing
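A sketch of the base-URL normalization for Cohere rerank; `ensure_rerank_path` is a hypothetical helper, and the handling of a base that already ends in `/v1` is an assumption beyond what the commit message states.

```python
def ensure_rerank_path(api_base: str) -> str:
    """Append the Cohere rerank path to a base URL unless it is already present."""
    base = api_base.rstrip("/")
    if base.endswith("/v1/rerank"):
        return base
    if base.endswith("/v1"):  # assumption: treat a bare /v1 base as the version root
        return base + "/rerank"
    return base + "/v1/rerank"
```

Stripping the trailing slash first keeps the result free of `//` whether or not the configured base ends in `/`.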
2024-11-23 15:17:40 +05:30
Krish Dholakia 22b8f93f53 LiteLLM Minor Fixes & Improvements (11/01/2024) (#6551)
* fix: add lm_studio support

* fix(cohere_transformation.py): fix transformation logic for azure cohere embedding model name

Fixes https://github.com/BerriAI/litellm/issues/6540

* fix(utils.py): require base64 str to begin with `data:`

Fixes https://github.com/BerriAI/litellm/issues/6541

* fix: cleanup tests

* docs(guardrails.md): fix typo

* fix(opentelemetry.py): move to `.exception` and update 'response_obj' value to handle 'None' case

Fixes https://github.com/BerriAI/litellm/issues/6510

* fix: fix linting noqa placement
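A sketch of the stricter check: only strings that begin with `data:` are parsed as base64 media, so plain input that happens to be valid base64 is left alone. `parse_base64_data_uri` is a hypothetical helper; the real check in `utils.py` may be shaped differently.

```python
import base64
from typing import Optional, Tuple


def parse_base64_data_uri(value: str) -> Optional[Tuple[str, bytes]]:
    """Return (media_type, payload_bytes) for a base64 data URI, else None."""
    if not value.startswith("data:"):
        return None  # plain strings, even valid base64, are not treated as media
    header, sep, payload = value.partition(",")
    if not sep or not header.endswith(";base64"):
        return None
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)
```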
2024-11-02 02:09:31 +05:30
Krish Dholakia c03e5da41f LiteLLM Minor Fixes & Improvements (10/24/2024) (#6421)
* fix(utils.py): support passing dynamic api base to validate_environment

Returns True if just api base is required and api base is passed

* fix(litellm_pre_call_utils.py): feature flag sending client headers to llm api

Fixes https://github.com/BerriAI/litellm/issues/6410

* fix(anthropic/chat/transformation.py): return correct error message

* fix(http_handler.py): add error response text in places where we expect it

* fix(factory.py): handle base case of no non-system messages to bedrock

Fixes https://github.com/BerriAI/litellm/issues/6411

* feat(cohere/embed): Support cohere image embeddings

Closes https://github.com/BerriAI/litellm/issues/6413

* fix(__init__.py): fix linting error

* docs(supported_embedding.md): add image embedding example to docs

* feat(cohere/embed): use cohere embedding returned usage for cost calc

* build(model_prices_and_context_window.json): add embed-english-v3.0 details (image cost + 'supports_image_input' flag)

* fix(cohere_transformation.py): fix linting error

* test(test_proxy_server.py): cleanup test

* test: cleanup test

* fix: fix linting errors
2024-10-25 15:55:56 -07:00
Ishaan Jaff 1ab886f80d (contributor PRs) oct 3rd, 2024 (#6034)
* Do not skip important tests for OIDC. (#6017)

* [Bug] Skip monthly slack alert if there was no spend (#6015)

* Fix: skip slack alert if there was no spend

* Skip monthly report when there was no spend

---------

Co-authored-by: María Paz Cuturi <paz@MacBook-Pro-de-Paz.local>

---------

Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paz <paz@tryolabs.com>
Co-authored-by: María Paz Cuturi <paz@MacBook-Pro-de-Paz.local>
2024-10-03 17:12:34 +05:30
Krrish Dholakia d64e971d8c fix(azure): return response headers for sync embedding calls 2024-09-28 21:08:15 -07:00
Krrish Dholakia 3560f0ef2c refactor: move all testing to top-level of repo
Closes https://github.com/BerriAI/litellm/issues/486
2024-09-28 21:08:14 -07:00