Several tests parametrized over (model, api_key, ...) tuples or raw
token strings, causing pytest to embed those values in the test ID
and print them in CI logs. Refactored each affected test to keep the
same coverage without putting key material into parametrize.
- audio_tests/test_audio_speech.py: split env-var keys into separate
azure/openai test functions sharing a helper; sync_mode parametrize
preserved.
- audio_tests/test_whisper.py: split into openai_whisper /
azure_whisper functions sharing a helper; response_format parametrize
preserved.
- local_testing/test_embedding.py: single-case parametrize inlined.
- proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize
cases split into 5 named tests sharing an _assert helper.
- proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split
into 4 named tests.
- test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix
cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split
into 5 named tests.
Verified: black clean; 14 refactored unit tests pass; pytest collects
audio/embedding tests with safe IDs (no key material in test IDs).
* fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)
* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant
Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.
* Batch-limit stale managed object cleanup with single bounded SQL query
Two fixes to _cleanup_stale_managed_objects:
1. Replace unbounded update_many with a single execute_raw using a
subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
rows. Zero rows loaded into Python memory — everything stays in Postgres.
Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
(the proxy requires PostgreSQL per schema.prisma).
2. Extract _expire_stale_rows as a separate method for testability.
Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.
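A minimal sketch of the bounded UPDATE-via-subquery-LIMIT idea, assuming a simplified table. It uses SQLite (stdlib `sqlite3`) instead of PostgreSQL so it runs standalone, and the table name, columns, and status values are hypothetical, not the real schema or the actual query in `_expire_stale_rows`. PostgreSQL accepts the same `UPDATE ... WHERE id IN (SELECT ... LIMIT n)` shape.

```python
import os
import sqlite3

# Batch cap read from the env var named in this commit (default 1000 as described).
STALE_OBJECT_CLEANUP_BATCH_SIZE = int(os.getenv("STALE_OBJECT_CLEANUP_BATCH_SIZE", "1000"))


def expire_stale_rows(conn: sqlite3.Connection, cutoff: int, batch_size: int) -> int:
    """Expire at most `batch_size` stale 'response' rows in one statement.

    The LIMIT lives in the subquery, so the database picks the ids and no
    rows are loaded into Python. Returns the number of rows updated.
    """
    cur = conn.execute(
        """
        UPDATE managed_objects
        SET status = 'expired'
        WHERE id IN (
            SELECT id FROM managed_objects
            WHERE status = 'pending'
              AND file_purpose = 'response'  -- leave batch / fine-tune jobs alone
              AND created_at < ?
            LIMIT ?
        )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """Five stale 'response' rows plus one stale 'batch' row that must survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE managed_objects ("
        "id INTEGER PRIMARY KEY, status TEXT, file_purpose TEXT, created_at INTEGER)"
    )
    rows = [(i, "pending", "response", 0) for i in range(5)]
    rows.append((99, "pending", "batch", 0))
    conn.executemany("INSERT INTO managed_objects VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

Each poll cycle would call `expire_stale_rows(conn, cutoff, STALE_OBJECT_CLEANUP_BATCH_SIZE)` once; a large backlog drains over several cycles instead of in one huge UPDATE.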
* docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference
* test: remove deprecated embed-english-v2.0 cohere embedding tests
Replace text-embedding-004 with gemini-embedding-001.
The old model was deprecated and returns 404:
'models/text-embedding-004 is not found for API version v1beta'
Co-authored-by: Shin <shin@openclaw.ai>
* fix(litellm_pre_call_utils.py): add user agent tags to spend logs in standard logging payload logic
Avoids a clash when tag-based routing is enabled.
* test: remove redundant test
* test: rename oidc test to run earlier
For quicker debugging.
* fix(azure.py): return more detailed error message
* fix(azure/common_utils.py): use default scope if scope is None
Fixes OIDC test.
* fix: always default to cognitiveservices.azure.com
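A minimal sketch of the scope fallback described in the two fixes above. `resolve_azure_scope` is a hypothetical helper name; the default shown is the standard Azure Cognitive Services `.default` token scope, assumed here to be what "default to cognitiveservices.azure.com" refers to.

```python
from typing import Optional

# Assumed default: the Azure Cognitive Services ".default" token scope.
DEFAULT_AZURE_SCOPE = "https://cognitiveservices.azure.com/.default"


def resolve_azure_scope(scope: Optional[str]) -> str:
    """Hypothetical helper: use the configured scope, else fall back to the default."""
    return scope if scope is not None else DEFAULT_AZURE_SCOPE
```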
* test: update test
* fix(auth_checks.py): enforce auth checks on target model names
Ensures the user has access to the models they are trying to call.
* test(test_auth_utils.py): add unit tests for auth check
* fix(exception_mapping_utils.py): handle mistral 429 exception
* fix: fix linting error
* fix(auth_checks.py): add max fallback depth
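One plausible reading of "enforce auth checks on target model names" plus "max fallback depth" is sketched below: check the requested model and every fallback it could reach, stopping after a bounded number of hops. The function names and the fallback-map shape are hypothetical, not the actual `auth_checks.py` code.

```python
from typing import Dict, List, Set


def models_to_check(model: str, fallbacks: Dict[str, List[str]], max_depth: int = 5) -> Set[str]:
    """Return `model` plus every fallback reachable within `max_depth` hops.

    The `seen` set stops cycles (a -> b -> a), and `max_depth` bounds long chains.
    """
    seen = {model}
    frontier = [model]
    for _ in range(max_depth):
        next_frontier: List[str] = []
        for current in frontier:
            for fallback in fallbacks.get(current, []):
                if fallback not in seen:
                    seen.add(fallback)
                    next_frontier.append(fallback)
        if not next_frontier:
            break
        frontier = next_frontier
    return seen


def check_model_access(
    model: str, allowed: Set[str], fallbacks: Dict[str, List[str]], max_depth: int = 5
) -> None:
    """Raise if any model the request could reach is outside the caller's allowed set."""
    denied = models_to_check(model, fallbacks, max_depth) - allowed
    if denied:
        raise PermissionError(f"not allowed to call: {sorted(denied)}")
```

The depth cap keeps the check cheap and terminating even for misconfigured or cyclic fallback lists.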
* Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar
* Add Llamafile as an LlmProviders enum
* Add Llamafile as an OpenAI-compatible provider (in the list of compatible providers)
* Add Llamafile chat config and tests
* Wire up Llamafile
Co-authored-by: Peter Wilson <peter@mozilla.ai>
* fix(proxy_server.py): fix get model info when litellm_model_id is set
Fixes https://github.com/BerriAI/litellm/issues/7873
* test(test_models.py): add test to ensure get model info on specific deployment has same value as all model info
Fixes https://github.com/BerriAI/litellm/issues/7873
* fix(usage.tsx): make model analytics free
Addresses @iqballx's feedback
* fix(invoke_handler.py): fix bedrock error chunk parsing; return correct bedrock status code and error message when an error chunk appears in the stream
Improves bedrock stream error handling
* fix(proxy_server.py): fix linting errors
* test(test_auth_checks.py): remove redundant test
* fix(proxy_server.py): fix linting errors
* test: fix flaky test
* test: fix test
* feat(langfuse.py): log the used prompt when prompt management used
* test: fix test
* docs(self_serve.md): add doc on restricting personal key creation on ui
* feat(s3.py): support s3 logging with team alias prefixes (if available)
New preview feature
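A minimal sketch of prefixing S3 object keys with a team alias only when one is available. `build_s3_object_key`, its parameters, and the key layout are hypothetical; the actual path format used by `s3.py` may differ.

```python
from typing import Optional


def build_s3_object_key(file_name: str, team_alias: Optional[str] = None, prefix: str = "") -> str:
    """Join an optional base prefix, an optional team alias, and the file name."""
    # Skip empty/None segments so a missing team alias leaves no "//" in the key.
    parts = [p.strip("/") for p in (prefix, team_alias) if p]
    parts.append(file_name)
    return "/".join(parts)
```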
* fix(main.py): remove old if block - simplify to just await if coroutine returned
fixes lm_studio async embedding error
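A minimal sketch of "just await if coroutine returned", using the stdlib `inspect.iscoroutine`. `call_and_maybe_await` and the two embed functions are hypothetical illustrations, not code from `main.py`.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_and_maybe_await(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn; if it returned a coroutine, await it, otherwise return the value as-is."""
    result = fn(*args, **kwargs)
    if inspect.iscoroutine(result):
        result = await result
    return result


def sync_embed(text: str) -> list:
    # Hypothetical sync provider: returns a fake one-dimensional embedding.
    return [float(len(text))]


async def async_embed(text: str) -> list:
    # Hypothetical async provider with the same output.
    return [float(len(text))]
```

A single check on the returned value replaces provider-specific branching, so both sync and async handlers go through one code path.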
* fix(langfuse.py): handle get prompt check
* fix(main.py): fix lm_studio/ embedding routing
adds the mapping + updates docs with example
* docs(self_serve.md): update doc to show how to auto-add sso users to teams
* fix(streaming_handler.py): simplify async iterator check to just check whether the streaming response is an async iterable
* fix(hosted_vllm/transformation.py): return fake api key if none given. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
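A minimal sketch of the placeholder-key fallback, so downstream client code always receives a string key. `resolve_api_key` and the placeholder value are hypothetical; the string actually used in `hosted_vllm/transformation.py` may differ.

```python
from typing import Optional

# Hypothetical placeholder value for servers that do not check keys.
PLACEHOLDER_API_KEY = "fake-api-key"


def resolve_api_key(api_key: Optional[str]) -> str:
    """Return the caller's key, or the placeholder when none was given."""
    return api_key if api_key is not None else PLACEHOLDER_API_KEY
```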
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.
* fix(utils.py): allow disabling end user cost tracking with new param
Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small
* docs(configs.md): add disable_end_user_cost_tracking reference to docs
* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role
Enables admin to restrict key creation, and assign team admins to handle distributing keys
* test(test_key_management.py): add unit testing for personal / team key restriction checks
* docs: add docs on restricting key creation
* docs(finetuned_models.md): add new guide on calling finetuned models
* docs(input.md): cleanup anthropic supported params
Closes https://github.com/BerriAI/litellm/issues/6856
* test(test_embedding.py): add test for passing extra headers via embedding
* feat(cohere/embed): pass client to async embedding
* feat(rerank.py): add `/v1/rerank` if missing for cohere base url
Closes https://github.com/BerriAI/litellm/issues/6844
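A minimal sketch of appending `/v1/rerank` to a Cohere base URL only when it is missing. `ensure_rerank_path` is a hypothetical helper, and treating a base that already ends in `/v1` specially is an assumption of this sketch, not confirmed behavior of `rerank.py`.

```python
def ensure_rerank_path(api_base: str) -> str:
    """Return api_base with a single trailing /v1/rerank path."""
    base = api_base.rstrip("/")
    if base.endswith("/v1/rerank"):
        return base
    if base.endswith("/v1"):
        # Assumption: avoid producing /v1/v1/rerank.
        return base + "/rerank"
    return base + "/v1/rerank"
```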
* fix(main.py): pass extra_headers param to openai
Fixes https://github.com/BerriAI/litellm/issues/6836
* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set
Fixes issue where global callbacks (e.g. prometheus) were overridden when langfuse was set dynamically
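A minimal sketch of combining global and per-request callbacks instead of letting the dynamic list replace the global one. Callbacks are modeled as plain strings and `effective_callbacks` is a hypothetical name, not the `litellm_logging.py` implementation.

```python
from typing import List, Optional


def effective_callbacks(global_callbacks: List[str], dynamic_callbacks: Optional[List[str]]) -> List[str]:
    """Global callbacks first, then dynamic ones, with duplicates dropped in order."""
    merged: List[str] = []
    for cb in global_callbacks + (dynamic_callbacks or []):
        if cb not in merged:
            merged.append(cb)
    return merged
```

With this shape, setting langfuse per request adds to, rather than removes, a globally configured prometheus callback.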
* fix(handler.py): fix linting error
* fix: fix typing
* build: add conftest to proxy_admin_ui_tests/
* test: fix test
* fix: fix linting errors
* test: fix test
* fix: fix pass through testing
* fix(utils.py): support passing dynamic api base to validate_environment
Returns True if only an api base is required and one is passed
* fix(litellm_pre_call_utils.py): feature flag sending client headers to llm api
Fixes https://github.com/BerriAI/litellm/issues/6410
* fix(anthropic/chat/transformation.py): return correct error message
* fix(http_handler.py): add error response text in places where we expect it
* fix(factory.py): handle base case of no non-system messages to bedrock
Fixes https://github.com/BerriAI/litellm/issues/6411
* feat(cohere/embed): Support cohere image embeddings
Closes https://github.com/BerriAI/litellm/issues/6413
* fix(__init__.py): fix linting error
* docs(supported_embedding.md): add image embedding example to docs
* feat(cohere/embed): use cohere embedding returned usage for cost calc
* build(model_prices_and_context_window.json): add embed-english-v3.0 details (image cost + 'supports_image_input' flag)
* fix(cohere_transformation.py): fix linting error
* test(test_proxy_server.py): cleanup test
* test: cleanup test
* fix: fix linting errors
* Do not skip important tests for OIDC. (#6017)
* [Bug] Skip monthly slack alert if there was no spend (#6015)
* Fix: skip slack alert if there was no spend
* Skip monthly report when there was no spend
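A minimal sketch of skipping the monthly Slack report when nothing was spent: the builder returns `None` and the caller sends nothing. `build_monthly_spend_report` and the message format are hypothetical, not the actual alerting code.

```python
from typing import Dict, Optional


def build_monthly_spend_report(spend_by_key: Dict[str, float]) -> Optional[str]:
    """Return the report text, or None when total spend is zero (skip the alert)."""
    total = sum(spend_by_key.values())
    if total <= 0:
        return None
    ranked = sorted(spend_by_key.items(), key=lambda kv: kv[1], reverse=True)
    lines = [f"{key}: ${amount:.2f}" for key, amount in ranked]
    return "Monthly spend report\n" + "\n".join(lines)
```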
---------
Co-authored-by: María Paz Cuturi <paz@MacBook-Pro-de-Paz.local>
---------
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paz <paz@tryolabs.com>
Co-authored-by: María Paz Cuturi <paz@MacBook-Pro-de-Paz.local>