yuneng-jiang
92de7423ef
fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281)
...
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio
calls in test_stream_chunk_builder_openai_audio_output_usage and
test_standard_logging_payload_audio now hard-fail with a model-not-found
error on every PR. The error was not "openai-internal", so the except
block swallowed it and execution fell through to an unbound
completion/response (UnboundLocalError).
Switch both tests to gpt-audio-1.5, OpenAI's recommended successor
(GA, not deprecated, already present in the litellm cost map so the
response_cost assertion still resolves). Also broaden the except to
skip with the real error in the reason instead of crashing, so a
transient upstream blip can't reintroduce the UnboundLocalError.
* fix(tests): narrow audio-test skip to model-not-found, re-raise the rest
Address review feedback: an unconditional skip on any exception would
silently mask a litellm-internal regression in the audio path (broken
param transformation, serialization, bad header) instead of failing CI.
Skip only on the upstream-unavailable class (model_not_found / "does not
exist" / openai-internal) and re-raise everything else, so genuine
regressions still fail loudly. The UnboundLocalError is still fixed
because the handler either skips or raises - it never falls through.
* fix(tests): add budget_exceeded to expected Interaction status enum
Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec.
* fix(tests): mock HTTP fetch in test_img_url_token_counter
The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency.
* fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly.
2026-05-19 14:48:30 -07:00
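The narrowed-skip pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the test files: the helper names `is_upstream_unavailable` and `call_or_skip` are hypothetical, matching on the error text is an assumption about how the upstream failure surfaces, and `unittest.SkipTest` stands in for whatever skip mechanism the tests actually use.

```python
import unittest

# Markers taken from the commit message. Matching on the lower-cased error
# text is an assumption about how the upstream failure is reported.
UPSTREAM_UNAVAILABLE_MARKERS = ("model_not_found", "does not exist", "openai-internal")


def is_upstream_unavailable(exc: Exception) -> bool:
    """Return True if the error looks like the upstream model is unavailable."""
    message = str(exc).lower()
    return any(marker in message for marker in UPSTREAM_UNAVAILABLE_MARKERS)


def call_or_skip(make_call):
    """Run make_call(); skip on upstream-unavailable errors, re-raise the rest.

    The handler either skips or raises, so execution never falls through to
    code that reads an unbound result variable.
    """
    try:
        return make_call()
    except Exception as exc:
        if is_upstream_unavailable(exc):
            # Hypothetical stand-in for the test suite's skip call.
            raise unittest.SkipTest(f"upstream model unavailable: {exc}") from exc
        raise
```

Because every non-matching exception is re-raised, a regression inside the library's own audio path (for example a bad header or a serialization error) would still fail the run instead of being reported as a skip.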
Yuneng Jiang
945b10ded4
fix(tests): drop dall-e-only test classes; route live image tests via gpt-image-1
...
Second wave of failures from the 2026-05-12 DALL-E shutdown:
- tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditDallE2
and tests/image_gen_tests/test_image_generation.py::TestOpenAIDalle3
are explicitly named for the deprecated models and can't pass; remove.
gpt-image-1 coverage already exists in sibling classes.
- tests/local_testing/test_router.py image gen tests use dall-e-3 only
as a routing example; swap to gpt-image-1.
- tests/local_testing/test_custom_callback_input.py image_generation
success/failure paths swapped to gpt-image-1.
2026-05-12 16:16:59 -07:00
Yuneng Jiang
c8cfc5de21
fix(httpx): set response.request and strip content-encoding in MaskedHTTPStatusError
...
MaskedHTTPStatusError constructs a new httpx.Response from the original
error. Two bugs surfaced under real HTTP error responses:
1. The new Response was created without request=, so response.request
raised RuntimeError("The .request property has not been set.") for
any downstream caller (e.g. exception_mapping_utils) that inspected it.
2. The decoded response bytes were passed together with the original
Content-Encoding header. On construction httpx tried to decompress
the already-decoded bytes and raised httpx.DecodingError
("Error -3 while decompressing data: incorrect header check").
Set response.request to the masked Request and strip Content-Encoding
(and the now-stale Content-Length) before rebuilding the Response.
URL/message masking is unchanged; the new request carries the already
masked URL.
Also update test_logging_key_masking_gemini: the security commit
25f93bed91 moved Gemini API keys from ?key=... URL params to the
x-goog-api-key header, so api_base no longer contains the key.
2026-04-15 22:03:48 -07:00
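The header clean-up step from this commit can be illustrated with a small standard-library sketch. The function name `headers_for_decoded_body` is hypothetical and this is not the MaskedHTTPStatusError implementation; it only shows the idea of dropping headers that stop being true once the body has been decoded.

```python
from typing import Dict, Mapping

# Headers that describe the encoded wire bytes and become stale once the
# body has already been decompressed.
STALE_AFTER_DECODE = frozenset({"content-encoding", "content-length"})


def headers_for_decoded_body(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of headers without Content-Encoding / Content-Length.

    Header names are compared case-insensitively; all other headers are
    kept unchanged.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in STALE_AFTER_DECODE
    }
```

The cleaned headers would then be passed, together with the decoded bytes and the masked request, to the `httpx.Response` constructor (its `headers=`, `content=` and `request=` keyword arguments), so that httpx does not try to decompress the body a second time and `response.request` is set.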
yuneng-jiang
06681ddfcc
Fix flaky audio streaming cost assertion in test_standard_logging_payload_audio
...
Audio streaming responses may not always report token counts, leading to
0.0 response_cost. Relax the assertion to >= 0 for streaming, keep > 0
for non-streaming.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:23:10 -07:00
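The relaxed assertion described above might look like the following sketch. The helper name `check_response_cost` is hypothetical; the actual test may inline these checks instead.

```python
def check_response_cost(response_cost: float, stream: bool) -> None:
    """Assert the cost bound appropriate for the call type.

    Streaming audio responses may omit token counts, so a cost of 0.0 is
    tolerated there; non-streaming calls must report a positive cost.
    """
    if stream:
        assert response_cost >= 0, f"negative streaming cost: {response_cost}"
    else:
        assert response_cost > 0, f"expected positive cost, got {response_cost}"
```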
Chesars
4e6e1d8de8
merge: resolve conflicts with upstream staging (bedrock + mcp tests)
...
Keep both sets of tests: upstream's OAuth2 token injection test and
our case-insensitive tool matching tests. Use upstream's version of
the bedrock output_config test (more comprehensive).
2026-03-12 13:40:16 -03:00
Chesars
feed274aa3
Reapply "feat: add model_cost aliases expansion support"
...
This reverts commit 3d2df7e8b5.
2026-03-12 13:36:57 -03:00
Sameer Kankute
982f3917c5
Fix test_standard_logging_payload
2026-03-12 18:35:01 +05:30
Chesars
1be6b31e2f
merge: resolve conflicts between main and litellm_oss_staging_03_11_2026
2026-03-12 09:38:31 -03:00
Cesar Garcia
3d2df7e8b5
Revert "feat: add model_cost aliases expansion support"
2026-03-10 22:39:19 -03:00
Sameer Kankute
30fde1de7f
fix(tests): update cache hit redaction assertion to expect choices format
...
Made-with: Cursor
2026-03-10 12:14:24 +05:30
Ishaan Jaff
a50a84c16c
fix(tests): update redaction assertion + remove flaky qwen3 streaming test (#23062)
...
test_standard_logging_payload_audio: the response field in standard_logging_object
is now a ModelResponse choices dict (since d84e5e381a), not {text: redacted-by-litellm}.
Update both audio and non-audio variants to check choices[0].message.content instead.
Audio is still correctly redacted - the new code creates a fresh ModelResponse with no
audio field, so audio bytes never appear in the payload.
test_partner_models_httpx_streaming: remove qwen3-coder-480b (us-south1) from the
parametrize list - same treatment as llama-4-scout which was removed earlier.
The endpoint is unavailable in CI and the test has been consistently failing.
2026-03-07 16:07:14 -08:00
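The updated redaction check can be sketched on a plain dictionary. The helper name `is_redacted_choices_response` is hypothetical, and the dictionary layout is an assumption based on the description above (a choices list whose first message carries the redaction placeholder and no audio field).

```python
from typing import Any, Dict

# Placeholder string quoted in the commit message.
REDACTED = "redacted-by-litellm"


def is_redacted_choices_response(response: Dict[str, Any]) -> bool:
    """Check choices[0].message.content is redacted and no audio is present."""
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    # Assumed shape: redacted text content and no audio payload on the message.
    return message.get("content") == REDACTED and message.get("audio") is None
```

The old flat shape `{text: redacted-by-litellm}` has no `choices` key, so this check returns False for it, which mirrors why the earlier assertion had to be rewritten.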
Ishaan Jaffer
94c2c28f3d
claude-sonnet-4-5-20250929 fix
2025-10-31 18:20:52 -07:00
Ishaan Jaffer
5cca4c8b4f
test_image_generation_openai
2025-10-25 14:57:40 -07:00
Ishaan Jaffer
0bedf1c0a7
fix tests
2025-10-25 10:19:24 -07:00
Ishaan Jaffer
73c32c01a4
test_chat_azure_stream
2025-09-27 15:01:25 -07:00
Krish Dholakia
f9331ac43e
Merge branch 'main' into litellm_ci_cd_linting_fixes_09_29_2025_p2
2025-09-27 14:15:27 -07:00
Krrish Dholakia
77670fa419
fix: fix test
2025-09-27 14:14:40 -07:00
Ishaan Jaffer
81282d14b5
test_async_embedding_openai
2025-09-27 13:03:17 -07:00
Ishaan Jaffer
6aa35ec999
test text-embedding-ada-002
2025-09-27 12:41:35 -07:00
Ishaan Jaffer
02cc9133a5
test_async_chat_azure_stream
2025-09-27 12:37:36 -07:00
Ishaan Jaffer
8510c70416
test fixes
2025-09-27 09:11:43 -07:00
Alexsander Hamir
eaa04cd8ce
fix: use fastuuid helper (#14903)
...
* fix: use fastuuid helper across the codebase
First batch of changes, simple drop in replacement.
* second batch of changes
* fixed: script mistake on helper file
2025-09-25 15:47:01 -07:00
Krrish Dholakia
0854c35d3e
test: remove eol bedrock model from tests
2025-09-09 19:48:35 -07:00
Krrish Dholakia
6322aef0e3
fix(streaming_handler.py): fix streaming chunk calculation
2025-08-16 14:25:29 -07:00
Krrish Dholakia
eb66daeef7
test: update test
...
we now return correct token usage on clientside
2025-08-16 14:14:35 -07:00
Ishaan Jaff
5a051cb264
test_async_embedding_azure_caching - flaky test
2025-06-14 13:55:29 -07:00
Krrish Dholakia
31a73be03f
fix(litellm_logging.py): skip should_run_logging check on streaming
2025-06-13 21:19:24 -07:00
Krish Dholakia
d783190e04
Update fireworks ai pricing (#10425)
...
* build(model_prices_and_context_window.json): add fireworks ai new 0-4b pricing tier
* build(model_prices_and_context_window.json): add more fireworks ai models
* test: update testing
* test: testing updates
* test: update test
* test: update test
2025-04-29 20:58:05 -07:00
Krish Dholakia
1ea046cc61
test: update tests to new deployment model (#10142)
...
* test: update tests to new deployment model
* test: update model name
* test: skip cohere rbac issue test
* test: update test - replace gpt-4o model
2025-04-18 14:22:12 -07:00
Ishaan Jaff
c3341a1e18
test fixes - azure deprecated dall-e-2
2025-04-02 20:56:20 -07:00
Krrish Dholakia
8618295911
test: loosen test
2025-03-17 09:44:22 -07:00
Krrish Dholakia
d01361747d
test: make test less flaky
2025-03-17 09:00:15 -07:00
Krish Dholakia
142b195784
Add anthropic thinking + reasoning content support (#8778)
...
* feat(anthropic/chat/transformation.py): add anthropic thinking param support
* feat(anthropic/chat/transformation.py): support returning thinking content for anthropic on streaming responses
* feat(anthropic/chat/transformation.py): return list of thinking blocks (include block signature)
allows usage in tool call responses
* fix(types/utils.py): extract and map reasoning_content from anthropic as content str
* test: add testing to ensure thinking_blocks are returned at the root
* fix(anthropic/chat/handler.py): return thinking blocks on streaming - include signature
* feat(factory.py): handle anthropic thinking blocks translation if in assistant response
* test: handle openai internal instability
* test: handle openai audio instability
* ci: pin anthropic dep
* test: handle openai audio instability
* fix: fix linting error
* refactor(anthropic/chat/transformation.py): refactor function to remain <50 LOC
* fix: fix linting error
* fix: fix linting error
* fix: fix linting error
* fix: fix linting error
2025-02-24 21:54:30 -08:00
Ishaan Jaff
818792228c
(Refactor) - migrate bedrock invoke to BaseLLMHTTPHandler class (#8290)
...
* initial transform for invoke
* invoke transform_response
* working - able to make request
* working get_complete_url
* working - invoke now runs on llm_http_handler
* fix unused imports
* track litellm overhead ms
* working stream request
* sign_request transform
* sign_request update
* use has_async_custom_stream_wrapper property
* use get_async_custom_stream_wrapper in base llm http handler
* fix make_call in invoke handler
* fix invoke with streaming get_async_custom_stream_wrapper
* working bedrock async streaming with invoke
* fix make call handler for bedrock
* test_all_model_configs
* fix test_bedrock_custom_prompt_template
* sync streaming for bedrock invoke
* fix _add_stream_param_to_request_body
* test_async_text_completion_bedrock
* fix transform_request
* fix get_supported_openai_params
* fix test supports tool choice
* fix test_supports_tool_choice
* add unit test coverage for bedrock invoke transform
* fix location of transformation files
* update import loc
* fix bedrock invoke unit tests
* fix import for max completion tokens
2025-02-05 18:58:55 -08:00
Ishaan Jaff
2cf0daa31c
(Fixes) OpenAI Streaming Token Counting + Fixes usage track when litellm.turn_off_message_logging=True (#8156)
...
* working streaming usage tracking
* fix test_async_chat_openai_stream_options
* fix await asyncio.sleep(1)
* test_async_chat_azure
* fix s3 logging
* fix get_stream_options
* fix get_stream_options
* fix streaming handler
* test_stream_token_counting_with_redaction
* fix codeql concern
2025-01-31 15:06:37 -08:00
Krish Dholakia
69a6da4727
Litellm dev 01 30 2025 p2 (#8134)
...
* feat(lowest_tpm_rpm_v2.py): fix redis cache check to use >= instead of >
makes it consistent
* test(test_custom_guardrails.py): add more unit testing on default on guardrails
ensure it runs if user sent guardrail list is empty
* docs(quick_start.md): clarify default on guardrails run even if user guardrails list contains other guardrails
* refactor(litellm_logging.py): refactor no-log to helper util
allows for more consistent behavior
* feat(litellm_logging.py): add event hook to verbose logs
* fix(litellm_logging.py): add unit testing to ensure `litellm.disable_no_log_param` is respected
* docs(logging.md): document how to disable 'no-log' param
* test: fix test to handle feb
* test: cleanup old bedrock model
* fix: fix router check
2025-01-30 22:18:53 -08:00
Ishaan Jaff
8a235e7d38
(Refactor / QA) - Use LoggingCallbackManager to append callbacks and ensure no duplicate callbacks are added (#8112)
...
* LoggingCallbackManager
* add logging_callback_manager
* use logging_callback_manager
* add add_litellm_failure_callback
* use add_litellm_callback
* use add_litellm_async_success_callback
* add_litellm_async_failure_callback
* linting fix
* fix logging callback manager
* test_duplicate_multiple_loggers_test
* use _reset_all_callbacks
* fix testing with dup callbacks
* test_basic_image_generation
* reset callbacks for tests
* fix check for _add_custom_logger_to_list
* fix test_amazing_sync_embedding
* fix _get_custom_logger_key
* fix batches testing
* fix _reset_all_callbacks
* fix _check_callback_list_size
* add callback_manager_test
* fix test gemini-2.0-flash-thinking-exp-01-21
2025-01-30 19:35:50 -08:00
Krish Dholakia
33f301ec86
Litellm dev 01 02 2025 p1 (#7516)
...
* fix(redact_messages.py): fix redact messages for non-model response input to be dictionary
fixes issue with otel logging when message redaction is enabled
* fix(proxy_server.py): fix langfuse key leak in exception string
* test: fix test
* test: fix test
* test: fix tests
2025-01-03 14:40:57 -08:00
Krish Dholakia
3671829e39
Complete 'requests' library removal (#7350)
...
* refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure
* refactor(watsonx/completion/handler.py): move to using base llm http handler
removes 'requests' library usage
* fix(watsonx_text/transformation.py): fix result transformation
migrates to transformation.py, for usage with base llm http handler
* fix(streaming_handler.py): migrate watsonx streaming to transformation.py
ensures streaming works with base llm http handler
* fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic
* fix(watsonx/): fix chat route post completion route refactor
* refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well
* refactor(base.py): remove requests library usage from litellm
* build(pyproject.toml): remove requests library usage
* fix: fix linting errors
* fix: fix linting errors
* fix(types/utils.py): fix validation errors for modelresponsestream
* fix(replicate/handler.py): fix linting errors
* fix(litellm_logging.py): handle modelresponsestream object
* fix(streaming_handler.py): fix modelresponsestream args
* fix: remove unused imports
* test: fix test
* fix: fix test
* test: fix test
* test: fix tests
* test: fix test
* test: fix patch target
* test: fix test
2024-12-22 07:21:25 -08:00
Krish Dholakia
e9aa492af3
LiteLLM Minor Fixes & Improvement (11/14/2024) (#6730)
...
* fix(ollama.py): fix get model info request
Fixes https://github.com/BerriAI/litellm/issues/6703
* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param
* docs(anthropic.md): document all supported openai params for anthropic
* test: fix tests
* fix: fix tests
* feat(jina_ai/): add rerank support
Closes https://github.com/BerriAI/litellm/issues/6691
* test: handle service unavailable error
* fix(handler.py): refactor together ai rerank call
* test: update test to handle overloaded error
* test: fix test
* Litellm router trace (#6742)
* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks
* feat(router.py): log trace id across retry/fallback logic
allows grouping llm logs for the same request
* test: fix tests
* fix: fix test
* fix(transformation.py): only set non-none stop_sequences
* Litellm router disable fallbacks (#6743)
* bump: version 1.52.6 → 1.52.7
* feat(router.py): enable dynamically disabling fallbacks
Allows for enabling/disabling fallbacks per key
* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key
* test: fix test
* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error
* test: handle gemini error
* test: fix test
* fix: new run
2024-11-15 01:02:54 +05:30
Krish Dholakia
136693cac4
LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590)
...
* fix(pattern_matching_router.py): update model name using correct function
* fix(langfuse.py): metadata deepcopy can cause unhandled error (#6563)
Co-authored-by: seva <seva@inita.com>
* fix(stream_chunk_builder_utils.py): correctly set prompt tokens + log correct streaming usage
Closes https://github.com/BerriAI/litellm/issues/6488
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test(test_proxy_utils.py): add testing for db config update logic
* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version (#6597)
* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
* fix(langfuse.py): fix linting errors
* fix: fix linting errors
* fix: fix casting error
* fix: fix typing error
* fix: add more tests
* fix(utils.py): fix return_processed_chunk_logic
* Revert "Update setuptools in docker and fastapi to latest verison, in order t…" (#6615)
This reverts commit 1a7f7bdfb75df0efbc930b7f2e39febc80e97d5a.
* docs fix clarify team_id on team based logging
* doc fix team based logging with langfuse
* fix flake8 checks
* test: bump sleep time
* refactor: replace claude-instant-1.2 with haiku in testing
* fix(proxy_server.py): move to using sl payload in track_cost_callback
* fix(proxy_server.py): fix linting errors
* fix(proxy_server.py): fallback to kwargs(response_cost) if given
* test: remove claude-instant-1 from tests
* test: fix claude test
* build: remove lint.yml
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Vsevolod Karvetskiy <56288164+karvetskiy@users.noreply.github.com>
Co-authored-by: seva <seva@inita.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt P Suorra <Jacobh2@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
2024-11-07 04:17:05 +05:30
Krish Dholakia
c58d542282
Litellm openai audio streaming (#6325)
...
* refactor(main.py): streaming_chunk_builder
use <100 lines of code
refactor each component into a separate function - easier to maintain + test
* fix(utils.py): handle choices being None
openai pydantic schema updated
* fix(main.py): fix linting error
* feat(streaming_chunk_builder_utils.py): update stream chunk builder to support rebuilding audio chunks from openai
* test(test_custom_callback_input.py): test message redaction works for audio output
* fix(streaming_chunk_builder_utils.py): return anthropic token usage info directly
* fix(stream_chunk_builder_utils.py): run validation check before entering chunk processor
* fix(main.py): fix import
2024-10-19 16:16:51 -07:00
Ishaan Jaff
a69c670baa
(refactor) use helper function _assemble_complete_response_from_streaming_chunks to assemble complete responses in caching and logging callbacks (#6220)
...
* (refactor) use _assemble_complete_response_from_streaming_chunks
* add unit test for test_assemble_complete_response_from_streaming_chunks_1
* fix assemble complete_streaming_response
* config add logging_testing
* add logging_coverage in codecov
* test test_assemble_complete_response_from_streaming_chunks_3
* add unit tests for _assemble_complete_response_from_streaming_chunks
* fix remove unused / junk function
* add test for streaming_chunks when error assembling
2024-10-15 12:45:12 +05:30
Krish Dholakia
2acb0c0675
Litellm Minor Fixes & Improvements (10/12/2024) (#6179)
...
* build(model_prices_and_context_window.json): add bedrock llama3.2 pricing
* build(model_prices_and_context_window.json): add bedrock cross region inference pricing
* Revert "(perf) move s3 logging to Batch logging + async [94% faster perf under 100 RPS on 1 litellm instance] (#6165)"
This reverts commit 2a5624af47.
* add azure/gpt-4o-2024-05-13 (#6174)
* LiteLLM Minor Fixes & Improvements (10/10/2024) (#6158)
* refactor(vertex_ai_partner_models/anthropic): refactor anthropic to use partner model logic
* fix(vertex_ai/): support passing custom api base to partner models
Fixes https://github.com/BerriAI/litellm/issues/4317
* fix(proxy_server.py): Fix prometheus premium user check logic
* docs(prometheus.md): update quick start docs
* fix(custom_llm.py): support passing dynamic api key + api base
* fix(realtime_api/main.py): Add request/response logging for realtime api endpoints
Closes https://github.com/BerriAI/litellm/issues/6081
* feat(openai/realtime): add openai realtime api logging
Closes https://github.com/BerriAI/litellm/issues/6081
* fix(realtime_streaming.py): fix linting errors
* fix(realtime_streaming.py): fix linting errors
* fix: fix linting errors
* fix pattern match router
* Add literalai in the sidebar observability category (#6163)
* fix: add literalai in the sidebar
* fix: typo
* update (#6160)
* Feat: Add Langtrace integration (#5341)
* Feat: Add Langtrace integration
* add langtrace service name
* fix timestamps for traces
* add tests
* Discard Callback + use existing otel logger
* cleanup
* remove print statements
* remove callback
* add docs
* docs
* add logging docs
* format logging
* remove emoji and add litellm proxy example
* format logging
* format `logging.md`
* add langtrace docs to logging.md
* sync conflict
* docs fix
* (perf) move s3 logging to Batch logging + async [94% faster perf under 100 RPS on 1 litellm instance] (#6165)
* fix move s3 to use customLogger
* add basic s3 logging test
* add s3 to custom logger compatible
* use batch logger for s3
* s3 set flush interval and batch size
* fix s3 logging
* add notes on s3 logging
* fix s3 logging
* add basic s3 logging test
* fix s3 type errors
* add test for sync logging on s3
* fix: fix to debug log
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Willy Douhard <willy.douhard@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
Co-authored-by: Ali Waleed <ali@scale3labs.com>
* docs(custom_llm_server.md): update doc on passing custom params
* fix(pass_through_endpoints.py): don't require headers
Fixes https://github.com/BerriAI/litellm/issues/6128
* feat(utils.py): add support for caching rerank endpoints
Closes https://github.com/BerriAI/litellm/issues/6144
* feat(litellm_logging.py): add response headers for failed requests
Closes https://github.com/BerriAI/litellm/issues/6159
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Willy Douhard <willy.douhard@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
Co-authored-by: Ali Waleed <ali@scale3labs.com>
2024-10-12 11:48:34 -07:00
Krish Dholakia
6005450c8f
LiteLLM Minor Fixes & Improvements (10/09/2024) ( #6139 )
...
* fix(utils.py): don't return 'none' response headers
Fixes https://github.com/BerriAI/litellm/issues/6123
* fix(vertex_and_google_ai_studio_gemini.py): support parsing out additional properties and strict value for tool calls
Fixes https://github.com/BerriAI/litellm/issues/6136
* fix(cost_calculator.py): set default character value to none
Fixes https://github.com/BerriAI/litellm/issues/6133#issuecomment-2403290196
* fix(google.py): fix cost per token / cost per char conversion
Fixes https://github.com/BerriAI/litellm/issues/6133#issuecomment-2403370287
* build(model_prices_and_context_window.json): update gemini pricing
Fixes https://github.com/BerriAI/litellm/issues/6133
* build(model_prices_and_context_window.json): update gemini pricing
* fix(litellm_logging.py): fix streaming caching logging when 'turn_off_message_logging' enabled
Stores unredacted response in cache
* build(model_prices_and_context_window.json): update gemini-1.5-flash pricing
* fix(cost_calculator.py): fix default prompt_character count logic
Fixes error in gemini cost calculation
* fix(cost_calculator.py): fix cost calc for tts models
2024-10-10 00:42:11 -07:00
Krish Dholakia
9695c1af10
LiteLLM Minor Fixes & Improvements (10/08/2024) ( #6119 )
...
* refactor(cost_calculator.py): move error line to debug - https://github.com/BerriAI/litellm/issues/5683#issuecomment-2398599498
* fix(migrate-hidden-params-to-read-from-standard-logging-payload): Fixes https://github.com/BerriAI/litellm/issues/5546#issuecomment-2399994026
* fix(types/utils.py): mark weight as a litellm param
Fixes https://github.com/BerriAI/litellm/issues/5781
* feat(internal_user_endpoints.py): fix /user/info + show user max budget as default max budget
Fixes https://github.com/BerriAI/litellm/issues/6117
* feat: support returning team member budget in `/user/info`
Sets user max budget in team as max budget on ui
Closes https://github.com/BerriAI/litellm/issues/6117
* bug fix for optional parameter passing to replicate (#6067)
Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>
* fix(o1_transformation.py): handle o1 temperature=0
o1 doesn't support temp=0, allow admin to drop this param
* test: fix test
---------
Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>
Co-authored-by: Mandana Vaziri <mvaziri@us.ibm.com>
2024-10-08 21:57:03 -07:00
Krish Dholakia
2e5c46ef6d
LiteLLM Minor Fixes & Improvements (10/04/2024) ( #6064 )
...
* fix(litellm_logging.py): ensure cache hits are scrubbed if 'turn_off_message_logging' is enabled
* fix(sagemaker.py): fix streaming to raise error immediately
Fixes https://github.com/BerriAI/litellm/issues/6054
* (fixes) gcs bucket key based logging (#6044)
* fixes for gcs bucket logging
* fix StandardCallbackDynamicParams
* fix - gcs logging when payload is not serializable
* add test_add_callback_via_key_litellm_pre_call_utils_gcs_bucket
* working success callbacks
* linting fixes
* fix linting error
* add type hints to functions
* fixes for dynamic success and failure logging
* fix for test_async_chat_openai_stream
* fix handle case when key based logging vars are set as os.environ/ vars
* fix prometheus track cooldown events on custom logger (#6060)
* (docs) add 1k rps load test doc (#6059)
* docs 1k rps load test
* docs load testing
* docs load testing litellm
* docs load testing
* clean up load test doc
* docs prom metrics for load testing
* docs using prometheus on load testing
* doc load testing with prometheus
* (fixes) docs + qa - gcs key based logging (#6061)
* fixes for required values for gcs bucket
* docs gcs bucket logging
* bump: version 1.48.12 → 1.48.13
* ci/cd run again
* bump: version 1.48.13 → 1.48.14
* update load test doc
* (docs) router settings - on litellm config (#6037)
* add yaml with all router settings
* add docs for router settings
* docs router settings litellm settings
* (feat) OpenAI prompt caching models to model cost map (#6063)
* add prompt caching for latest models
* add cache_read_input_token_cost for prompt caching models
* fix(litellm_logging.py): check if param is iterable
Fixes https://github.com/BerriAI/litellm/issues/6025#issuecomment-2393929946
* fix(factory.py): support passing an 'assistant_continue_message' to prevent bedrock error
Fixes https://github.com/BerriAI/litellm/issues/6053
* fix(databricks/chat): handle streaming responses
* fix(factory.py): fix linting error
* fix(utils.py): unify anthropic + deepseek prompt caching information to openai format
Fixes https://github.com/BerriAI/litellm/issues/6069
* test: fix test
* fix(types/utils.py): support all openai roles
Fixes https://github.com/BerriAI/litellm/issues/6052
* test: fix test
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-10-04 21:28:53 -04:00
Krish Dholakia
d57be47b0f
Litellm ruff linting enforcement ( #5992 )
...
* ci(config.yml): add a 'check_code_quality' step
Addresses https://github.com/BerriAI/litellm/issues/5991
* ci(config.yml): check why circle ci doesn't pick up this test
* ci(config.yml): fix to run 'check_code_quality' tests
* fix(__init__.py): fix unprotected import
* fix(__init__.py): don't remove unused imports
* build(ruff.toml): update ruff.toml to ignore unused imports
* fix: ruff + pyright - fix linting + type-checking errors
* fix: fix linting errors
* fix(lago.py): fix module init error
* fix: fix linting errors
* ci(config.yml): cd into correct dir for checks
* fix(proxy_server.py): fix linting error
* fix(utils.py): fix bare except
causes ruff linting errors
* fix: ruff - fix remaining linting errors
* fix(clickhouse.py): use standard logging object
* fix(__init__.py): fix unprotected import
* fix: ruff - fix linting errors
* fix: fix linting errors
* ci(config.yml): cleanup code qa step (formatting handled in local_testing)
* fix(_health_endpoints.py): fix ruff linting errors
* ci(config.yml): just use ruff in check_code_quality pipeline for now
* build(custom_guardrail.py): include missing file
* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krrish Dholakia
3560f0ef2c
refactor: move all testing to top-level of repo
...
Closes https://github.com/BerriAI/litellm/issues/486
2024-09-28 21:08:14 -07:00