* fix(main.py): fix async retryer
Fixes https://github.com/BerriAI/litellm/issues/12830
* fix(forward_clientside_headers_by_model_group.py): filter out 'content-type' from forwardable headers
clientside content-type != proxy content type, can cause requests to hang
* test(tests/): update tests
* Add new model provider Novita AI (#7582)
* feat: add new model provider Novita AI
* feat: use deepseek r1 model for examples in Novita AI docs
* fix: fix tests
* fix: fix tests for novita
* fix: fix novita transformation
* ci: fix ci yaml
* fix: fix novita transformation and test (#10056)
---------
Co-authored-by: Jason <ggbbddjm@gmail.com>
* Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar
* Add Llamafile as an LlmProviders enum
* Add llamafile as a OpenAI compatible provider (in the list of compatible providers)
* Add Llamafile chat config and tests
* Wire up Llamafile
Co-authored-by: Peter Wilson <peter@mozilla.ai>
* build(model_prices_and_context_window.json): add fireworks ai new 0-4b pricing tier
* build(model_prices_and_context_window.json): add more fireworks ai models
* test: update testing
* fix(caching_handler.py): handle str + list cache
Fixes issue on cache hits for embedding when initial cached input was str
* test(test_caching.py): add e2e test on caching with individual item and then list
* fix(caching_handler.py): set usage tokens for cache hits
enables token counting to work
* fix(caching_handler.py): combine usage between cached result and embedding response
Handles case of new input to embedding response
* fix: cleanup
* test: move to gpt-4o-new-test
* test: update test
* fix(litellm_proxy/chat/transformation.py): support 'thinking' param
Fixes https://github.com/BerriAI/litellm/issues/9380
* feat(azure/gpt_transformation.py): add azure audio model support
Closes https://github.com/BerriAI/litellm/issues/6305
* fix(utils.py): use provider_config in common functions
* fix(utils.py): add missing provider configs to get_chat_provider_config
* test: fix test
* fix: fix path
* feat(utils.py): make bedrock invoke nova config baseconfig compatible
* fix: fix linting errors
* fix(azure_ai/transformation.py): remove buggy optional param filtering for azure ai
Removes incorrect check for support tool choice when calling azure ai - prevented calling models with response_format unless on litell model cost map
* fix(amazon_cohere_transformation.py): fix bedrock invoke cohere transformation to inherit from coherechatconfig
* test: fix azure ai tool choice mapping
* fix: fix model cost map to add 'supports_tool_choice' to cohere models
* fix(get_supported_openai_params.py): check if custom llm provider in llm providers
* fix(get_supported_openai_params.py): fix llm provider in list check
* fix: fix ruff check errors
* fix: support defs when calling bedrock nova
* fix(factory.py): fix test
* test: move test to just checking async
* fix(transformation.py): handle function call with no schema
* fix(utils.py): handle pydantic base model in message tool calls
Fix https://github.com/BerriAI/litellm/issues/9321
* fix(vertex_and_google_ai_studio.py): handle tools=[]
Fixes https://github.com/BerriAI/litellm/issues/9080
* test: remove max token restriction
* test: fix basic test
* fix(get_supported_openai_params.py): fix check
* fix(converse_transformation.py): support fake streaming for meta.llama3-3-70b-instruct-v1:0
* fix: fix test
* fix: parse out empty dictionary on dbrx streaming + tool calls
* fix(handle-'strict'-param-when-calling-fireworks-ai): fireworks ai does not support 'strict' param
* fix: fix ruff check
'
* fix: handle no strict in function
* fix: revert bedrock change - handle in separate PR
* fix running litellm on windows
* fix importing litellm
* _init_hypercorn_server
* linting fix
* TestProxyInitializationHelpers
* ci/cd run again
* ci/cd run again
* Adding VertexAI Claude 3.7 Sonnet (#8774)
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* build(model_prices_and_context_window.json): add anthropic 3-7 models on vertex ai and bedrock
* Support video_url (#8743)
* Support video_url
Support VLMs that works with video.
Example implemenation in vllm: https://github.com/vllm-project/vllm/pull/10020
* llms openai.py: Add ChatCompletionVideoObject
Add data structures to support `video_url` in chat completion
* test test_completion.py: add test for video_url
* Arize Phoenix - ensure correct endpoint/protocol are used; and default to phoenix cloud (#8750)
* minor fixes to default to http and to ensure that the correct endpoint is used
* Update test_arize_phoenix.py
* prioritize http over grpc
---------
Co-authored-by: Emerson Gomes <emerson.gomes@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Pang Wu <104795337+pang-wu@users.noreply.github.com>
Co-authored-by: Nate Mar <67926244+nate-mar@users.noreply.github.com>
* fix(team_endpoints.py): cleanup user <-> team association on team delete
Fixes issue where user table still listed team membership post delete
* test(test_team.py): update e2e test - ensure user/team membership is deleted on team delete
* fix(base_invoke_transformation.py): fix deepseek r1 transformation
remove deepseek name from model url
* test(test_completion.py): assert model route not in url
* feat(base_invoke_transformation.py): infer region name from model arn
prevent errors due to different region name in env var vs. model arn, respect if explicitly set in call though
* test: fix test
* test: skip on internal server error
* fix(azure/chat/gpt_transformation.py): add 'prediction' as a support azure param
Closes https://github.com/BerriAI/litellm/issues/8500
* build(model_prices_and_context_window.json): add new 'gemini-2.0-pro-exp-02-05' model
* style: cleanup invalid json trailing commma
* feat(utils.py): support passing 'tokenizer_config' to register_prompt_template
enables passing complete tokenizer config of model to litellm
Allows calling deepseek on bedrock with the correct prompt template
* fix(utils.py): fix register_prompt_template for custom model names
* test(test_prompt_factory.py): fix test
* test(test_completion.py): add e2e test for bedrock invoke deepseek ft model
* feat(base_invoke_transformation.py): support hf_model_name param for bedrock invoke calls
enables proxy admin to set base model for ft bedrock deepseek model
* feat(bedrock/invoke): support deepseek_r1 route for bedrock
makes it easy to apply the right chat template to that call
* feat(constants.py): store deepseek r1 chat template - allow user to get correct response from deepseek r1 without extra work
* test(test_completion.py): add e2e mock test for bedrock deepseek
* docs(bedrock.md): document new deepseek_r1 route for bedrock
allows us to use the right config
* fix(exception_mapping_utils.py): catch read operation timeout
* fix(router.py): add more deployment timeout debug information for timeout errors
help understand why some calls in high-traffic don't respect their model-specific timeouts
* test(test_convert_dict_to_response.py): unit test ensuring empty str is not converted to None
Addresses https://github.com/BerriAI/litellm/issues/8507
* fix(convert_dict_to_response.py): handle empty message str - don't return back as 'None'
Fixes https://github.com/BerriAI/litellm/issues/8507
* test(test_completion.py): add e2e test
* test(base_llm_unit_tests.py): add test to ensure drop params is respected
* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility
* build: add cherry picked commits
* fix handleSubmit
* update handleAddModelSubmit
* add jest testing for ui
* add step for running ui unit tests
* add validate json step to add model
* ui jest testing fixes
* update package lock
* ci/cd run again
* fix antd import
* run jest tests first
* fix antd install
* fix ui unit tests
* fix unit test ui
* docs(reliability.md): add doc on disabling fallbacks per request
* feat(litellm_pre_call_utils.py): support reading request timeout from request headers - new `x-litellm-timeout` param
Allows setting dynamic model timeouts from vercel's AI sdk
* test(test_proxy_server.py): add simple unit test for reading request timeout
* test(test_fallbacks.py): add e2e test to confirm timeout passed in request headers is correctly read
* feat(main.py): support passing metadata to openai in preview
Resolves https://github.com/BerriAI/litellm/issues/6022#issuecomment-2616119371
* fix(main.py): fix passing openai metadata
* docs(request_headers.md): document new request headers
* build: Merge branch 'main' into litellm_dev_01_27_2025_p3
* test: loosen test
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request
adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage)
* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming
Fixes https://github.com/BerriAI/litellm/issues/7942
* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"
This reverts commit 7a052a64e3642616405e71350627e2e4f66615b4.
* fix(deepseek-r-1): return reasoning_content as a top-level param
ensures compatibility with existing tools that use it
* fix: fix linting error
* feat(main.py): add new 'provider_specific_header' param
allows passing extra header for specific provider
* fix(litellm_pre_call_utils.py): add unit test for pre call utils
* test(test_bedrock_completion.py): skip test now that bedrock supports this
* fix(types/utils.py): support returning 'reasoning_content' for deepseek models
Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218
* fix(convert_dict_to_response.py): return deepseek response in provider_specific_field
allows for separating openai vs. non-openai params in model response
* fix(utils.py): support 'provider_specific_field' in delta chunk as well
allows deepseek reasoning content chunk to be returned to user from stream as well
Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218
* fix(watsonx/chat/handler.py): fix passing space id to watsonx on chat route
* fix(watsonx/): fix watsonx_text/ route with space id
* fix(watsonx/): qa item - also adds better unit testing for watsonx embedding calls
* fix(utils.py): rename to '..fields'
* fix: fix linting errors
* fix(utils.py): fix typing - don't show provider-specific field if none or empty - prevents default respons
e from being non-oai compatible
* fix: cleanup unused imports
* docs(deepseek.md): add docs for deepseek reasoning model
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811
* fix(router.py): fix mock timeout check
* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)
This error happens if you provide multiple fallback models to the completion function with model name defined in each one.
* fix(router.py): remove mock_timeout before sending to request
prevents reuse in fallbacks
* test: update test
* test: revert test change - wrong pr
---------
Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>