95 Commits

Author SHA1 Message Date
Ishaan Jaff 642cfa26b0 remove deprecated 2025-07-22 20:59:34 -07:00
Ishaan Jaff bf300f8ca7 Revert "Litellm dev 07 21 2025 p1 (#12848)"
This reverts commit e4e10aa4ed.
2025-07-22 18:28:36 -07:00
Krish Dholakia e4e10aa4ed Litellm dev 07 21 2025 p1 (#12848)
* fix(main.py): fix async retryer

Fixes https://github.com/BerriAI/litellm/issues/12830

* fix(forward_clientside_headers_by_model_group.py): filter out 'content-type' from forwardable headers

clientside content-type != proxy content type, can cause requests to hang

* test(tests/): update tests
2025-07-21 22:09:39 -07:00
Krish Dholakia c0319d0d01 Litellm dev fix gemini web search tracking (#12288)
* feat(stream_chunk_builder_utils.py): correctly return web_search_requests on stream chunk builder

* fix(types/utils.py): handle prompttokendetails

* fix(stream_chunk_builder_utils.py): fix ruff check error

* test: try-except rate limit error

* fix: fix import
2025-07-03 12:27:14 -07:00
Krrish Dholakia a198d4a39f test: change mistral model
service tier exceeded
2025-07-02 21:11:02 -07:00
Ishaan Jaff bc835c6044 test_lm_studio_completion 2025-06-06 20:41:00 -07:00
Krrish Dholakia 4e3c8ae94f test: update test due to cohere ssl issues 2025-05-19 20:07:57 -07:00
Krish Dholakia d37cc63250 Add new model provider Novita AI (#7582) (#9527)
* Add new model provider Novita AI (#7582)

* feat: add new model provider Novita AI

* feat: use deepseek r1 model for examples in Novita AI docs

* fix: fix tests

* fix: fix tests for novita

* fix: fix novita transformation

* ci: fix ci yaml

* fix: fix novita transformation and test (#10056)

---------

Co-authored-by: Jason <ggbbddjm@gmail.com>
2025-05-12 21:49:30 -07:00
Ishaan Jaff 88f5f9b7f8 fix ai21 test 2025-05-07 21:45:57 -07:00
Ishaan Jaff 580e221000 fix ai21 test 2025-05-07 21:26:35 -07:00
Ishaan Jaff de7870cb54 Add llamafile as a provider (#10203) (#10482)
* Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar

* Add Llamafile as an LlmProviders enum

* Add llamafile as a OpenAI compatible provider (in the list of compatible providers)

* Add Llamafile chat config and tests

* Wire up Llamafile

Co-authored-by: Peter Wilson <peter@mozilla.ai>
2025-05-01 18:36:55 -07:00
Krrish Dholakia 4ab0ee0b65 test: more testing fixes 2025-05-01 15:36:13 -07:00
Krish Dholakia 9e35ca2010 Embedding caching fixes - handle str -> list cache, set usage tokens for cache hits, combine usage tokens on partial cache hits (#10424)
* build(model_prices_and_context_window.json): add fireworks ai new 0-4b pricing tier

* build(model_prices_and_context_window.json): add more fireworks ai models

* test: update testing

* fix(caching_handler.py): handle str + list cache

Fixes issue on cache hits for embedding when initial cached input was str

* test(test_caching.py): add e2e test on caching with individual item and then list

* fix(caching_handler.py): set usage tokens for cache hits

enables token counting to work

* fix(caching_handler.py): combine usage between cached result and embedding response

Handles case of new input to embedding response

* fix: cleanup

* test: move to gpt-4o-new-test

* test: update test
2025-04-29 21:21:28 -07:00
Krish Dholakia d783190e04 Update fireworks ai pricing (#10425)
* build(model_prices_and_context_window.json): add fireworks ai new 0-4b pricing tier

* build(model_prices_and_context_window.json): add more fireworks ai models

* test: update testing

* test: testing updates

* test: update test

* test: update test
2025-04-29 20:58:05 -07:00
Ishaan Jaff b9756bf006 test_completion_azure 2025-04-19 07:24:11 -07:00
Krish Dholakia 1ea046cc61 test: update tests to new deployment model (#10142)
* test: update tests to new deployment model

* test: update model name

* test: skip cohere rbac issue test

* test: update test - replace gpt-4o model
2025-04-18 14:22:12 -07:00
Krrish Dholakia 415abfc222 test: update test 2025-04-18 13:13:58 -07:00
Krrish Dholakia f7dd688035 test: handle cohere rbac issue (verified happens on calling azure directly) 2025-04-18 08:42:12 -07:00
Ishaan Jaff ad09d250ef test fix azure deprecated mistral 2025-04-15 22:32:14 -07:00
Ishaan Jaff b3f37b860d test fix azure deprecated mistral ai 2025-04-15 21:42:40 -07:00
Krish Dholakia ac9f03beae Allow passing thinking param to litellm proxy via client sdk + Code QA Refactor on get_optional_params (get correct values) (#9386)
* fix(litellm_proxy/chat/transformation.py): support 'thinking' param

Fixes https://github.com/BerriAI/litellm/issues/9380

* feat(azure/gpt_transformation.py): add azure audio model support

Closes https://github.com/BerriAI/litellm/issues/6305

* fix(utils.py): use provider_config in common functions

* fix(utils.py): add missing provider configs to get_chat_provider_config

* test: fix test

* fix: fix path

* feat(utils.py): make bedrock invoke nova config baseconfig compatible

* fix: fix linting errors

* fix(azure_ai/transformation.py): remove buggy optional param filtering for azure ai

Removes incorrect check for support tool choice when calling azure ai - prevented calling models with response_format unless on litellm model cost map

* fix(amazon_cohere_transformation.py): fix bedrock invoke cohere transformation to inherit from coherechatconfig

* test: fix azure ai tool choice mapping

* fix: fix model cost map to add 'supports_tool_choice' to cohere models

* fix(get_supported_openai_params.py): check if custom llm provider in llm providers

* fix(get_supported_openai_params.py): fix llm provider in list check

* fix: fix ruff check errors

* fix: support defs when calling bedrock nova

* fix(factory.py): fix test
2025-04-07 21:04:11 -07:00
Krish Dholakia fcf17d114f Litellm dev 04 05 2025 p2 (#9774)
* test: move test to just checking async

* fix(transformation.py): handle function call with no schema

* fix(utils.py): handle pydantic base model in message tool calls

Fix https://github.com/BerriAI/litellm/issues/9321

* fix(vertex_and_google_ai_studio.py): handle tools=[]

Fixes https://github.com/BerriAI/litellm/issues/9080

* test: remove max token restriction

* test: fix basic test

* fix(get_supported_openai_params.py): fix check

* fix(converse_transformation.py): support fake streaming for meta.llama3-3-70b-instruct-v1:0

* fix: fix test

* fix: parse out empty dictionary on dbrx streaming + tool calls

* fix(handle-'strict'-param-when-calling-fireworks-ai): fireworks ai does not support 'strict' param

* fix: fix ruff check

* fix: handle no strict in function

* fix: revert bedrock change - handle in separate PR
2025-04-07 21:02:52 -07:00
Krish Dholakia 34bdf36eab Add inference providers support for Hugging Face (#8258) (#9738) (#9773)
* Add inference providers support for Hugging Face (#8258)

* add first version of inference providers for huggingface

* temporarily skipping tests

* Add documentation

* Fix titles

* remove max_retries from params and clean up

* add suggestions

* use llm http handler

* update doc

* add suggestions

* run formatters

* add tests

* revert

* revert

* rename file

* set maxsize for lru cache

* fix embeddings

* fix inference url

* fix tests following breaking change in main

* use ChatCompletionRequest

* fix tests and lint

* [Hugging Face] Remove outdated chat completion tests and fix embedding tests (#9749)

* remove or fix tests

* fix link in doc

* fix(config_settings.md): document hf api key

---------

Co-authored-by: célina <hanouticelina@gmail.com>
2025-04-05 10:50:15 -07:00
Ishaan Jaff e7a8b5a809 run ci/cd again 2025-03-26 08:12:51 -07:00
Ishaan Jaff c010cdef59 test_dynamic_azure_params 2025-03-18 17:26:23 -07:00
Krrish Dholakia e2ae504a81 test: skip flaky tests 2025-03-11 19:43:04 -07:00
Krish Dholakia f899b828cf Support openrouter reasoning_content on streaming (#9094)
* feat(convert_dict_to_response.py): support openrouter format of reasoning content

* fix(transformation.py): fix openrouter streaming with reasoning content

Fixes https://github.com/BerriAI/litellm/issues/8193#issuecomment-270892962

* fix: fix type error
2025-03-09 20:03:59 -07:00
Ishaan Jaff f9cee4c46b (Bug Fix) Using LiteLLM Python SDK with model=litellm_proxy/ for embedding, image_generation, transcription, speech, rerank (#8815)
* test_litellm_gateway_from_sdk

* fix embedding check for openai

* test litellm proxy provider

* fix image generation openai compatible models

* fix litellm.transcription

* test_litellm_gateway_from_sdk_rerank

* docs litellm python sdk

* docs litellm python sdk with proxy

* test_litellm_gateway_from_sdk_rerank

* ci/cd run again

* test_litellm_gateway_from_sdk_image_generation

* test_litellm_gateway_from_sdk_embedding

* test_litellm_gateway_from_sdk_embedding
2025-02-25 16:22:37 -08:00
Krish Dholakia b829475587 Litellm dev 02 25 2025 p1 (#8816)
* build(model_prices_and_context_window.json): add bedrock cross-region inferencing model information

Closes https://github.com/BerriAI/litellm/issues/8801#issuecomment-2683438528

* build(model_prices_and_context_window.json): add claude sonnet `-latest` models to model cost map

Closes https://github.com/BerriAI/litellm/discussions/8770#discussioncomment-12318880

* build(model_prices_and_context_window.json): add remaining anthropic `-latest` models to model cost map

Closes https://github.com/BerriAI/litellm/discussions/8770#discussioncomment-12318880

* test: update test with new model
2025-02-25 15:20:39 -08:00
Ishaan Jaff d963568970 (Bug fix) - running litellm proxy on windows (#8735)
* fix running litellm on windows

* fix importing litellm

* _init_hypercorn_server

* linting fix

* TestProxyInitializationHelpers

* ci/cd run again

* ci/cd run again
2025-02-25 15:19:19 -08:00
Krish Dholakia 9914c166b7 Litellm contributor prs 02 24 2025 (#8775)
* Adding VertexAI Claude 3.7 Sonnet (#8774)

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>

* build(model_prices_and_context_window.json): add anthropic 3-7 models on vertex ai and bedrock

* Support video_url (#8743)

* Support video_url

Support VLMs that work with video.
Example implementation in vllm: https://github.com/vllm-project/vllm/pull/10020

* llms openai.py: Add ChatCompletionVideoObject

Add data structures to support `video_url` in chat completion

* test test_completion.py: add test for video_url

* Arize Phoenix - ensure correct endpoint/protocol are used; and default to phoenix cloud (#8750)

* minor fixes to default to http and to ensure that the correct endpoint is used

* Update test_arize_phoenix.py

* prioritize http over grpc

---------

Co-authored-by: Emerson Gomes <emerson.gomes@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Pang Wu <104795337+pang-wu@users.noreply.github.com>
Co-authored-by: Nate Mar <67926244+nate-mar@users.noreply.github.com>
2025-02-24 18:55:48 -08:00
Krish Dholakia 2b7755f8d8 Litellm dev 02 18 2025 p3 (#8640)
* fix(team_endpoints.py): cleanup user <-> team association on team delete

Fixes issue where user table still listed team membership post delete

* test(test_team.py): update e2e test - ensure user/team membership is deleted on team delete

* fix(base_invoke_transformation.py): fix deepseek r1 transformation

remove deepseek name from model url

* test(test_completion.py): assert model route not in url

* feat(base_invoke_transformation.py): infer region name from model arn

prevent errors due to different region name in env var vs. model arn, respect if explicitly set in call though

* test: fix test

* test: skip on internal server error
2025-02-18 19:14:20 -08:00
Krish Dholakia 58141df65d Litellm dev 02 13 2025 p2 (#8525)
* fix(azure/chat/gpt_transformation.py): add 'prediction' as a support azure param

Closes https://github.com/BerriAI/litellm/issues/8500

* build(model_prices_and_context_window.json): add new 'gemini-2.0-pro-exp-02-05' model

* style: cleanup invalid json trailing comma

* feat(utils.py): support passing 'tokenizer_config' to register_prompt_template

enables passing complete tokenizer config of model to litellm

Allows calling deepseek on bedrock with the correct prompt template

* fix(utils.py): fix register_prompt_template for custom model names

* test(test_prompt_factory.py): fix test

* test(test_completion.py): add e2e test for bedrock invoke deepseek ft model

* feat(base_invoke_transformation.py): support hf_model_name param for bedrock invoke calls

enables proxy admin to set base model for ft bedrock deepseek model

* feat(bedrock/invoke): support deepseek_r1 route for bedrock

makes it easy to apply the right chat template to that call

* feat(constants.py): store deepseek r1 chat template - allow user to get correct response from deepseek r1 without extra work

* test(test_completion.py): add e2e mock test for bedrock deepseek

* docs(bedrock.md): document new deepseek_r1 route for bedrock

allows us to use the right config

* fix(exception_mapping_utils.py): catch read operation timeout
2025-02-13 20:28:42 -08:00
Krish Dholakia f5841eb84d fix(router.py): add more deployment timeout debug information for tim… (#8523)
* fix(router.py): add more deployment timeout debug information for timeout errors

help understand why some calls in high-traffic don't respect their model-specific timeouts

* test(test_convert_dict_to_response.py): unit test ensuring empty str is not converted to None

Addresses https://github.com/BerriAI/litellm/issues/8507

* fix(convert_dict_to_response.py): handle empty message str - don't return back as 'None'

Fixes https://github.com/BerriAI/litellm/issues/8507

* test(test_completion.py): add e2e test
2025-02-13 17:10:22 -08:00
Krish Dholakia c8494abdea test(base_llm_unit_tests.py): add test to ensure drop params is respe… (#8224)
* test(base_llm_unit_tests.py): add test to ensure drop params is respected

* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility

* build: add cherry picked commits
2025-02-03 16:04:44 -08:00
Krish Dholakia e4566d7b1c fix(main.py): fix passing openrouter specific params (#8184)
* fix(main.py): fix passing openrouter specific params

Fixes https://github.com/BerriAI/litellm/issues/8130

* test(test_get_model_info.py): add check for region name w/ cris model

Resolves https://github.com/BerriAI/litellm/issues/8115
2025-02-02 22:23:14 -08:00
Ishaan Jaff 4005a51db2 (UI) fix adding Vertex Models (#8129)
* fix handleSubmit

* update handleAddModelSubmit

* add jest testing for ui

* add step for running ui unit tests

* add validate json step to add model

* ui jest testing fixes

* update package lock

* ci/cd run again

* fix antd import

* run jest tests first

* fix antd install

* fix ui unit tests

* fix unit test ui
2025-01-30 21:11:08 -08:00
Krish Dholakia d9eb8f42ff Litellm dev 01 27 2025 p3 (#8047)
* docs(reliability.md): add doc on disabling fallbacks per request

* feat(litellm_pre_call_utils.py): support reading request timeout from request headers - new `x-litellm-timeout` param

Allows setting dynamic model timeouts from vercel's AI sdk

* test(test_proxy_server.py): add simple unit test for reading request timeout

* test(test_fallbacks.py): add e2e test to confirm timeout passed in request headers is correctly read

* feat(main.py): support passing metadata to openai in preview

Resolves https://github.com/BerriAI/litellm/issues/6022#issuecomment-2616119371

* fix(main.py): fix passing openai metadata

* docs(request_headers.md): document new request headers

* build: Merge branch 'main' into litellm_dev_01_27_2025_p3

* test: loosen test
2025-01-28 18:01:27 -08:00
Ishaan Jaff 46469c6087 set timeout for deepseek testing 2025-01-27 21:25:28 -08:00
Krish Dholakia 6bafdbc546 Litellm dev 01 25 2025 p4 (#8006)
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request

adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage)

* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming

Fixes https://github.com/BerriAI/litellm/issues/7942

* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"

This reverts commit 7a052a64e3642616405e71350627e2e4f66615b4.

* fix(deepseek-r-1): return reasoning_content as a top-level param

ensures compatibility with existing tools that use it

* fix: fix linting error
2025-01-26 08:01:05 -08:00
Krish Dholakia 27560bd5ad Litellm dev 01 22 2025 p4 (#7932)
* feat(main.py): add new 'provider_specific_header' param

allows passing extra header for specific provider

* fix(litellm_pre_call_utils.py): add unit test for pre call utils

* test(test_bedrock_completion.py): skip test now that bedrock supports this
2025-01-22 21:52:07 -08:00
Krish Dholakia 76795dba39 Deepseek r1 support + watsonx qa improvements (#7907)
* fix(types/utils.py): support returning 'reasoning_content' for deepseek models

Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218

* fix(convert_dict_to_response.py): return deepseek response in provider_specific_field

allows for separating openai vs. non-openai params in model response

* fix(utils.py): support 'provider_specific_field' in delta chunk as well

allows deepseek reasoning content chunk to be returned to user from stream as well

Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218

* fix(watsonx/chat/handler.py): fix passing space id to watsonx on chat route

* fix(watsonx/): fix watsonx_text/ route with space id

* fix(watsonx/): qa item - also adds better unit testing for watsonx embedding calls

* fix(utils.py): rename to '..fields'

* fix: fix linting errors

* fix(utils.py): fix typing - don't show provider-specific field if none or empty - prevents default response from being non-oai compatible

* fix: cleanup unused imports

* docs(deepseek.md): add docs for deepseek reasoning model
2025-01-21 23:13:15 -08:00
Krish Dholakia 1bea338597 LiteLLM Minor Fixes & Improvements (2024/16/01) (#7826)
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811

* fix(router.py): fix mock timeout check

* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)

This error happens if you provide multiple fallback models to the completion function with model name defined in each one.

* fix(router.py): remove mock_timeout before sending to request

prevents reuse in fallbacks

* test: update test

* test: revert test change - wrong pr

---------

Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
2025-01-17 20:59:21 -08:00
Ishaan Jaff b30e05b54f Revert "test_completion_mistral_api_mistral_large_function_call"
This reverts commit ef9177f0a8.
2025-01-17 07:20:46 -08:00
Ishaan Jaff 7f63e7c15a test_completion_mistral_api_mistral_large_function_call 2025-01-16 22:27:48 -08:00
Ishaan Jaff ef9177f0a8 test_completion_mistral_api_mistral_large_function_call 2025-01-16 21:50:56 -08:00
Ishaan Jaff 2507c275f6 (proxy perf improvement) - use uvloop for higher RPS (10%-20% higher RPS) (#7662)
* uvicorn use uvloop

* fix uvloop==0.21.0

* add uvloop to pyproject

* test_completion_response_ratelimit_headers
2025-01-09 18:11:20 -08:00
Ishaan Jaff 55139b8fd6 update tests 2025-01-06 22:36:00 -08:00
Ishaan Jaff 744beac754 ci/cd run again 2025-01-06 21:35:34 -08:00
Ishaan Jaff 716efd5fad (fix proxy perf) use _read_request_body instead of ast.literal_eval to get better performance (#7545)
* fix ast literal eval

* run ci/cd again
2025-01-03 17:48:32 -08:00