Commit Graph

19382 Commits

Author SHA1 Message Date
Krish Dholakia b5850b6b65 Handle azure deepseek reasoning response (#8288) (#8366)
* Handle azure deepseek reasoning response (#8288)

* Handle deepseek reasoning response

* Add helper method + unit test

* Fix: Follow infinity api url format (#8346)

* Follow infinity api url format

* Update test_infinity.py

* fix(infinity/transformation.py): fix linting error

---------

Co-authored-by: vibhavbhat <vibhavb00@gmail.com>
Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>
2025-02-07 17:45:51 -08:00
Krish Dholakia f651d51f26 Litellm dev 02 07 2025 p2 (#8377)
* fix(caching_routes.py): mask redis password on `/cache/ping` route

* fix(caching_routes.py): fix linting error

* fix(caching_routes.py): fix linting error on caching routes

* fix: fix test - ignore mask_dict - has a breakpoint

* fix(azure.py): add timeout param + elapsed time in azure timeout error

* fix(http_handler.py): add elapsed time to http timeout request

makes it easier to debug how long request took before failing
2025-02-07 17:30:38 -08:00
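The timeout fixes in f651d51f26 add the elapsed time to the timeout error so it is easier to see how long a request ran before failing. A minimal sketch of that idea follows; `call_with_elapsed` is a hypothetical helper for illustration, not LiteLLM's actual handler code, and the message format is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_elapsed(fn: Callable[[], T], timeout: float) -> T:
    """Hypothetical helper: run fn and, on timeout, report how long it ran."""
    start = time.monotonic()
    try:
        return fn()
    except TimeoutError as exc:
        # Measure how long the call ran before it failed.
        elapsed = time.monotonic() - start
        raise TimeoutError(
            f"request timed out after {elapsed:.2f}s (timeout={timeout}s)"
        ) from exc
```

Re-raising with `from exc` keeps the original exception attached as the cause, so the underlying traceback is not lost.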
Byron Grogan 5a42be43e0 fix: add azure/o1-2024-12-17 to model_prices_and_context_window.json (#8371) 2025-02-07 16:22:33 -08:00
Krish Dholakia dfbbf0bde8 fix: dictionary changed size during iteration error (#8327) (#8341)
Co-authored-by: Joey Feldberg <joeyfeldberg@users.noreply.github.com>
Co-authored-by: Joey Feldberg <12495578+joeyfeldberg@users.noreply.github.com>
2025-02-07 16:20:28 -08:00
Krish Dholakia 5d170162d3 fix(nvidia_nim/embed.py): add 'dimensions' support (#8302)
* fix(nvidia_nim/embed.py): add 'dimensions' support

Fixes https://github.com/BerriAI/litellm/issues/8238

* fix(proxy_Server.py): initialize router redis cache if setup on proxy

Fixes https://github.com/BerriAI/litellm/issues/6602

* test: add unit testing for new helper function
2025-02-07 16:19:32 -08:00
Krrish Dholakia 16be203283 build(pyproject.toml): bump version 2025-02-07 09:25:58 -08:00
Nikolaiev Dmytro 346d8a9132 Update deepseek API prices for 2025-02-08 (#8363) 2025-02-07 08:25:35 -08:00
Krrish Dholakia c4cfd5eb1f build(ui): updates 2025-02-06 23:25:09 -08:00
Krrish Dholakia 790c6eb02a bump: version 1.60.6 → 1.60.7 2025-02-06 23:24:38 -08:00
Krrish Dholakia 9f426a6b1a build(ui/): update ui build 2025-02-06 23:24:25 -08:00
Krish Dholakia 6b8b49451f Fix azure max retries error (#8340)
* fix(azure.py): ensure max_retries=0 is respected

Fixes https://github.com/BerriAI/litellm/issues/6129

* fix(test_openai.py): add unit test to ensure openai sdk calls always respect max_retries = 0

* test(test_azure_openai.py): add unit testing for azure_text/ route

* fix(azure.py): fix passing max retries on streaming

* fix(azure.py): fix azure max retries on async completion + streaming

* fix(completion/handler.py): fix azure text async completion + streaming

* test(test_azure_openai.py): ensure azure openai max retries always respected

* test(test_azure_o_series.py): add testing to ensure max retries always respected

* Added gemini providers for 2.0-flash and 2.0-flash lite (#8321)

* Update model_prices_and_context_window.json

added gemini providers for 2.0-flash and 2.0-flash lite

* Update model_prices_and_context_window.json

fixed URL

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* Convert tool use arguments to string before counting tokens (#6989)

In at least some cases `messages["tool_calls"]["function"]["arguments"]` is a dict, not a string. It must be a string to be tokenized properly. If it is already a string, the conversion is a no-op, which is also fine.

* build(model_prices_and_context_window.json): add gemini 2.0 flash lite pricing

* build(model_prices_and_context_window.json): add gemini commercial rate limits

* fix(utils.py): fix linting error

* refactor(utils.py): refactor to maintain function size

---------

Co-authored-by: Bardia Khosravi <bardiakhosravi95@gmail.com>
Co-authored-by: Josh Morrow <josh@jcmorrow.com>
2025-02-06 23:20:48 -08:00
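The tool-call change folded into 6b8b49451f (#6989) converts tool-call arguments to a string before counting tokens. A small sketch of that conversion is shown here; the function name `arguments_as_text` and the use of `json.dumps` are assumptions for illustration, since the commit message does not show the actual code.

```python
import json
from typing import Any


def arguments_as_text(arguments: Any) -> str:
    """Hypothetical helper: return tool-call arguments as a string."""
    if isinstance(arguments, str):
        # Already a string: nothing to do (the no-op case).
        return arguments
    # A dict (or other JSON-compatible value) is serialized so it can be tokenized.
    return json.dumps(arguments)
```

For example, `arguments_as_text({"city": "Paris"})` yields a JSON string, while a value that is already a string is returned unchanged.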
Krish Dholakia d720744656 Litellm dev 02 06 2025 p3 (#8343)
* feat(handle_jwt.py): initial commit to allow scope based model access

* feat(handle_jwt.py): allow model access based on token scopes

allow admin to control model access from IDP

* test(test_jwt.py): add unit testing for scope based model access

* docs(token_auth.md): add scope based model access to docs

* docs(token_auth.md): update docs

* docs(token_auth.md): update docs

* build: add gemini commercial rate limits

* fix: fix linting error
2025-02-06 23:15:33 -08:00
Krish Dholakia f87ab251b0 UI Updates (#8345)
* fix(.globals.css): revert .md hard set

caused regression in invitation link display (and possibly other places)

* Fix keys not showing on refresh for internal users (#8312)

* [Bug] UI: Newly created key does not display on the View Key Page (#8039)

- Fixed issue where all keys appeared blank for admin users.
- Implemented filtering of data via team settings to ensure all keys are displayed correctly.

* Fix:
- Updated the validator to allow model editing when `keyTeam.team_alias === "Default Team"`.
- Ensured other teams still follow the original validation rules.

* - added some classes in global.css
- added text wrap in output of request, response and metadata in index.tsx
- fixed styles of table in table.tsx

* - added full payload when we open single log entry
- added Combined Info Card in index.tsx

* fix: keys not showing on refresh for internal user

* fixed user id passed as null when keyuser is you (#8271)

* fix(user_dashboard.tsx): ensure non admin can't view other keys

---------

Co-authored-by: Taha Ali <123803932+tahaali-dev@users.noreply.github.com>
Co-authored-by: Jaswanth Karani <karani.jaswanth@gmail.com>
2025-02-06 22:41:20 -08:00
Ishaan Jaff e3aab50ab3 docs assembly ai 2025-02-06 21:30:36 -08:00
Ishaan Jaff 7739be340b fix assembly pass through cost tracking v1.60.6 2025-02-06 21:20:59 -08:00
Ishaan Jaff 229f270dd6 docs assembly ai eu endpoints 2025-02-06 21:13:40 -08:00
Ishaan Jaff ab761b9dc8 bump: version 1.60.5 → 1.60.6 2025-02-06 21:06:07 -08:00
Ishaan Jaff 778bbcdd9c fix test_get_model_info_gemini 2025-02-06 21:05:47 -08:00
Ishaan Jaff 7706ff1f1e ui new build 2025-02-06 18:31:21 -08:00
Ishaan Jaff 65c91cbbbc (QA+UI) - e2e flow for adding assembly ai passthrough endpoints (#8337)
* add initial test for assembly ai

* start using PassthroughEndpointRouter

* migrate to llm passthrough endpoints

* add assembly ai as a known provider

* fix PassthroughEndpointRouter

* fix set_pass_through_credentials

* working EU request to assembly ai pass through endpoint

* add e2e test assembly

* test_assemblyai_routes_with_bad_api_key

* clean up pass through endpoint router

* e2e testing for assembly ai pass through

* test assembly ai e2e testing

* delete assembly ai models

* fix code quality

* ui working assembly ai api base flow

* fix install assembly ai

* update model call details with kwargs for pass through logging

* fix tracking assembly ai model in response

* _handle_assemblyai_passthrough_logging

* fix test_initialize_deployment_for_pass_through_unsupported_provider

* TestPassthroughEndpointRouter

* _get_assembly_transcript

* fix assembly ai pt logging tests

* fix assemblyai_proxy_route

* fix _get_assembly_region_from_url
2025-02-06 18:27:54 -08:00
Ishaan Jaff 5dcb87a88b (bug fix router.py) - safely handle choices=[] on llm responses (#8342)
* test fix test_router_with_empty_choices

* fix _should_raise_content_policy_error
2025-02-06 18:22:08 -08:00
Ishaan Jaff d2fec8bf13 databricks/meta-llama-3.3-70b-instruct 2025-02-06 18:21:56 -08:00
Krish Dholakia f031926b82 fix(utils.py): handle key error in msg validation (#8325)
* fix(utils.py): handle key error in msg validation

* Support running Aim Guard during LLM call (#7918)

* support running Aim Guard during LLM call

* Rename header

* adjust docs and fix type annotations

* fix(timeout.md): doc fix for openai example on dynamic timeouts

---------

Co-authored-by: Tomer Bin <117278227+hxtomer@users.noreply.github.com>
2025-02-06 18:13:46 -08:00
Anton Abilov fac1d2ccef Fixed meta llama 3.3 key for Databricks API (#8093)
See correct key reference here: https://docs.databricks.com/en/machine-learning/model-serving/foundation-model-overview.html#pay-per-token
2025-02-06 18:05:49 -08:00
Ishaan Jaff b535c9bdc0 (Bug Fix - Langfuse) - fix for when model response has choices=[] (#8339)
* refactor _get_langfuse_input_output_content

* test_langfuse_logging_completion_with_malformed_llm_response

* fix _get_langfuse_input_output_content

* fixes for langfuse linting

* unit testing for get chat/text content for langfuse

* fix _should_raise_content_policy_error
2025-02-06 18:02:26 -08:00
Rok Benko 3ec9c28fb7 Update local_debugging.md (#8308) 2025-02-06 16:19:32 -08:00
Wanis Elabbar 15ac5f3c32 Fix pricing for Gemini 2.0 Flash 001 (#8320)
Model 	Type 	Price 	Price with Batch API
Gemini 2.0 Flash
1M Input tokens 	$0.15 	$0.075
1M Input audio tokens 	$1.00 	$0.50
1M Output text tokens 	$0.60 	$0.30

https://cloud.google.com/vertex-ai/generative-ai/pricing#token-based-pricing
2025-02-06 16:17:29 -08:00
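As a worked example of the per-token arithmetic implied by the table in 15ac5f3c32, the sketch below computes a request cost from the non-batch text prices listed there ($0.15 per 1M input tokens, $0.60 per 1M output text tokens). The function and constant names are hypothetical and are not taken from model_prices_and_context_window.json.

```python
# Non-batch prices from the table in the commit message (USD per 1M tokens).
INPUT_USD_PER_1M = 0.15
OUTPUT_USD_PER_1M = 0.60


def text_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost of one text request at the listed prices."""
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000
```

With these prices, one million input tokens plus one million output tokens cost $0.15 + $0.60 = $0.75.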
Luis Sanchez 1b4f0f7192 Add aistudio GEMINI 2.0 to model_prices_and_context_window.json (#8335) 2025-02-06 16:16:54 -08:00
exiao 85491a0bab Add Arize Cookbook for Turning on LiteLLM Proxy (#8336)
* Add files via upload

* Update arize_integration.md
2025-02-06 16:16:28 -08:00
Krish Dholakia bcfa641b81 Add gemini-2.0-flash pricing + model info (#8303)
* add gemini-2.0-flash-001 (#8289)

* build(model_prices_and_context_window.json): add gemini-2.0-flash-001 to model cost map

Adds new gemini model with token based pricing to model cost map

---------

Co-authored-by: kushagro <kush@orby.ai>
2025-02-05 20:49:26 -08:00
Tyler Wagner 5e921804b9 fix: docs links (#8294)
Fixed the docs links in the enterprise md.
2025-02-05 20:41:20 -08:00
Krish Dholakia b4e5c0de69 Improve rpm check on keys (#8301)
* fix(parallel_request_limiter.py): initial commit that solves the rpm limit check on keys

Fixes https://github.com/BerriAI/litellm/issues/6938

* fix(parallel_request_limiter.py): simpler approach - just increment RPM in pre call hook instead of on success

* fix(parallel_request_limiter.py): pass testing

* fix: fix linting error

* fix(parallel_request_limiter.py): fix parallel request check for keys
2025-02-05 20:23:08 -08:00
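The approach described in b4e5c0de69, incrementing RPM in the pre-call hook instead of on success, can be sketched roughly as follows. This is an in-memory illustration only: the class `RpmLimiter`, its method names, and the per-minute bucket layout are assumptions, not the code in parallel_request_limiter.py.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class RpmLimitExceeded(Exception):
    """Raised when a key has used up its requests for the current minute."""


class RpmLimiter:
    """Hypothetical limiter that counts a request before it is sent."""

    def __init__(self, rpm_limit: int) -> None:
        self.rpm_limit = rpm_limit
        # (api_key, minute) -> number of requests already admitted
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def pre_call(self, api_key: str, minute: int) -> None:
        bucket = (api_key, minute)
        if self.counts[bucket] >= self.rpm_limit:
            raise RpmLimitExceeded(f"rpm limit {self.rpm_limit} reached")
        # Increment here, not on success, so in-flight requests are counted too.
        self.counts[bucket] += 1
```

Counting at admission time means requests that are still in flight already occupy a slot, so a burst of parallel calls cannot all pass the check before any of them completes.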
Krish Dholakia 7e873538f6 Fix edit team on ui (#8295)
* fix(columns.tsx): fix request logs team column to indicate the value is the alias not the id

* fix(team_info.tsx): add edit team logic to team info page

* fix(team_info.tsx): re-enable updating team settings on UI

Fixes https://github.com/BerriAI/litellm/issues/8281

* fix(team_info.tsx): fix save changes on team update

* fix(teams.tsx): allow edit button to still act as a quick action button -> drop user into settings page for team

* test(config.yml): run dev ui during testing

make sure no ui regressions are pushed on main

* build: update ci/cd

* ci(config.yml): fix test

* ci: fix ci

* ci: update

* ci: fix

* ci: another attempt to get nvm working in ci/cd

* ci: fix ci

* ci: test update

* ci: test update 2

* ci: test 3

* fix(team_info.tsx): fix linting error
2025-02-05 20:13:17 -08:00
Krish Dholakia 443ae55904 Azure OpenAI improvements - o3 native streaming, improved tool call + response format handling (#8292)
* fix(convert_dict_to_response.py): only convert if response is the response_format tool call passed in

Fixes https://github.com/BerriAI/litellm/issues/8241

* fix(gpt_transformation.py): makes sure response format / tools conversion doesn't remove previous tool calls

* refactor(gpt_transformation.py): refactor out json schema conversion to base config

keeps logic consistent across providers

* fix(o_series_transformation.py): support o3 mini native streaming

Fixes https://github.com/BerriAI/litellm/issues/8274

* fix(gpt_transformation.py): remove unused variables

* test: update test
2025-02-05 19:38:58 -08:00
Ishaan Jaff 515598114c bump: version 1.60.4 → 1.60.5 v1.60.5 2025-02-05 19:02:45 -08:00
Ishaan Jaff 03f738eff6 fix test_models_by_provider 2025-02-05 19:01:00 -08:00
Ishaan Jaff 818792228c (Refactor) - migrate bedrock invoke to BaseLLMHTTPHandler class (#8290)
* initial transform for invoke

* invoke transform_response

* working - able to make request

* working get_complete_url

* working - invoke now runs on llm_http_handler

* fix unused imports

* track litellm overhead ms

* working stream request

* sign_request transform

* sign_request update

* use has_async_custom_stream_wrapper property

* use get_async_custom_stream_wrapper in base llm http handler

* fix make_call in invoke handler

* fix invoke with streaming get_async_custom_stream_wrapper

* working bedrock async streaming with invoke

* fix make call handler for bedrock

* test_all_model_configs

* fix test_bedrock_custom_prompt_template

* sync streaming for bedrock invoke

* fix _add_stream_param_to_request_body

* test_async_text_completion_bedrock

* fix transform_request

* fix get_supported_openai_params

* fix test supports tool choice

* fix test_supports_tool_choice

* add unit test coverage for bedrock invoke transform

* fix location of transformation files

* update import loc

* fix bedrock invoke unit tests

* fix import for max completion tokens
2025-02-05 18:58:55 -08:00
Ishaan Jaff e41bc5f32b fixed issues #8126 and #8127 (#8275) (#8299)
Co-authored-by: Jaswanth Karani <karani.jaswanth@gmail.com>
2025-02-05 18:52:58 -08:00
Ishaan Jaff b76b380bc8 fix add back sambanova/Qwen2.5-72B-Instruct 2025-02-05 18:44:17 -08:00
Ishaan Jaff ffd890e744 add assembly ai cost tracking (#8298) 2025-02-05 18:43:37 -08:00
Ishaan Jaff e42fcf4d03 (UI) - Add Assembly AI provider to UI (#8297)
* add assembly ai to ui

* specify api base for assembly ai
2025-02-05 18:42:51 -08:00
Ishaan Jaff 6cef115bb0 (Security fix) - remove code block that inserts master key hash into DB (#8268)
* remove code block upserting master key hash to db

* run test to check if key upserted into db

* run ci/cd again

* litellm_proxy_security_tests

* litellm_proxy_security_tests

* run prisma entrypoint

* ci/cd run again

* fix test master key not in db
2025-02-05 17:25:42 -08:00
Zhaohan Dong 88e7046165 Added compatibility guidance, etc. for xAI Grok model (#8282)
* Various updates

Signed-off-by: Zhaohan Dong <65422392+zhaohan-dong@users.noreply.github.com>

* Update xAI branding

Signed-off-by: Zhaohan Dong <65422392+zhaohan-dong@users.noreply.github.com>

* Revert changes

Signed-off-by: Zhaohan Dong <65422392+zhaohan-dong@users.noreply.github.com>

---------

Signed-off-by: Zhaohan Dong <65422392+zhaohan-dong@users.noreply.github.com>
2025-02-05 17:21:47 -08:00
waterstark fbe3c58372 Added a guide for users who want to use LiteLLM with AI/ML API. (#7058)
* Added a guide for users who want to use LiteLLM with AI/ML.

* Minor changes

* Minor changes

* Fix sidebars.js

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
2025-02-05 06:20:35 -08:00
Krish Dholakia 8d3a942fbd Litellm staging (#8270)
* fix(opik.py): cleanup

* docs(opik_integration.md): cleanup opik integration docs

* fix(redact_messages.py): fix redact messages check header logic

ensures stringified bool value in header is still asserted to true

allows dynamic message redaction

* feat(redact_messages.py): support `x-litellm-enable-message-redaction` request header

allows dynamic message redaction
v1.60.4 v1.60.2-dev1
2025-02-04 22:35:48 -08:00
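The redaction fix in 8d3a942fbd ensures that a stringified bool in the `x-litellm-enable-message-redaction` header is still treated as true. A rough sketch of such a check is given below; the function name `redaction_requested` and the choice to accept only the string "true" (case-insensitive) are assumptions, not the logic in redact_messages.py.

```python
from typing import Mapping

REDACTION_HEADER = "x-litellm-enable-message-redaction"


def redaction_requested(headers: Mapping[str, str]) -> bool:
    """Hypothetical check: is message redaction enabled via the request header?"""
    # Header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get(REDACTION_HEADER)
    if value is None:
        return False
    # Header values arrive as strings, so compare against the text "true".
    return str(value).strip().lower() == "true"
```

Comparing the normalized text avoids the pitfall where a header value such as "True" fails an identity or equality check against the Python bool `True`.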
Krish Dholakia 3c813b3a87 Fix deepseek calling - refactor to use base_llm_http_handler (#8266)
* refactor(deepseek/): move deepseek to base llm http handler

Fixes https://github.com/BerriAI/litellm/issues/8128#issuecomment-2635430457

* fix(gpt_transformation.py): support stream parsing for gpt-like calls

* test(test_deepseek_completion.py): add async streaming test

* fix(gpt_transformation.py): fix import

* fix(gpt_transformation.py): return full api base and content type
2025-02-04 22:30:00 -08:00
Ishaan Jaff 51b9a02615 run ci/cd again 2025-02-04 22:19:57 -08:00
Ishaan Jaff e3b0fd7061 bump: version 1.60.3 → 1.60.4 2025-02-04 22:03:18 -08:00
Krish Dholakia 4e34fc3bf8 [BETA] Support OIDC role based access to proxy (#8260)
* feat(proxy/_types.py): add new jwt field params

allows users + services to auth into proxy

* feat(handle_jwt.py): allow team role proxy access

allows proxy admin to set allowed team roles

* fix(proxy/_types.py): add 'routes' to role based permissions

allow proxy admin to restrict what routes a team can access easily

* feat(handle_jwt.py): support more flexible role based route access

v2 on role based 'allowed_routes'

* test(test_jwt.py): add unit test for rbac for proxy routes

* feat(handle_jwt.py): ensure cost tracking always works for any jwt request with `enforce_rbac=True`

* docs(token_auth.md): add documentation on controlling model access via OIDC Roles

* test: increase time delay before retrying

* test: handle model overloaded for test
2025-02-04 21:59:39 -08:00
Krrish Dholakia 7f06b88192 fix(internal_user_endpoints.py): fix try-except for team not in db 2025-02-04 21:57:43 -08:00