Commit Graph

88 Commits

Author SHA1 Message Date
Ishaan Jaff 57544f1662 [Feat] Adds IAM role assumption support for AWS Secret Manager (#16887)
* add AWS fields for KeyManagementSettings

* docs IAM roles

* use aws iam auth on secret manager v2

* fix: load_aws_secret_manager

* test_secret_manager_with_iam_role_settings
2025-11-20 12:38:48 -08:00
Cesar Garcia 5e70c78b94 fix(cost-tracking): support base_model lookup in litellm_metadata for Responses API (#16778)
Cost tracking was failing for Responses API when using custom deployment names
with base_model configuration. The issue occurred because:

- Chat Completions API stores model_info in 'metadata'
- Responses API stores model_info in 'litellm_metadata'
- Cost calculator only checked 'metadata', missing Responses API costs

Changes:
- Updated _get_base_model_from_metadata() to check both metadata locations
- Added comprehensive unit tests covering all scenarios
- Maintains backward compatibility (metadata takes precedence)

Fixes #16772
2025-11-18 19:53:18 -08:00
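The lookup order described in commit 5e70c78b94 can be sketched roughly as follows. This is a minimal illustration, not the actual body of _get_base_model_from_metadata(): the function name `get_base_model`, its single-dict argument, and the nested `model_info` / `base_model` key layout are assumptions made for the example.

```python
from typing import Any, Dict, Optional


def get_base_model(request_kwargs: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: find base_model in either metadata location.

    'metadata' is checked first so that existing Chat Completions
    behaviour keeps precedence; 'litellm_metadata' is the fallback
    used by the Responses API.
    """
    for key in ("metadata", "litellm_metadata"):
        container = request_kwargs.get(key) or {}      # tolerate missing/None
        model_info = container.get("model_info") or {}
        base_model = model_info.get("base_model")
        if base_model:
            return base_model
    return None
```

Iterating over the two keys in a fixed order keeps the precedence rule in one place, so adding a third location later would be a one-line change.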
Ishaan Jaffer 95b1608970 test_get_valid_models_with_custom_llm_provider 2025-11-15 09:43:10 -08:00
Krish Dholakia 06906534b3 feat(audio_transcriptions/): calculate duration of audio file for cost calculation + feat (image_generations): cost tracking accuracy improved with output_format, quality, size values fixed per openai model
* feat(audio_transcriptions/): calculate duration of audio file for cost calculation

Fixes https://github.com/BerriAI/litellm/issues/11846

Closes https://github.com/BerriAI/litellm/issues/14605

* fix(cost_calculator.py): correctly use base model, when set

Fixes issue where azure base model was being ignored

* feat(cost_calculator.py): fix default cost tracking quality param for image generation

* feat(image_generations/): return output_format, quality, size

aligns response to openai spec and improves cost tracking accuracy

* fix(cost_calculator.py): refactor cost calculation for image generation to use image response instead of hidden params

* build: update build

* fix: fix cost calculation

* build: update poetry lock

* fix: fix ruff checks

* fix: fix aembedding

* fix: fix ruff errors

* fix: modify to catch errors

* fix: test

* fix: loosen test to handle openai lib out of sync

* fix: fix base models

* fix: fix usage object
2025-11-08 16:24:31 -08:00
Sameer Kankute faae0ff0dc Fix Azure DALL-E-3 health check content policy violation by using safe default prompt (#16329)
* Add custom health check prompt support

* Add constant for health check prompt

* Add constant for health check prompt
2025-11-07 15:30:56 -08:00
Ishaan Jaff a6b0993405 [Feat] Secret Manager - Hashicorp, add auth via approle (#16374)
* add _verify_required_credentials_exist and _auth_via_approle

* test_hashicorp_secret_manager_approle_auth

* docs hcorp auth
2025-11-07 14:39:33 -08:00
Ishaan Jaffer 89a73b853a fix cyber ark 2025-11-06 16:26:14 -08:00
Alexsander Hamir 8ee9b1bc93 feat: Add configurable mount name and path prefix for HashiCorp Vault (#16253)
- Add HCP_VAULT_MOUNT_NAME env var to override default 'secret' mount
- Add HCP_VAULT_PATH_PREFIX env var to add prefix to secret paths
- Update get_url() method to construct URLs with configurable mount and prefix
- Add test coverage for custom mount names and path prefixes
- Maintain backward compatibility with existing configurations

This allows users to configure Vault paths like:
- Custom mount: {VAULT_ADDR}/v1/{MOUNT_NAME}/data/{SECRET}
- With prefix: {VAULT_ADDR}/v1/secret/data/{PREFIX}/{SECRET}
- Both: {VAULT_ADDR}/v1/{MOUNT_NAME}/data/{PREFIX}/{SECRET}

Resolves issue where mount name was hardcoded and path prefixes weren't supported.
2025-11-05 16:06:07 -08:00
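The three URL shapes listed in commit 8ee9b1bc93 can be reproduced with a small helper. This is only a sketch under assumptions: `build_vault_url` and its parameters are hypothetical names, not the get_url() method from the commit, and the values would presumably come from HCP_VAULT_MOUNT_NAME and HCP_VAULT_PATH_PREFIX.

```python
from typing import Optional


def build_vault_url(
    vault_addr: str,
    secret_name: str,
    mount_name: str = "secret",          # default mount named in the commit
    path_prefix: Optional[str] = None,   # optional prefix before the secret
) -> str:
    """Hypothetical helper: build {VAULT_ADDR}/v1/{MOUNT}/data/[{PREFIX}/]{SECRET}."""
    base = vault_addr.rstrip("/")
    # Drop empty parts and stray slashes so the prefix and secret join cleanly.
    parts = [p.strip("/") for p in (path_prefix, secret_name) if p]
    path = "/".join(parts)
    return f"{base}/v1/{mount_name}/data/{path}"
```

With no overrides the helper yields the original `secret` mount path, which is one way to keep existing configurations working unchanged.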
Ishaan Jaff 466e7d178c [Feat] Cyber Ark - Add Key Rotations support (#16289)
* KeyManagementSystem add cyberark

* add CyberArkSecretManager

* add CyberArkSecretManager

* add CyberArkSecretManager

* docs add CyberArkSecretManager

* docs

* refactor to use get_secret_from_manager

* fix async rotate for cyber ark, re-use base class


* fixes

* cyber ark

* docs fix

* docs fix

* docs cyberark

* fix linting

* fix get_secret_from_manager
2025-11-05 14:03:43 -08:00
Ishaan Jaff 9a372bfad6 [Feat] Add CyberArk Secrets Manager Integration (#16278)
* KeyManagementSystem add cyberark

* add CyberArkSecretManager

* add CyberArkSecretManager

* add CyberArkSecretManager

* docs add CyberArkSecretManager

* docs

* refactor to use get_secret_from_manager

* Potential fix for code scanning alert no. 3645: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 3650: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 3649: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 3646: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-11-05 14:00:45 -08:00
Deepanshu Lulla 812ea03d28 Add tags and descriptions support to aws secrets manager (#16224)
* Add tags and descriptions support to aws secrets manager

* add tags

---------

Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>
2025-11-04 16:11:51 -08:00
Ishaan Jaffer 94c2c28f3d claude-sonnet-4-5-20250929 fix 2025-10-31 18:20:52 -07:00
Chris Gibbons 2bef7c3662 fix: Preserve Bedrock inference profile IDs in health checks (#15947)
* fix: Preserve Bedrock inference profile IDs in health checks

- Fixes issue where health checks were stripping inference profile IDs
- Preserves cross-region inference profile prefixes (us., eu., apac., jp., au., us-gov., global.)
- Strips only AWS region routing while preserving routes and handlers
- Resolves both issue #15807 and inference profile requirement errors
- Adds comprehensive tests for all Bedrock model format combinations

Issue #15807 attempted to fix regional Bedrock model health checks but was too
aggressive, stripping cross-region inference profile prefixes that AWS requires.
This caused errors: "Invocation of model ID X with on-demand throughput isn't
supported. Retry your request with the ID or ARN of an inference profile."

The fix now correctly:
- Strips AWS regions (us-west-2, eu-central-1, etc.) from routing
- Preserves CRIS prefixes (us., eu., etc.) required by AWS
- Preserves routes (converse/, invoke/)
- Preserves handlers (llama/, deepseek_r1/)
- Only affects Bedrock models (checked via startswith)

Test coverage includes 20+ scenarios for all Bedrock model format combinations.

* Remove unused traceback import
2025-10-27 19:44:45 -07:00
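The distinction drawn in commit 2bef7c3662 between AWS regions (stripped) and cross-region inference profile prefixes (preserved) can be illustrated with a short sketch. The function name `strip_bedrock_region`, the region regular expression, and the assumption that the region appears as its own slash-separated segment are all hypothetical; the real health check code may differ.

```python
import re

# Matches whole segments such as "us-west-2", "eu-central-1", "us-gov-west-1".
# A CRIS-prefixed model ID like "us.anthropic..." contains dots and further
# text, so it never matches as a full segment.
_AWS_REGION = re.compile(r"[a-z]{2}(?:-gov)?-[a-z]+-\d+")


def strip_bedrock_region(model: str) -> str:
    """Hypothetical helper: drop region routing segments from Bedrock models."""
    if not model.startswith("bedrock/"):
        return model  # only Bedrock models are touched
    kept = [seg for seg in model.split("/") if not _AWS_REGION.fullmatch(seg)]
    return "/".join(kept)
```

Because only full-segment matches are removed, routes such as `converse/` and handlers such as `llama/` pass through untouched along with the profile prefix.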
Ishaan Jaffer d8e5938f54 test_azure_img_gen_health_check 2025-10-25 10:35:47 -07:00
Ishaan Jaffer 778e10119c test_azure_img_gen_health_check 2025-10-25 10:27:10 -07:00
Ishaan Jaffer 0bedf1c0a7 fix tests 2025-10-25 10:19:24 -07:00
Ishaan Jaffer ae7b13550e test_models_by_provider 2025-10-23 09:10:41 -07:00
nuernber 799a2b624a use proper bedrock model name in health check (#15808) 2025-10-22 15:24:57 -07:00
Ishaan Jaff 9135e748a0 [Feat] /ocr - Add mode + Health check support for OCR models (#15767)
* get_mode_handlers

* use get_mode_handlers

* test_ahealth_check_ocr

* Add OCR mode to test models

* docs OCR Health Checks

* fix connection endpoint
2025-10-21 16:58:37 -07:00
Ishaan Jaffer 8cb66168bc test fix 2025-10-10 19:57:17 -07:00
Georg Wölflein dbfa8ec921 Fix end user cost tracking in the responses API (#15124)
#13860
2025-10-02 15:13:57 -07:00
Ishaan Jaff f6d7683261 [Feat] LiteLLM Overhead metric tracking - Add support for tracking litellm overhead on cache hits (#15045)
* test_litellm_overhead

* vertex track overhead

* fix config.yaml used for testing

* test_litellm_overhead_stream

* add update_response_metadata for caching handler

* add CachingDetails

* fix update_response_metadata import

* add CachingDetails metrics

* add CachingDetails

* test_litellm_overhead_cache_hit

* test_litellm_overhead_cache_hit

* test_litellm_overhead_cache_hit
2025-09-29 17:33:27 -07:00
Ishaan Jaffer e0172b86e2 test_litellm_overhead_non_streaming 2025-09-29 15:48:32 -07:00
Ishaan Jaff 619577d4e8 [Feat] Add litellm overhead metric for VertexAI (#15040)
* test_litellm_overhead

* vertex track overhead

* fix config.yaml used for testing

* test_litellm_overhead_stream

* add update_response_metadata for caching handler

* Revert "add update_response_metadata for caching handler"

This reverts commit f2a891f2b448b878a5dbf4b5b0a6166c807b3705.
2025-09-29 15:15:25 -07:00
Ishaan Jaffer bbf5761b49 test health check 2025-09-27 12:06:26 -07:00
Alexsander Hamir eaa04cd8ce fix: use fastuuid helper (#14903)
* fix: use fastuuid helper across the codebase

First batch of changes, simple drop in replacement.

* second batch of changes

* fixed: script mistake on helper file
2025-09-25 15:47:01 -07:00
Ishaan Jaffer ba1cd3f0d2 Revert "feature: generic object pool (#14702)"
This reverts commit 60800698f2.
2025-09-24 21:36:49 -07:00
Ishaan Jaff d9bf6a8c53 Revert "Fix: make pondpond as optional dependency for proxy extras, disab…" (#14880)
This reverts commit e75d8b711e.
2025-09-24 21:34:42 -07:00
Alex Shoop e75d8b711e Fix: make pondpond as optional dependency for proxy extras, disable object pooling gracefully (#14863)
* pondpond optional dep proxy extra

* lock
2025-09-24 17:09:30 -07:00
Krish Dholakia d4540d31c1 Merge branch 'main' into fix/streaming-tool-call-indices 2025-09-21 21:24:22 -07:00
Alexsander Hamir 60800698f2 feature: generic object pool (#14702)
* add: generic object pool & tests

Introduced a reusable object pool that can be applied across the codebase.
Note: memory growth is managed via eviction settings—using a hard cap could
reduce performance, so eviction is the preferred safeguard.

* fix: simpler tests
2025-09-18 18:32:45 -07:00
Ishaan Jaffer c6afa904bb fix: test_completion_with_no_model 2025-09-18 10:17:09 -07:00
Ishaan Jaffer 1e1d174733 fix: test_completion_with_no_model 2025-09-18 10:13:32 -07:00
Tim Elfrink c5ca2afec3 Add test for tool call sequential index assignment
- Test multiple tool calls without explicit indices receive sequential indices
- Verify Delta class assigns indices 0, 1, 2... instead of defaulting all to 0
- Add comprehensive assertions for tool call details preservation
- Cover provider-agnostic streaming response scenarios
2025-09-15 21:11:13 +02:00
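The behaviour tested in commit c5ca2afec3 (sequential indices instead of every tool call defaulting to 0) can be sketched as below. This is an illustrative stand-in, not the Delta class itself: `assign_tool_call_indices` and the plain-dict representation of tool calls are assumptions for the example.

```python
from typing import Any, Dict, List


def assign_tool_call_indices(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: give each tool call lacking an index its list position."""
    result = []
    for position, call in enumerate(tool_calls):
        updated = dict(call)  # shallow copy so the caller's dicts are not mutated
        if updated.get("index") is None:
            updated["index"] = position  # 0, 1, 2, ... rather than all 0
        result.append(updated)
    return result
```

Calls that already carry an explicit index are left alone, so providers that do send indices keep their own numbering.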
Krrish Dholakia d05f58721e test: remove end of life model from tests 2025-09-09 21:01:45 -07:00
Ishaan Jaff d37be48a80 test: llama-3.3-70b-versatile 2025-09-01 20:14:12 -07:00
Krish Dholakia 3e764ec268 Merge pull request #13808 from mainred/validate_api_version
feat(utils.py): accept 'api_version' as param for validate_environment
2025-08-22 23:59:38 -07:00
Ishaan Jaff e93e266f84 [Performance] Use O(1) Set lookups for model routing (#13879)
* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df.

* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f.

* o(1) lookups

* register_model fix

* test_aget_valid_models

* lambda ai models fix

* test_utils.py

* test fix vertex ai
2025-08-21 22:56:46 -07:00
Qingchuan Hao f2a6be390b feat(utils.py): accept 'api_version' as param for validate_environment 2025-08-20 14:29:58 +00:00
Krrish Dholakia f544a4e238 test: update test 2025-07-29 21:08:36 -07:00
Robert Gambee 52b2984792 [Bug Fix] Always include tool calls in output of trim_messages (#11517)
* Check content and order of trimmed messages

* Assert tool calls are preserved if below max_tokens

* Unreverse order of tool calls

* Return tool calls alongside other messages

* Write test for trimming untokenizable field

* Return original messages in case of exception
2025-07-17 16:01:59 -07:00
Krish Dholakia d202ce229b Prevent writing default user setting updates to yaml (error in non-root env) + Use central team member budget when max_budget_in_team set on UI (#12533)
* fix(proxy_setting_endpoints.py): require store model in db is enabled for setting user default settings

* test(test_proxy_server.py): update test

* fix(reset_budget_job.py): initial commit adding reset budget logic for team members

* test: update unit testing

* test(test_proxy_budget_reset.py): validate team member budget was reset

* test(test_reset_budget_job.py): update unit tests

* test: update tests
2025-07-12 10:13:07 -07:00
Jugal D. Bhatt aa14d26da4 fix slack alerts (#12464)
* fix slack alerts

* remove print

* add unit test
2025-07-10 08:58:47 -07:00
Ishaan Jaff c31a7d3ab7 fix new utils tests 2025-07-04 18:30:50 -07:00
Cole McIntosh a7594196cd fix: support Cursor IDE tool_choice format {"type": "auto"} (#12168)
* fix: support Cursor IDE tool_choice format {"type": "auto"}

- Update validate_chat_completion_tool_choice to normalize {"type": "auto"} to "auto"
- Handles Cursor IDE sending non-standard tool_choice format
- Add comprehensive tests for tool choice validation

Fixes #12098

* fix: return full tool_choice object for Cursor IDE format

Based on PR feedback, updated validate_chat_completion_tool_choice to return
the full tool_choice dictionary instead of just extracting the type string.
This maintains consistency with downstream code that expects the full object
structure.

- Changed behavior: {"type": "auto"} now returns {"type": "auto"} instead of "auto"
- Updated tests to reflect the new expected behavior
- Ensures compatibility with code that passes tool_choice to optional_params

Addresses feedback from PR #12168
2025-06-30 12:39:58 -07:00
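The final behaviour described in commit a7594196cd, where {"type": "auto"} is accepted and returned as the full object, might look roughly like this. The sketch is not the real validate_chat_completion_tool_choice: the name `validate_tool_choice`, the set of accepted strings, and the error handling are assumptions.

```python
from typing import Any, Dict, Optional, Union

# Assumed set of accepted string values for this sketch.
_ALLOWED_STRINGS = {"auto", "none", "required"}

ToolChoice = Union[str, Dict[str, Any]]


def validate_tool_choice(tool_choice: Optional[ToolChoice]) -> Optional[ToolChoice]:
    """Hypothetical validator: accept strings and dicts carrying a 'type' key."""
    if tool_choice is None:
        return None
    if isinstance(tool_choice, str):
        if tool_choice in _ALLOWED_STRINGS:
            return tool_choice
        raise ValueError(f"unsupported tool_choice string: {tool_choice!r}")
    if isinstance(tool_choice, dict) and "type" in tool_choice:
        # Return the full object, e.g. {"type": "auto"} stays {"type": "auto"},
        # so downstream code that expects a dict keeps working.
        return tool_choice
    raise ValueError(f"invalid tool_choice: {tool_choice!r}")
```

Returning the dict unchanged rather than collapsing it to a string matches the second bullet group of the commit, which reversed the earlier normalization after PR feedback.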
Ishaan Jaff 0c19414b36 [Python SDK import] - reduce python sdk import time by .3s (#12140)
* use 1 file for KeyManagementSystem

* move key management settings

* fix import locs

* test_proxy_types_not_imported

* test the import loc

* fix import item

* fix imports

* fix import loc

* fix imports
2025-06-28 14:57:10 -07:00
Ishaan Jaff 6b623f9c98 test whitelisted models 2025-06-28 14:46:16 -07:00
Bougou Nisou 58dda44fda feat: enhance redaction functionality for EmbeddingResponse (#12088) 2025-06-27 21:30:26 -07:00
Laurien 0c50f8bcc9 Update enduser spend and budget reset date based on budget duration (#8460) 2025-06-08 08:39:14 -07:00
Krrish Dholakia e5f228abd5 fix(utils.py): handle litellm proxy case for checking model info 2025-06-06 09:24:41 -07:00