* add AWS fields for KeyManagementSettings
* docs IAM roles
* use aws iam auth on secret manager v2
* fix: load_aws_secret_manager
* test_secret_manager_with_iam_role_settings
Cost tracking was failing for Responses API when using custom deployment names
with base_model configuration. The issue occurred because:
- Chat Completions API stores model_info in 'metadata'
- Responses API stores model_info in 'litellm_metadata'
- Cost calculator only checked 'metadata', missing Responses API costs
Changes:
- Updated _get_base_model_from_metadata() to check both metadata locations
- Added comprehensive unit tests covering all scenarios
- Maintains backward compatibility (metadata takes precedence)
Fixes #16772
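A minimal sketch of the two-location lookup described above, with 'metadata' taking precedence. The function name, the `litellm_params` nesting, and the `base_model` key layout are assumptions for illustration, not the actual implementation of _get_base_model_from_metadata():

```python
from typing import Any, Dict, Optional


def get_base_model_from_metadata(
    call_kwargs: Optional[Dict[str, Any]],
) -> Optional[str]:
    """Return model_info.base_model from 'metadata' or 'litellm_metadata'.

    Hypothetical helper: 'metadata' is checked first so existing
    Chat Completions behaviour is unchanged.
    """
    if not call_kwargs:
        return None
    # Assumed layout: both containers live under 'litellm_params'.
    litellm_params = call_kwargs.get("litellm_params") or {}
    for key in ("metadata", "litellm_metadata"):
        container = litellm_params.get(key) or {}
        model_info = container.get("model_info") or {}
        base_model = model_info.get("base_model")
        if base_model:
            return base_model
    return None
```

With both locations populated, the value under 'metadata' is returned; with only 'litellm_metadata' populated, the Responses API value is picked up instead of being missed.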
- Add HCP_VAULT_MOUNT_NAME env var to override default 'secret' mount
- Add HCP_VAULT_PATH_PREFIX env var to add prefix to secret paths
- Update get_url() method to construct URLs with configurable mount and prefix
- Add test coverage for custom mount names and path prefixes
- Maintain backward compatibility with existing configurations
This allows users to configure Vault paths like:
- Custom mount: {VAULT_ADDR}/v1/{MOUNT_NAME}/data/{SECRET}
- With prefix: {VAULT_ADDR}/v1/secret/data/{PREFIX}/{SECRET}
- Both: {VAULT_ADDR}/v1/{MOUNT_NAME}/data/{PREFIX}/{SECRET}
Resolves issue where mount name was hardcoded and path prefixes weren't supported.
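The URL shapes listed above can be sketched as follows. Only the two environment variable names and the default 'secret' mount come from this change; the function name `build_vault_url` and its parameters are hypothetical and do not mirror the real get_url() signature:

```python
import os
from typing import Mapping, Optional


def build_vault_url(
    secret_name: str,
    vault_addr: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a KV v2 style URL with an optional mount override and prefix."""
    env = os.environ if env is None else env
    # Fall back to the default 'secret' mount when no override is set.
    mount = env.get("HCP_VAULT_MOUNT_NAME") or "secret"
    # Trim slashes so 'team/' and '/team' both yield a single separator.
    prefix = (env.get("HCP_VAULT_PATH_PREFIX") or "").strip("/")
    path = f"{prefix}/{secret_name}" if prefix else secret_name
    return f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}"
```

With neither variable set, the result matches the previous hardcoded form, which is how backward compatibility is kept in this sketch.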
* KeyManagementSystem add cyberark
* add CyberArkSecretManager
* docs add CyberArkSecretManager
* docs
* refactor to use get_secret_from_manager
* Potential fix for code scanning alert no. 3645: Clear-text logging of sensitive information
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Potential fix for code scanning alert no. 3650: Clear-text logging of sensitive information
* Potential fix for code scanning alert no. 3649: Clear-text logging of sensitive information
* Potential fix for code scanning alert no. 3646: Clear-text logging of sensitive information
* fix: Preserve Bedrock inference profile IDs in health checks
- Fixes issue where health checks were stripping inference profile IDs
- Preserves cross-region inference profile prefixes (us., eu., apac., jp., au., us-gov., global.)
- Strips only AWS region routing while preserving routes and handlers
- Resolves both issue #15807 and inference profile requirement errors
- Adds comprehensive tests for all Bedrock model format combinations
The earlier fix for issue #15807 targeted regional Bedrock model health checks but was too
aggressive: it also stripped the cross-region inference profile prefixes that AWS requires.
This caused errors: "Invocation of model ID X with on-demand throughput isn't
supported. Retry your request with the ID or ARN of an inference profile."
The fix now correctly:
- Strips AWS regions (us-west-2, eu-central-1, etc.) from routing
- Preserves CRIS prefixes (us., eu., etc.) required by AWS
- Preserves routes (converse/, invoke/)
- Preserves handlers (llama/, deepseek_r1/)
- Only affects Bedrock models (checked via startswith)
Test coverage includes 20+ scenarios for all Bedrock model format combinations.
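A rough sketch of the stripping rule described above: drop only path segments that look like AWS regions and leave everything else alone. The function name, the region regex, and the model strings in the usage note are assumptions for illustration, not the health check code itself:

```python
import re

# Matches region-shaped segments such as us-west-2 or us-gov-west-1.
# CRIS prefixes (us., eu., ...) are part of the model ID segment and
# contain dots, so they never match this pattern.
_AWS_REGION = re.compile(r"^[a-z]{2}(-gov)?-[a-z]+-\d+$")


def strip_bedrock_region(model: str) -> str:
    """Remove AWS region segments from a Bedrock model path."""
    # Only Bedrock models are affected.
    if not model.startswith("bedrock/"):
        return model
    parts = model.split("/")
    # Routes (converse, invoke) and handlers (llama, deepseek_r1) are kept
    # because they do not match the region pattern.
    kept = [part for part in parts if not _AWS_REGION.match(part)]
    return "/".join(kept)
```

For example, a hypothetical "bedrock/converse/eu-central-1/eu.anthropic.claude-3-sonnet" becomes "bedrock/converse/eu.anthropic.claude-3-sonnet": the region is dropped while the route and the eu. prefix survive.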
* Remove unused traceback import
* fix: use fastuuid helper across the codebase
First batch of changes, simple drop in replacement.
* second batch of changes
* fixed: script mistake on helper file
* add: generic object pool & tests
Introduced a reusable object pool that can be applied across the codebase.
Note: memory growth is managed via eviction settings. A hard cap could
reduce performance, so eviction is the preferred safeguard.
* fix: simpler tests
- Test multiple tool calls without explicit indices receive sequential indices
- Verify Delta class assigns indices 0, 1, 2... instead of defaulting all to 0
- Add comprehensive assertions for tool call details preservation
- Cover provider-agnostic streaming response scenarios
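The index behaviour these tests check can be sketched as below. This is an illustrative stand-alone helper with a hypothetical name and plain dicts, not the Delta class itself:

```python
from typing import Any, Dict, List


def assign_tool_call_indices(
    tool_calls: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Give each tool call lacking an 'index' its position in the list."""
    indexed = []
    for position, call in enumerate(tool_calls):
        updated = dict(call)  # copy so the caller's dicts are not mutated
        if updated.get("index") is None:
            # Previously every missing index would have defaulted to 0.
            updated["index"] = position
        indexed.append(updated)
    return indexed
```

Explicit indices are left untouched, and all other fields of each tool call are preserved.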
* Check content and order of trimmed messages
* Assert tool calls are preserved if below max_tokens
* Unreverse order of tool calls
* Return tool calls alongside other messages
* Write test for trimming untokenizable field
* Return original messages in case of exception
* fix(proxy_setting_endpoints.py): require that store model in db is enabled before setting user default settings
* test(test_proxy_server.py): update test
* fix(reset_budget_job.py): initial commit adding reset budget logic for team members
* test: update unit testing
* test(test_proxy_budget_reset.py): validate team member budget was reset
* test(test_reset_budget_job.py): update unit tests
* test: update tests
* fix: support Cursor IDE tool_choice format {"type": "auto"}
- Update validate_chat_completion_tool_choice to normalize {"type": "auto"} to "auto"
- Handles Cursor IDE sending non-standard tool_choice format
- Add comprehensive tests for tool choice validation
Fixes #12098
* fix: return full tool_choice object for Cursor IDE format
Based on PR feedback, updated validate_chat_completion_tool_choice to return
the full tool_choice dictionary instead of just extracting the type string.
This maintains consistency with downstream code that expects the full object
structure.
- Changed behavior: {"type": "auto"} now returns {"type": "auto"} instead of "auto"
- Updated tests to reflect the new expected behavior
- Ensures compatibility with code that passes tool_choice to optional_params
Addresses feedback from PR #12168
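A minimal sketch of the final behaviour, assuming the validator passes strings through and accepts dicts whose "type" is a known value. The function name, the accepted string set, and the error handling are assumptions; the real validate_chat_completion_tool_choice may differ:

```python
from typing import Any, Dict, Optional, Union

ToolChoice = Union[str, Dict[str, Any]]

# Assumed set of type values accepted in the dict form.
_SIMPLE_TYPES = {"auto", "none", "required"}


def validate_tool_choice(
    tool_choice: Optional[ToolChoice],
) -> Optional[ToolChoice]:
    """Return tool_choice unchanged when it has a recognised shape."""
    if tool_choice is None or isinstance(tool_choice, str):
        return tool_choice
    if isinstance(tool_choice, dict):
        choice_type = tool_choice.get("type")
        if choice_type == "function" and "function" in tool_choice:
            return tool_choice
        if choice_type in _SIMPLE_TYPES:
            # Cursor IDE style: return the full object, not just the string.
            return tool_choice
    raise ValueError(f"Invalid tool_choice: {tool_choice!r}")
```

Returning the dict as-is means downstream code that reads tool_choice from optional_params sees the same object structure the client sent.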