* feat(generic_api_callback.py): make generic api OSS + support multiple generic APIs
Enables https://github.com/BerriAI/litellm/pull/17094#discussion_r2562832967
* feat(callback_utils.py): support custom generic api callbacks
* feat(generic_api_callback.py): support specifying which event types to run the generic api for
* fix(litellm_logging.py): log system prompt for anthropic messages
* feat(generic_api_callback.py): support generic api compatible APIs - e.g. rubrik agent cloud
* docs(sidebars.js): document new OSS generic api
* docs(generic_api.md): document new OSS Generic API
* docs(custom_webhook_api.md): document custom webhook api integration tutorial
* docs(custom_webhook_api.md): cleanup
* docs(custom_webhook_api.md): document what gets logged to custom webhook api
* Refactor: Pass callback config to GenericAPILogger
Co-authored-by: krrishdholakia <krrishdholakia@gmail.com>
* Fix: Handle empty messages list in logging payload
Co-authored-by: krrishdholakia <krrishdholakia@gmail.com>
* Checkpoint before follow-up message
Co-authored-by: krrishdholakia <krrishdholakia@gmail.com>
* feat: Cache GenericAPILogger instances to improve performance
Co-authored-by: krrishdholakia <krrishdholakia@gmail.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* Add v1 cut of container api
* fix lint errors
* Add proxy support to container apis & logging support (#16049)
* Add proxy support to container apis
* Add logging support
* Add cost tracking support for containers and documentation
* Add new constant documentation
* Add container cost in model map
* fix failing azure tests
* Update tests based on model map changes
* fix model map tests
* fix model map tests
* Container mode should be container
* Container tests fix
* Merge branch 'main' into litellm_sameer_oct_staging_2
* Add Prometheus metric to track callback logging failures in S3 (#16102)
* Add proxy support to container apis
* Add logging support
* prometheus metric measures how often s3_v2 is failing
* remove not needed files
* remove not needed files
* remove not needed files
* fix mypy errors
* Use logging_callback_manager to get all the callbacks
---------
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* fix(apscheduler): prevent memory leaks from jitter and frequent job intervals
Fixes critical memory leak in APScheduler that causes 35GB+ memory allocations
during proxy startup and operation. The leak was identified through Memray
analysis showing massive allocations in normalize() and _apply_jitter()
functions.
Key changes:
1. Remove jitter parameters from all scheduled jobs - jitter was causing
expensive normalize() calculations leading to memory explosion
2. Configure AsyncIOScheduler with optimized job_defaults:
- misfire_grace_time: 3600s (increased from 120s) to prevent backlog
calculations that trigger memory leaks
- coalesce: true to collapse missed runs
- max_instances: 1 to prevent concurrent job execution
- replace_existing: true to avoid duplicate jobs on restart
3. Increase minimum job intervals:
- PROXY_BATCH_WRITE_AT: 30s (was 10s)
- add_deployment/get_credentials jobs: 30s (was 10s)
4. Use fixed intervals with small random offsets instead of jitter for
job distribution across workers
5. Explicitly configure jobstores and executors to minimize overhead
6. Disable timezone awareness to reduce computation
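The key changes above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the code from the PR: `JOB_DEFAULTS`, `build_interval_job_kwargs`, and the 5-second offset window are hypothetical names and values chosen for the example. The dictionaries are shaped so they could be passed to APScheduler 3.x as `AsyncIOScheduler(job_defaults=JOB_DEFAULTS)` and `scheduler.add_job(func, "interval", **kwargs)`.

```python
import random
from datetime import datetime, timedelta

# Values taken from the commit message; the constant name is hypothetical.
JOB_DEFAULTS = {
    "coalesce": True,            # collapse missed runs into a single run
    "misfire_grace_time": 3600,  # seconds (previously 120)
    "max_instances": 1,          # no concurrent runs of the same job
}


def build_interval_job_kwargs(job_id, seconds, now=None, max_offset_seconds=5.0, rng=random):
    """Build add_job keyword arguments for a fixed-interval job without jitter.

    A one-time random start offset spreads jobs across workers; after the
    first run the interval stays fixed. The offset window is an assumption.
    """
    now = now or datetime.now()
    offset = rng.uniform(0, max_offset_seconds)
    return {
        "id": job_id,
        "seconds": seconds,
        "next_run_time": now + timedelta(seconds=offset),
        "replace_existing": True,  # avoid duplicate jobs on restart
    }
```

The random offset is applied once, when the job is registered, so nothing is recomputed per run the way a `jitter` argument would be.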
Memory impact:
- Before: 35GB with 483M allocations during startup
- After: <1GB with normal allocation patterns
Performance notes:
- Minimum job intervals increased from 10s to 30s (configurable via env vars)
- Jobs can still be distributed across workers using random start offsets
- No functional changes to job behavior, only timing and memory optimization
Testing:
- Added comprehensive test suite for scheduler configuration
- Verified no job execution backlog on startup
- Tested duplicate job prevention with replace_existing
Related issue: Memory leak in production proxy servers with APScheduler
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* docs: update PROXY_BATCH_WRITE_AT default value from 10s to 30s
Update documentation to reflect the new default value for PROXY_BATCH_WRITE_AT
changed in PR #15846. The default was increased from 10 seconds to 30 seconds
to prevent memory leaks in APScheduler.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* refactor: Move APScheduler config to constants.py
Address code review feedback from ishaan-jaff:
- Move scheduler configuration variables (coalesce, misfire_grace_time,
max_instances, replace_existing) to litellm/constants.py
- Update all references in proxy_server.py to use the constants
- Improves maintainability and makes configuration values centralized
Requested-by: @ishaan-jaff
Related: #15846
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix(support-model-specific-tpm/rpm-limits): Allows setting rate limits by tpm/rpm for models by team
* fix(key_management_endpoints.py): enforce guaranteed throughput with key-level model tpm/rpm limits, when team-level tpm/rpm limits are set
* test: add unit testing
* feat(schema.prisma): add metadata to litellm budget table
* feat(proxy/utils.py): add org limits to user api key auth
allows org level tpm/rpm limiting to work
* feat: add org level tpm/rpm limits + inherit org id in key from team
enables org level tpm/rpm limits
* feat: validated working org tpm/rpm limits
* feat: support updating org level, model specific tpm/rpm limits
* fix: working key validation for org level tpm/rpm limits
* fix: working validation for orgs when giving tpm/rpm to teams
* fix(key_management_endpoints.py): fix tpm/rpm limits on orgs
* fix(key_management_endpoints.py): support limits
* refactor: remove duplicate var
* fix: refactor to avoid ruff errors
* fix: fix typing
* fix: fix linting error
* fix: fix testing
* fix(key_management_endpoints.py): document params
* fix(managed_files.py): don't raise error if managed object is not found
* feat(vector_stores): add azure ai search vector store support
Enables directly querying a vector store on Azure
* fix(azure/vector_stores): working azure ai search api vector stores
Allows direct querying of vector stores on Azure
* test: update env vars
* docs(docs/): document new azure ai vector store search
* docs(azure_ai_vector_stores.md): add table
* docs: clarify support for 'create' vector stores
* fix(vector_stores/endpoints.py): Fixes https://github.com/BerriAI/litellm/issues/14606
* fix: fix linting errors
* Add v2/chat support for cohere
* fix streaming
* Use v2_transformation for logging passthrough
* Use v2_transformation for logging passthrough
* Add test for checking if document and citation_options is getting passed
* Update the cohere model
* Add cost tracking for vertex ai passthrough batch jobs
* Add full passthrough support
* refactor code according to the comments
* Add passthrough handler
* remove invalid params
* Updated documentation
* Updated documentation
* Updated documentation
* Correct the import
* Add openai videos generation and retrieval support
* add retrieval endpoint
* Add docs
* Add imports
* remove orjson
* remove double import
* fix openai videos format
* remove mock code
* remove not required comments
* Add tests
* Add tests
* Add other video endpoints
* Fix cost calculation and transformation
* Fixed mypy tests
* remove not used imports
* fix documentation for get batch req (#15742)
* Add grounding info to responses API (#15737)
* Add grounding info to responses API
* fix lint errors
* Use typed objects for annotations
* Use typed objects for annotations
* fix mypy error
* Litellm fix json serialize alerting 2 (#15741)
* fix json serializable error for alerts
* Add test
* fix mypy errors
* fix mypy errors
* Add Qwen3 imported model support for AWS Bedrock (#15783)
* Add qwen imported model support
* fix mypy errors
* fix empty user message error (#15784)
* fix typed dict for list
* Add azure supported videos endpoint
* fix mapped tests
* add azure sora models to model map
* Add OpenAI video generation and content retrieval support (#15745)
* Add openai videos generation and retrieval support
* add retrieval endpoint
* Add docs
* Add imports
* remove orjson
* remove double import
* fix openai videos format
* remove mock code
* remove not required comments
* Add tests
* Add tests
* Add other video endpoints
* Fix cost calculation and transformation
* Fixed mypy tests
* remove not used imports
* fix typed dict for list
* fix mypy errors
* move directory
* make v2 chat default
* Fix mypy tests
* Fix mypy tests
* Fix mypy tests
* Fix mypy tests
* Revert "Add Azure Video Generation Support with Sora Integration"
* refactor videos repo
* add test
* Add azure openai videos support
* Add azure openai videos support
* Add router endpoint support for videos
* fix mypy error
* add azure models
* fix mapped test
* fix mypy error
* Add proxy router test
* Add proxy router test
* remove deprecated model name from tests
* fix import error
* fix import error
* Add guardrail integration in videos endpoint
* Add logging support for videos endpoint
* Add final documentation supporting videos integration
* fix model name and document input
* Update literals to avoid mypy errors
* Remove unused imports and print statements
* revert guardrail support for video generation and video remix
* revert guardrail support for video generation and video remix
* Fix failing mapped and llm translation tests
* Implement fix for thinking_blocks and converse API calls
This fixes Claude's models via the Converse API, which should also fix
Claude Code.
* Add thinking literal
* Fix mypy issues
* Type fix for redacted thinking
* Add voyage model integration in sagemaker
* Add config file logic
* Use already existing voyage transformation
* refactor code as per comments
* fix merge error
* refactor code as per comments
* refactor code as per comments
* UI new build
* [Fix] router - regression when adding/removing models (#15451)
* fix(router): update model_name_to_deployment_indices on deployment removal
When a deployment is deleted, the model_name_to_deployment_indices map
was not being updated, causing stale index references. This could lead
to incorrect routing behavior when deployments with the same model_name
were dynamically removed.
Changes:
- Update _update_deployment_indices_after_removal to maintain
model_name_to_deployment_indices mapping
- Remove deleted indices and decrement indices greater than removed index
- Clean up empty entries when no deployments remain for a model name
- Update test to verify proper index shifting and cleanup behavior
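The index maintenance listed above can be illustrated with a small standalone sketch. It is not the router's actual `_update_deployment_indices_after_removal` method: the function name and signature here are hypothetical, and it assumes the map is a plain dict from model name to a list of positions in `model_list`.

```python
from typing import Dict, List


def update_indices_after_removal(
    model_name_to_deployment_indices: Dict[str, List[int]],
    removed_idx: int,
) -> None:
    """Drop removed_idx, shift larger indices down by one, prune empty names."""
    # Iterate over a copy of the keys so entries can be deleted in place.
    for model_name in list(model_name_to_deployment_indices):
        shifted = [
            idx - 1 if idx > removed_idx else idx
            for idx in model_name_to_deployment_indices[model_name]
            if idx != removed_idx
        ]
        if shifted:
            model_name_to_deployment_indices[model_name] = shifted
        else:
            # No deployments remain for this model name.
            del model_name_to_deployment_indices[model_name]


indices = {"gpt-4": [0, 2], "claude": [1]}
update_indices_after_removal(indices, removed_idx=1)
```

After removing the deployment at position 1, `"claude"` has no deployments left and is dropped, while the `"gpt-4"` entry that pointed at position 2 now points at position 1.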
* fix(router): remove redundant index building during initialization
Remove duplicate index building operations that were causing unnecessary
work during router initialization:
1. Removed redundant `_build_model_id_to_deployment_index_map` call in
__init__ - `set_model_list` already builds all indices from scratch
2. Removed redundant `_build_model_name_index` call at end of
`set_model_list` - the index is already built incrementally via
`_create_deployment` -> `_add_model_to_list_and_index_map`
Both indices (model_id_to_deployment_index_map and
model_name_to_deployment_indices) are properly maintained as lookup
indexes through existing helper methods. This change eliminates O(N)
duplicate work during initialization without any behavioral changes.
The indices continue to be correctly synchronized with model_list on
all operations (add/remove/upsert).
* fix(prometheus): Fix Prometheus metric collection in a multi-workers environment (#14929)
Co-authored-by: sotazhang <sotazhang@tencent.com>
* Add tiered pricing and cost calculation for xai
* Use generic cost calculator
* Resolve conflicts in generated HTML files
* Remove penalty params as supported params for gemini preview model (#15503)
* fix conversion of thinking block
* add application level encryption in SQS (#15512)
* docs: fix doc
* docs(index.md): bump rc
* [Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
* fix: add google_routes to llm_api_routes
* test: test_virtual_key_llm_api_routes_allows_google_routes
* build: bump version
* bump: version 1.78.0 → 1.78.1
* add application level encryption in SQS
* add application level encryption in SQS
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>
* [Feat] Bedrock Knowledgebase - return search_response when using /chat/completions API with LiteLLM (#15509)
* docs: fix doc
* docs(index.md): bump rc
* [Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
* fix: add google_routes to llm_api_routes
* test: test_virtual_key_llm_api_routes_allows_google_routes
* add AnthropicCitation
* fix async_post_call_success_deployment_hook
* fix add vector_store_custom_logger to global callbacks
* test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call
* async_post_call_success_deployment_hook
* add async_post_call_streaming_deployment_hook
* async def test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call_streaming(setup_vector_store_registry):
* fix _call_post_streaming_deployment_hook
* fix async_post_call_streaming_deployment_hook
* test update
* docs: Accessing Search Results
* docs KB
* fix chatUI
* fix searchResults
* fix onSearchResults
* fix kb
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* [Feat] Add dynamic rate limits on LiteLLM Gateway (#15518)
* docs: fix doc
* docs(index.md): bump rc
* [Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
* fix: add google_routes to llm_api_routes
* test: test_virtual_key_llm_api_routes_allows_google_routes
* build: bump version
* bump: version 1.78.0 → 1.78.1
* fix: KeyRequestBase
* fix rpm_limit_type
* fix dynamic rate limits
* fix use dynamic limits here
* fix _should_enforce_rate_limit
* fix _should_enforce_rate_limit
* fix counter
* test_dynamic_rate_limiting_v3
* use _create_rate_limit_descriptors
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* Add google rerank endpoint
* Add docs
* fix mypy error
* fix mypy and lint errors
* Add haiku 4.5 integration
* Add haiku 4.5 integration for other regions as well
* Handle citation field correctly
* Fix filtering headers for signature calcs
* Add haiku 4.5 integration (#15650)
---------
Co-authored-by: Leslie Cheng <leslie.cheng5@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Lucas <10226902+LoadingZhang@users.noreply.github.com>
Co-authored-by: sotazhang <sotazhang@tencent.com>
Co-authored-by: Deepanshu Lulla <deepanshu.lulla@gmail.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>
* fix: use fastuuid helper across the codebase
First batch of changes, simple drop in replacement.
* second batch of changes
* fixed: script mistake on helper file
* fix: cli auth with SSO okta
* fix: add LITELLM_CLI_SERVICE_ACCOUNT_NAME
* fix: get_litellm_cli_user_api_key_auth
* use existing_key CLI
* fix: use existing key
* test auth commands
* test_cli_sso_callback_regenerate_vs_create_flow
* feat: add CLI Token Utilities
* fix: get_stored_api_key
* move file
* fix: get_valid_models
* fix config.yaml
* TestCLITokenUtils
* TestGetValidModelsWithCLI
* fix: tie user id to keys created through CLI
* fix: add teams interface to CLI
* add /keys/update to the list client commands
* fix /sso/cli/poll to return the user_id
* fix: working TeamsManagementClient
* fix CLI Login command
* fixes for auth
* Potential fix for code scanning alert no. 3400: Clear-text logging of sensitive information
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* ruff fix
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>