* feat(llm_passthrough_endpoints.py): support milvus passthrough api
* fix(llm_passthrough_endpoints.py): move streaming request value to the top of the function
* docs: document new milvus vector store passthrough flow
* feat: change guardrail_information to list type to support displaying multiple guardrails
* fix: add missing commit and revert auto-format changes in utils.py
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* Add v1 cut of container api
* fix lint errors
* Add proxy support to container apis & logging support (#16049)
* Add proxy support to container apis
* Add logging support
* Add cost tracking support for containers and documentation
* Add new constant documentation
* Add container cost in model map
* fix failing azure tests
* Update tests based on model map changes
* fix model map tests
* fix model map tests
* Container modeshould be container
* Container tests fix
* Merge branch 'main' into litellm_sameer_oct_staging_2
---------
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* fix model error for apis which don't need model
* fix print statments:
* fix mypy lint errors
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* feat(vector_store_endpoints/endpoints.py): add new index_create endpoint
allows admin to create a virtual index, to do permission management for
* feat(key_management_endpoints.py): enable setting allowed_vector_store_indexes on keys
proxy admin can enable dev to create an index on a vector stor
* feat: initial commit adding vector store index passthrough logic to litellm
* feat: add vector store table
* fix(azure_ai/transformation.py): fix headers
* feat: track read/write endpoints by vector store integration
enables permissions by index to work
* fix: azure_ai/vector_stores/search
document the vector store endpoints correctly
ensures permission management works as expected
* fix(proxy/utils.py): improve error message
* docs(azure_ai_vector_stores_passthrough.md): document azure ai passthrough vector store support
* docs(create.md): document azure ai support via passthrough for vector store create
* fix: fix code qa errors
* fix: document new allowed_vector_store_indexes endpoint
* feat(milvus/): initial commit adding milvus vector store support to LiteLLM
allows querying milvus vector store through litellm
* feat(bedrock/vector_stores): support translating openai filters param to aws kb
adds filtering to aws kb
* feat(milvus/): add milvus vector store unified search support
allows calling milvus vector store in through chat completions
* docs(milvus_vector_stores.md): document new milvus vector search integration
* feat(pass_through_endpoints.py): support passing form data through to a passthrough endpoint
Closes LIT-1147
* fix: fix linting errors
* fix(apscheduler): prevent memory leaks from jitter and frequent job intervals
Fixes critical memory leak in APScheduler that causes 35GB+ memory allocations
during proxy startup and operation. The leak was identified through Memray
analysis showing massive allocations in normalize() and _apply_jitter()
functions.
Key changes:
1. Remove jitter parameters from all scheduled jobs - jitter was causing
expensive normalize() calculations leading to memory explosion
2. Configure AsyncIOScheduler with optimized job_defaults:
- misfire_grace_time: 3600s (increased from 120s) to prevent backlog
calculations that trigger memory leaks
- coalesce: true to collapse missed runs
- max_instances: 1 to prevent concurrent job execution
- replace_existing: true to avoid duplicate jobs on restart
3. Increase minimum job intervals:
- PROXY_BATCH_WRITE_AT: 30s (was 10s)
- add_deployment/get_credentials jobs: 30s (was 10s)
4. Use fixed intervals with small random offsets instead of jitter for
job distribution across workers
5. Explicitly configure jobstores and executors to minimize overhead
6. Disable timezone awareness to reduce computation
Memory impact:
- Before: 35GB with 483M allocations during startup
- After: <1GB with normal allocation patterns
Performance notes:
- Minimum job intervals increased from 10s to 30s (configurable via env vars)
- Jobs can still be distributed across workers using random start offsets
- No functional changes to job behavior, only timing and memory optimization
Testing:
- Added comprehensive test suite for scheduler configuration
- Verified no job execution backlog on startup
- Tested duplicate job prevention with replace_existing
Related issue: Memory leak in production proxy servers with APScheduler
\ud83e\udd16 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* docs: update PROXY_BATCH_WRITE_AT default value from 10s to 30s
Update documentation to reflect the new default value for PROXY_BATCH_WRITE_AT
changed in PR #15846. The default was increased from 10 seconds to 30 seconds
to prevent memory leaks in APScheduler.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* refactor: Move APScheduler config to constants.py
Address code review feedback from ishaan-jaff:
- Move scheduler configuration variables (coalesce, misfire_grace_time,
max_instances, replace_existing) to litellm/constants.py
- Update all references in proxy_server.py to use the constants
- Improves maintainability and makes configuration values centralized
Requested-by: @ishaan-jaff
Related: #15846🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Added a new "Configuration with Environment Variables" section demonstrating:
- Using os.getenv() to dynamically retrieve OpenRouter configuration
- Explicitly passing base_url parameter with environment variables
- Benefits of this approach for managing configs across environments
This helps users implement production-ready configuration patterns.
* 1. add v3 classify
2. add new classifix for masking
3. support same id for the conversation for pre and post
working with duplicates
* clean code, remove some debug and run tests
* update liter errors
* improvment for Code Organization, httpx Error Handling Specificity, Logging Improvements and Type
* transfer test test_lasso_guard_config to the new location
* Fix type hints and linting errors in lasso.py
- Add type: ignore for httpx module when None
- Fix return type issues in _handle_classification and _handle_masking
- Ensure masked_messages is not None before passing to _apply_masking_to_model_response
- Convert LassoResponse to dict for _log_masking_applied call
* feat(lasso): Upgrade to Lasso API v3 and fix ULID generation
- Update Lasso API endpoints from v2 to v3 (/gateway/v3/classify)
- Update masking endpoints from v1 to v3 (/gateway/v3/classifix)
- Fix ULID generation: use ulid.new() instead of ULID() constructor
- Resolve MemoryView error that occurred with incorrect ULID usage
Tested with real proxy server and verified:
- Malicious content (jailbreak) properly blocked
- Safe content passes through guardrail
- PII detection and masking works correctly
- No ULID generation errors
* docs(lasso): Add ulid-py>=1.1.0 dependency prerequisite
Add Prerequisites section documenting the required ulid-py package
(version 1.1.0 or higher) for Lasso guardrail conversation tracking.
* update docs with the right api_key format
* fix(opentelemetry.py): fix issue where headers were not being split correctly
* feat(bedrock/image): Support bedrock titan image generation
Closes https://github.com/BerriAI/litellm/issues/361
* build(model_prices_and_context_window.json): track titan image gen pricing
enables cost tracking per request
* feat(amazon_titan_transformation.py): support titan image generation cost tracking
* docs: document new model
* docs: update docs to indicate cost tracking + refactor rerank into separate doc
* fix: fix mypy linting error
* fix: fix type ignore
* feat(responses_id_security.py): encrypt response.id - prevent user A from retrieving user B's response
additional security for retrievals on shared accounts
Closes LIT-1307
* feat(responses_id_security.py): allow admin to disable responses id security check
* test: add initial unit testing
* feat(responses_id_security.py): add streaming support
* docs: document new param
* docs: document new param
* feat(responses_id_security.py): add team id checks - ensure it works for service accounts
prevent service accounts keys from different teams from accessing each other's responses
more secure
* test: add unit testing
* fix: fix linting error
* fix(presidio.py): handle content as a list of texts
covers openai + anthropic messages api
* fix(presidio.py): safe get messages
* test: add unit testing for presidio guardrails
* fix(unified_guardrail.py): initial commit
* fix(enkryptai.py): implement apply_guardrail to enkrypt guardrail
* fix(unified_guardrail.py): support unified guardrail on input
* feat(unified_guardrail.py): add post call success hook implementation
allows us to just have 1 place to handle llm translation to guardrail api spec
* refactor: refactor initial unified guardrail component
* refactor: more refactoring
* feat(responses/): add guardrails to responses api
allows existing guardrails to work for new llm endpoints
* docs(adding_guardrail_support.md): document new guardrail endpoint support
* test: add unit tests
* feat(image_generation/): add guardrail support for image generation endpoint
* feat(openai/text_completion): support guardrails on `/v1/completions` API
* docs: document guardrails support on new endpoints
* docs: clarify when guardrails run
* feat(openai/speech): add guardrail support for input
* docs(rerank/): add guardrail support on input query
* fix: fix ruff check
* feat(vector_stores/): initial commit adding Vertex AI Search API support for litellm
new vector store provider
* feat(vector_store/): use vector store id for vertex ai search api
* fix: transformation.py
cleanup
* fix: implement abstract function
* fix: fix linting error
* fix: main.py
fix check
* feat: initial commit with working passthrough support for vertex ai search api through litellm
* feat(llm_passthrough_endpoints.py): fix passing correct project on datastore passthrough
* feat(vertex_ai/): support passthrough call for vertex ai search vector store
* docs(vertex_ai_search_datastore.md): document new vertex ai passthrough endpoint
* docs(sidebars.js): document new endpoint
* feat: initial commit adding logging for vertex ai passthrough api
allows vertex ai vector search api to work with cost calculation
* feat(vertex_ai/): search vector store cost tracking
* fix(vertex_passthrough_logging_handler.py): log the cost
* fix: improve logged response
* fix(vertex_passthrough_logging_handler.py): logging
* feat(litellm_logging): main.py
add cost tracking for vertex ai search api via unified api
* refactor: fix ruff checks
* fix(llm_passthrough_endpoints.py): fix linting
* fix(managed_files.py): don't raise error if managed object is not found
* feat(vector_stores): add azure ai search vector store support
Enables direct querying a vector store on azure
* fix(azure/vector_stores): working azure ai search api vector stores
allows azure direct querying on vector stores
* test: update env vars
* docs(docs/): document new azure ai vector store search
* docs(azure_ai_vector_stores.md): add table
* docs: clarify support for 'create' vector stores
* fix(vector_stores/endpoints.py): Fixes https://github.com/BerriAI/litellm/issues/14606
* fix: fix linting errors
* fix(oldteams.tsx): allow org admin to create team on ui
* fix(oldteams.tsx): show org admin a dropdown of allowed orgs for team creation
* docs(access_control.md): cleanup doc
* feat(ibm_guardrails/): initial commit adding support for ibm guardrails on litellm
allows user to use self-hosted ibm guardrails
* feat(ibm_detector.py): working detector
* docs(ibm_guardrails.md): document new ibm guardrails
* fix: fix linting errors
* Addd v2/chat support for cohere
* fix streaming
* Use v2_transformation for logging passthrough:
* Use v2_transformation for logging passthrough:
* Add test for checking if document and citation_options is getting passed
* Update the cohere model
* Add cost tracking for vertex ai passthrough batch jobs
* Add full passthrough support
* refactor code according to the comments
* Add passthrough handler
* remove invalid params
* Updated documentation
* Updated documentation
* Updated documentation
* Correct the import
* Add openai videos generation and retrieval support
* add retrieval endpoint
* Add docs
* Add imports
* remove orjson
* remove double import
* fix openai videos format
* remove mock code
* remove not required comments
* Add tests
* Add tests
* Add other video endpoints
* Fix cost calculation and transformation
* Fixed mypy tests
* remove not used imports
* fix documentation for get batch req (#15742)
* Add grounding info to responses API (#15737)
* Add grounding info to responses API
* fix lint errors
* Use typed objects for annotations
* Use typed objects for annotations
* fix mypy error
* Litellm fix json serialize alreting 2 (#15741)
* fix json serializable error for alerts
* Add test
* fix mypt errors
* fix mypt errors
* Add Qwen3 imported model support for AWS Bedrock (#15783)
* Add qwen imported model support
* fix mypy errors
* fix empty user message error (#15784)
* fix typed dict for list
* Add azure supported videos endpoint
* fix mapped tests
* add azure sora models to model map
* Add OpenAI video generation and content retrieval support (#15745)
* Add openai videos generation and retrieval support
* add retrieval endpoint
* Add docs
* Add imports
* remove orjson
* remove double import
* fix openai videos format
* remove mock code
* remove not required comments
* Add tests
* Add tests
* Add other video endpoints
* Fix cost calculation and transformation
* Fixed mypy tests
* remove not used imports
* fix typed dict for list
* fix mypy errors
* move directory
* make v2 chat default
* Fix mypy tests
* Fix mypy tests
* Fix mypy tests
* Fix mypy tests
* Revert "Add Azure Video Generation Support with Sora Integration"
* refactor videos repo
* add test
* Add azure openai videos support
* Add azure openai videos support
* Add router endpoint support for videos
* fix mypy error
* add azure models
* fix mapped test
* fix mypy error
* Add proxy router test
* Add proxy router test
* remove deprecated model name from tests
* fix import error
* fix import error
* Add gaurdrail integration in videos endpoint
* Add logging support for videos endpoint
* Add final documentation supporting videos integration
* fix model name and document input
* Update literals to avoid mypy errors
* Remove unused imports and print statements
* revert guardrail support for video generation and video remix
* revert guardrail support for video generation and video remix
* Fix failing mapped and llm translation tests