Commit Graph

8 Commits

Author SHA1 Message Date
Alexsander Hamir 415a8ab9a6 Fix: remove merge markdown (#17586) 2025-12-06 05:38:16 -08:00
Ishaan Jaffer 8b499adba6 Revert "Add license metadata to health/readiness endpoint. (#15997)"
This reverts commit d89990e0c5.
2025-12-05 19:31:30 -08:00
Ishaan Jaffer 27a98de600 test_health_and_chat_completion 2025-10-31 19:28:59 -07:00
Ishaan Jaffer cd9cf2e6bd test fix 2025-10-31 19:23:08 -07:00
Ishaan Jaffer 2cd57540a4 fix test fixes 2025-10-31 18:31:00 -07:00
Javier de la Torre e6a7cae7e1 fix(apscheduler): prevent memory leaks from jitter and frequent job intervals (#15846)
* fix(apscheduler): prevent memory leaks from jitter and frequent job intervals

Fixes critical memory leak in APScheduler that causes 35GB+ memory allocations
during proxy startup and operation. The leak was identified through Memray
analysis showing massive allocations in normalize() and _apply_jitter()
functions.

Key changes:
1. Remove jitter parameters from all scheduled jobs - jitter was causing
   expensive normalize() calculations leading to memory explosion
2. Configure AsyncIOScheduler with optimized job_defaults:
   - misfire_grace_time: 3600s (increased from 120s) to prevent backlog
     calculations that trigger memory leaks
   - coalesce: true to collapse missed runs
   - max_instances: 1 to prevent concurrent job execution
   - replace_existing: true to avoid duplicate jobs on restart
3. Increase minimum job intervals:
   - PROXY_BATCH_WRITE_AT: 30s (was 10s)
   - add_deployment/get_credentials jobs: 30s (was 10s)
4. Use fixed intervals with small random offsets instead of jitter for
   job distribution across workers
5. Explicitly configure jobstores and executors to minimize overhead
6. Disable timezone awareness to reduce computation
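
For illustration only, the settings listed above could be wired up roughly as below. This is a minimal sketch and not the code from this commit: the constant names, the job id, and the 5-second offset window are hypothetical, and only the numeric values come from the message. It assumes the third-party APScheduler 3.x package, where to my knowledge replace_existing is an argument of add_job rather than a job default, so the sketch passes it there.

```python
import random
from datetime import datetime, timedelta

# Values come from the commit message; the constant names are hypothetical.
SCHEDULER_JOB_DEFAULTS = {
    "coalesce": True,            # collapse missed runs into a single run
    "misfire_grace_time": 3600,  # seconds (was 120)
    "max_instances": 1,          # never run the same job concurrently
}
BATCH_WRITE_INTERVAL_SECONDS = 30  # was 10


def first_run_time(now, max_offset_seconds, rng=random):
    """Pick a fixed start time with a small random offset instead of jitter."""
    return now + timedelta(seconds=rng.uniform(0, max_offset_seconds))


def build_scheduler():
    """Create the scheduler with explicit job stores and executors.

    Imports are local so the helpers above work without APScheduler installed.
    """
    from apscheduler.executors.asyncio import AsyncIOExecutor
    from apscheduler.jobstores.memory import MemoryJobStore
    from apscheduler.schedulers.asyncio import AsyncIOScheduler

    return AsyncIOScheduler(
        jobstores={"default": MemoryJobStore()},
        executors={"default": AsyncIOExecutor()},
        job_defaults=SCHEDULER_JOB_DEFAULTS,
    )


def add_batch_write_job(scheduler, func):
    """Register an interval job without any jitter argument."""
    return scheduler.add_job(
        func,
        "interval",
        seconds=BATCH_WRITE_INTERVAL_SECONDS,
        # Offset window of 5 seconds is an assumption for this sketch.
        next_run_time=first_run_time(datetime.now(), 5),
        id="batch_write",       # hypothetical job id
        replace_existing=True,  # avoids duplicate jobs on restart
    )
```

Each worker would call first_run_time once at startup, so jobs spread out across workers while every later run stays on a fixed 30-second interval.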

Memory impact:
- Before: 35GB with 483M allocations during startup
- After: <1GB with normal allocation patterns

Performance notes:
- Minimum job intervals increased from 10s to 30s (configurable via env vars)
- Jobs can still be distributed across workers using random start offsets
- No functional changes to job behavior, only timing and memory optimization

Testing:
- Added comprehensive test suite for scheduler configuration
- Verified no job execution backlog on startup
- Tested duplicate job prevention with replace_existing

Related issue: Memory leak in production proxy servers with APScheduler

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: update PROXY_BATCH_WRITE_AT default value from 10s to 30s

Update documentation to reflect the new default value for PROXY_BATCH_WRITE_AT
changed in PR #15846. The default was increased from 10 seconds to 30 seconds
to prevent memory leaks in APScheduler.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: Move APScheduler config to constants.py

Address code review feedback from ishaan-jaff:
- Move scheduler configuration variables (coalesce, misfire_grace_time,
  max_instances, replace_existing) to litellm/constants.py
- Update all references in proxy_server.py to use the constants
- Improves maintainability and makes configuration values centralized

Requested-by: @ishaan-jaff
Related: #15846

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 19:30:17 -07:00
Andrew Bernat d89990e0c5 Add license metadata to health/readiness endpoint. (#15997)
* health: expose license metadata (available & expiration) in /health/readiness endpoint

* test: add health readiness license metadata coverage

* test: ensure /health/readiness response includes license metadata

* chore: remove standalone license metadata test as requested; existing test covers codepath

---------

Co-authored-by: Plan42.ai <robot@plan42.ai>
2025-10-28 19:21:54 -07:00
Ishaan Jaff ddfe687b13 (fix) don't block proxy startup if license check fails & using prometheus (#6839)
* fix - don't block proxy startup if not a premium user

* test_litellm_proxy_server_config_with_prometheus

* add test for proxy startup

* fix remove unused test

* fix startup test

* add comment on bad-license
2024-11-20 17:55:39 -08:00