mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 22:48:35 +00:00
3390fc1972
The pass_through prompt-caching tests
(test_prompt_caching_returns_cache_read_tokens_on_second_call,
test_prompt_caching_streaming_second_call_returns_cache_read) make a
warm-up call and then assert the *second* call sees a non-zero
cache_read_input_tokens count from the upstream's prompt-cache. VCR
replay can't model cross-call provider state — both calls match the
same cassette episode, so the second call returns the first call's
pre-warmup response and the assertion fails:
AssertionError: Expected cache_read_input_tokens > 0 on second call,
but got 0. Full usage: {'input_tokens': 4986,
'cache_creation_input_tokens': 4974, 'cache_read_input_tokens': 0}
This started biting after the AWS SigV4 fingerprint stabilization
(b637d9f64a): Bedrock requests now produce a stable per-access-key
fingerprint instead of a per-request signature, so cassettes
successfully replay where they previously always missed and re-recorded
live. Opt these tests out via skip_nodeid_suffixes so they run live and
match the existing pattern in tests/llm_translation/conftest.py
(::test_prompt_caching).
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>