hosted_vllm no longer uses the OpenAI client, so the tests that mock
the OpenAI client no longer apply to it.
Removes hosted_vllm from:
- test_openai_compatible_custom_api_base
- test_openai_compatible_custom_api_video
- Filter skip_mcp_handler and other internal params in fallback_utils.py before calling acompletion
Fixes issue where internal parameters were being passed to provider APIs causing errors
- Remove deployment field from GCS bucket logger test metadata
Fixes model name mismatch where deployment field was overriding the model in logging
- Update Bedrock Titan test to use non-deprecated model (titan-text-express-v1)
Fixes test failure due to deprecated amazon.titan-text-lite-v1 model
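
The internal-parameter filtering described above could look roughly like the sketch below. It is illustrative only: `INTERNAL_PARAMS` and `strip_internal_params` are hypothetical names, and `skip_mcp_handler` is the only key taken from this change; the actual code in fallback_utils.py may differ.

```python
from typing import Any, Dict

# Hypothetical set of internal-only keys. Only "skip_mcp_handler" is named
# in the commit message; any other entries would be assumptions.
INTERNAL_PARAMS = frozenset({"skip_mcp_handler"})


def strip_internal_params(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without keys meant only for internal use."""
    return {k: v for k, v in kwargs.items() if k not in INTERNAL_PARAMS}


# The cleaned kwargs are what would be forwarded to the provider call.
cleaned = strip_internal_params(
    {"model": "my-model", "messages": [], "skip_mcp_handler": True}
)
```

Returning a new dict rather than mutating the input keeps the caller's kwargs intact for any later fallback attempt.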
* fix(main.py): fix async retryer
Fixes https://github.com/BerriAI/litellm/issues/12830
* fix(forward_clientside_headers_by_model_group.py): filter out 'content-type' from forwardable headers
The client-side content-type can differ from the proxy's content-type, which can cause requests to hang
* test(tests/): update tests
* Add new model provider Novita AI (#7582)
* feat: add new model provider Novita AI
* feat: use deepseek r1 model for examples in Novita AI docs
* fix: fix tests
* fix: fix tests for novita
* fix: fix novita transformation
* ci: fix ci yaml
* fix: fix novita transformation and test (#10056)
---------
Co-authored-by: Jason <ggbbddjm@gmail.com>
* Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar
* Add Llamafile as an LlmProviders enum
* Add Llamafile as an OpenAI-compatible provider (in the list of compatible providers)
* Add Llamafile chat config and tests
* Wire up Llamafile
Co-authored-by: Peter Wilson <peter@mozilla.ai>
* build(model_prices_and_context_window.json): add fireworks ai new 0-4b pricing tier
* build(model_prices_and_context_window.json): add more fireworks ai models
* test: update testing
* fix(caching_handler.py): handle str + list cache
Fixes issue on cache hits for embedding when initial cached input was str
* test(test_caching.py): add e2e test on caching with individual item and then list
* fix(caching_handler.py): set usage tokens for cache hits
Enables token counting to work on cache hits
* fix(caching_handler.py): combine usage between cached result and embedding response
Handles the case where new (uncached) inputs are sent to the embedding API alongside cached ones
* fix: cleanup
* test: move to gpt-4o-new-test
* test: update test
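
For the caching_handler.py fixes above, a minimal sketch of the two ideas (treating a str input as a one-item list, and summing usage across cached and freshly computed embeddings) might look like this. The `Usage` dataclass and both helper names are hypothetical and are not the handler's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Usage:
    """Hypothetical stand-in for an embedding usage record."""
    prompt_tokens: int = 0
    total_tokens: int = 0


def normalize_input(value: Union[str, List[str]]) -> List[str]:
    """Treat a single string as a one-item list so both shapes share cache keys."""
    return [value] if isinstance(value, str) else list(value)


def combine_usage(cached: Usage, fresh: Optional[Usage]) -> Usage:
    """Sum usage from cache hits with usage from the new embedding call, if any."""
    if fresh is None:
        return cached  # full cache hit: only cached usage applies
    return Usage(
        prompt_tokens=cached.prompt_tokens + fresh.prompt_tokens,
        total_tokens=cached.total_tokens + fresh.total_tokens,
    )
```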