* fix(router.py): add acompletion_streaming_iterator inside router
allows router to catch errors mid-stream for fallbacks
Work for https://github.com/BerriAI/litellm/issues/6532
* fix(router.py): working mid-stream fallbacks
* fix(router.py): more iterations
* fix(router.py): working mid-stream fallbacks with fallbacks set on router
* fix(router.py): pass prior content back in new request as assistant prefix message
* fix(router.py): add a system prompt to help guide non-prefix supporting models to use the continued text correctly
* fix(common_utils.py): support converting `prefix: true` for non-prefix supporting models
* fix: reduce LOC in function
* test(test_router.py): add unit tests for new function
* test: add basic unit test
* fix(router.py): ensure return type of fallback stream is compatible with CustomStreamWrapper
prevent client code from breaking
* fix: cleanup
* test: update test
* fix: fix linting error
* Fix Vertex AI function calling invoke: use JSON format instead of protobuf text format. (#6702)
* test: test tool_call conversion when arguments is empty dict
Fixes https://github.com/BerriAI/litellm/issues/6833
* fix(openai_like/handler.py): return more descriptive error message
Fixes https://github.com/BerriAI/litellm/issues/6812
* test: skip overloaded model
* docs(anthropic.md): update anthropic docs to show how to route to any new model
* feat(groq/): fake stream when 'response_format' param is passed
Groq doesn't support streaming when response_format is set
* feat(groq/): add response_format support for groq
Closes https://github.com/BerriAI/litellm/issues/6845
* fix(o1_handler.py): remove fake streaming for o1
Closes https://github.com/BerriAI/litellm/issues/6801
* build(model_prices_and_context_window.json): add groq llama3.2b model pricing
Closes https://github.com/BerriAI/litellm/issues/6807
* fix(utils.py): fix handling ollama response format param
Fixes https://github.com/BerriAI/litellm/issues/6848#issuecomment-2491215485
* docs(sidebars.js): refactor chat endpoint placement
* fix: fix linting errors
* test: fix test
* test: fix test
* fix(openai_like/handler): handle max retries
* fix(streaming_handler.py): fix streaming check for openai-compatible providers
* test: update test
* test: correctly handle model is overloaded error
* test: update test
* test: fix test
* test: mark flaky test
---------
Co-authored-by: Guowang Li <Guowang@users.noreply.github.com>