feat(audio_transcription): add NVIDIA Riva STT provider (#27185)


* feat(audio_transcription): add NVIDIA Riva STT provider

Adds nvidia_riva as a new audio transcription provider, supporting both
NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming.

- Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy,
  audioread fallback) so callers can send any common format.
- Maps OpenAI params: language (en -> en-US), response_format (text/json/
  verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets,
  word offsets converted ms -> s for verbose_json.
- Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted
  otherwise (SSL off by default), with explicit use_ssl override.
- gRPC errors wrapped via NvidiaRivaException -> litellm exception classes.
- Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client,
  soundfile, audioread, numpy).
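
The parameter mapping and SSL defaults listed above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: the function names and the lookup table are hypothetical, and only the en -> en-US mapping, the ms -> s conversion, and the NVCF/self-hosted SSL defaults come from the description above.

```python
from typing import Optional

# Hypothetical table; the description above only confirms en -> en-US.
_BARE_LANGUAGE_DEFAULTS = {"en": "en-US"}


def map_language(language: str) -> str:
    """Expand a bare language code when a default is known, else pass it through."""
    return _BARE_LANGUAGE_DEFAULTS.get(language, language)


def resolve_use_ssl(
    nvcf_function_id: Optional[str], use_ssl: Optional[bool] = None
) -> bool:
    """An explicit use_ssl wins; otherwise SSL is on for NVCF, off for self-hosted."""
    if use_ssl is not None:
        return use_ssl
    return bool(nvcf_function_id)


def word_offsets_to_seconds(word: str, start_ms: int, end_ms: int) -> dict:
    """Convert millisecond word offsets to the seconds used in verbose_json."""
    return {"word": word, "start": start_ms / 1000.0, "end": end_ms / 1000.0}
```

Keeping the SSL decision in one small function makes the override order explicit: the caller's use_ssl first, then the deployment type.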

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(nvidia_riva): address PR review feedback

- handler: forward call-level `timeout` to streaming_response_generator
  (kwarg-detected via inspect for older riva-client compat) so a stalled
  Riva server cannot block the caller indefinitely.
- audio_utils: spill bytes to a tempfile before audioread.audio_open;
  most audioread backends (FFmpeg, GStreamer) require a real filesystem
  path and previously raised TypeError on BytesIO, breaking the mp3/m4a
  fallback path.
- audio_utils: prefer soxr / scipy.signal.resample_poly for resampling
  (anti-aliased polyphase) when installed, falling back to linear only
  as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples.
- transformation: bare `es` now maps to es-ES (Castilian) instead of
  es-US, matching BCP-47 conventions.
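
The kwarg detection mentioned in the handler item can be sketched as below. This is an assumption-laden illustration: call_with_optional_timeout, new_style_generator, and old_style_generator are hypothetical stand-ins, not names from riva-client or litellm. The idea is only to forward `timeout` when the target callable can accept it.

```python
import inspect
from typing import Any, Callable, Optional


def call_with_optional_timeout(
    generator_fn: Callable[..., Any],
    *args: Any,
    timeout: Optional[float] = None,
    **kwargs: Any,
) -> Any:
    """Forward `timeout` only if generator_fn's signature can accept it."""
    if timeout is not None:
        try:
            params = dict(inspect.signature(generator_fn).parameters)
        except (TypeError, ValueError):
            params = {}  # signature not introspectable, so skip the kwarg
        accepts_timeout = "timeout" in params or any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )
        if accepts_timeout:
            kwargs["timeout"] = timeout
    return generator_fn(*args, **kwargs)


# Hypothetical stand-ins for a newer and an older client API.
def new_style_generator(chunks, timeout=None):
    return ("new", timeout)


def old_style_generator(chunks):
    return ("old", None)
```

With an older signature the timeout is silently dropped rather than raising TypeError, which is the compatibility trade-off the item describes.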

Co-authored-by: Cursor <cursoragent@cursor.com>
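
The tempfile spill described in the first audio_utils item can be sketched with the standard library alone. The helper name with_temp_audio_file and its `decode` callback are hypothetical; in the real fallback path the callback would presumably open the path with audioread.audio_open and read all frames before returning.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_audio_file(
    data: bytes, decode: Callable[[str], T], suffix: str = ".audio"
) -> T:
    """Write `data` to a temporary file, run decode(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the handle before decoding so backends can reopen the path.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return decode(path)
    finally:
        os.remove(path)
```

The callback must finish reading before it returns, because the file is removed as soon as it does.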

* chore: trigger CI re-run [stabilize loop 1/3]

* Update litellm/llms/nvidia_riva/audio_transcription/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* fix code qa

* fix lint

* fix mypy

* fix mypy

* Fix NVIDIA Riva ASR service lookup

* Fix NVIDIA Riva transcription payload logging

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>

Author: Sameer Kankute
Date: 2026-05-05 17:17:51 -07:00
Committed by: GitHub
Co-authored by: Cursor, oss-pr-review-agent-shin[bot], greptile-apps[bot], mateo-berri

parent 454ce5073f

commit e912e6d4ff

20 changed files with 2120 additions and 2 deletions

									
tests/code_coverage_tests/liccheck.ini (+1)
@@ -126,6 +126,7 @@ sentry_sdk: >=2.21.0 # Unknown license
 cryptography: >=43.0.1 # Unknown license
 tzdata: >=2025.1 # Unknown license
 urllib3: >=2.0.0 # MIT license - https://github.com/urllib3/urllib3
+audioread: >=3.0.1 # MIT license manually verified - https://github.com/beetbox/audioread
 python-dotenv: >=1.0.0 # Unknown license
 tiktoken: >=0.8.0 # Unknown license
 click: >=8.1.7 # Unknown license