litellm/tests/logging_callback_tests
ishaan-berri 8a9faa81b2 feat(guardrails): LLM-as-a-Judge guardrail (#26360)
* feat(guardrails): add LLM_AS_A_JUDGE to SupportedGuardrailIntegrations

* feat(types): add EvalVerdict, StandardLoggingEvalInformation; wire eval_information into SpendLogsMetadata

* feat(guardrails): add self-contained llm_as_a_judge guardrail hook

* fix(a2a): filter agent-only litellm_params from acompletion kwargs; pass agent_id into body

* feat(ui): add LLMJudgeFields criteria builder component

* feat(ui): wire LLM-as-a-Judge into add guardrail form

* feat(ui): update EvalViewer (title 'LLM Judge Results', weighted score column, summary row)

* fix(ui): wire EvalViewer into LogDetailContent to show LLM judge results on logs page

* fix(guardrails-ui): route llm_as_a_judge to criteria builder step; rename to LiteLLM LLM as a Judge; add litellm logo

* fix(guardrail-viewer): stack lifecycle + eval details vertically to avoid badge overflow in narrow drawer

* fix(guardrail-create): surface config validation errors on create instead of silently orphaning guardrail in DB

* fix(guardrail-registry): hardcode llm_as_a_judge in initializer registry so it loads regardless of package install path

* fix(llm-as-a-judge): fix P1 code quality issues: validate weights/on_failure, guard pre_call, handle multimodal, move imports to module level, fix spurious finally logging

* fix(guardrail_endpoints): use correct PK field in rollback delete and log rollback failure

* fix(llm_as_a_judge): support Pydantic object in _get_litellm_param fallback chain

* fix(LLMJudgeFields): replace @tremor/react Button with antd Button

* fix(llm_as_a_judge): remove dead registry dicts, fix KeyError in prompt builder, set correct status on judge failure

* test(llm_as_a_judge): add unit tests for guardrail hook

* fix(llm_as_a_judge): remove @log_guardrail_information decorator to fix duplicate guardrail_information entries

The decorator and the manual finally block both called add_standard_logging_guardrail_information_to_request_data, producing two entries per request. The decorator also misclassified HTTPException(422) blocks as guardrail_failed_to_respond, because it only checks for status code 400. The finally block already tracks status correctly on every path, so removing the decorator is sufficient.

* fix(test_gcs_pub_sub): ignore metadata.eval_information in comparison

* fix(test_spend_management): ignore metadata.eval_information in payload comparison

* fix(types/guardrails): add input_type and messages to ApplyGuardrailRequest

* fix(guardrail_endpoints): pass input_type and messages through apply_guardrail endpoint

* fix(guardrail_endpoints): auto-detect post_call guardrails and use input_type=response

* fix(a2a_endpoints): merge agent litellm_params guardrails into data before post_call hooks

* fix(llm_as_a_judge): use float sum with tolerance for weight validation

* fix(guardrail_registry): split long import line for black formatting

* fix(llm_as_a_judge): guard guardrail_name Optional for mypy

* fix(llm_as_a_judge): set guardrail_status=guardrail_intervened when score fails, regardless of on_failure mode

* fix(a2a_endpoints): use try/finally so deferred spend log fires even when guardrail blocks with 422

* fix(litellm_logging): declare _defer_async_logging and _enqueue_deferred_logging on Logging class for mypy

* fix(logging_worker): restore queue.join() in flush() to wait for in-flight callbacks
2026-04-24 17:15:32 -07:00
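The weight-validation fix ("use float sum with tolerance") can be illustrated with a minimal sketch. The helper name `validate_criteria_weights`, the 1e-6 tolerance, and the non-negativity and non-empty checks are assumptions made for illustration, not the guardrail's actual code. The point is that summing floats and comparing the total with `==` rejects weights that a user would reasonably consider to add up to 1.

```python
from typing import Sequence


def validate_criteria_weights(weights: Sequence[float], tol: float = 1e-6) -> None:
    """Raise ValueError unless weights are non-negative and sum to 1.0 within tol.

    Hypothetical helper for illustration; not the guardrail's actual function.
    """
    if not weights:
        raise ValueError("at least one weighted criterion is required")
    if any(w < 0 for w in weights):
        raise ValueError("criterion weights must be non-negative")
    total = sum(float(w) for w in weights)
    # Exact equality is fragile with floats:
    # sum([0.1] * 10) evaluates to 0.9999999999999999, not 1.0.
    if abs(total - 1.0) > tol:
        raise ValueError(f"criterion weights must sum to 1.0, got {total}")
```

With this check, ten criteria weighted 0.1 each are accepted, while weights of 0.5 and 0.4 are still rejected because they miss 1.0 by far more than the tolerance.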
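Several items above (removing the decorator, setting guardrail_intervened regardless of on_failure, and logging from a finally block) describe one pattern: track the status in a local variable and record it exactly once on every exit path. The sketch below shows that pattern under stated assumptions. `run_llm_judge`, `GuardrailBlocked` (a stand-in for the HTTPException(422)), the on_failure values "block" and "log", the "success" status string, and re-raising when the judge call itself fails are all hypothetical, not taken from the guardrail's source.

```python
from typing import Callable, Dict, List


class GuardrailBlocked(Exception):
    """Hypothetical stand-in for the HTTPException(422) the guardrail raises."""


def run_llm_judge(
    call_judge: Callable[[], float],
    threshold: float,
    on_failure: str,
    log: List[Dict[str, str]],
) -> bool:
    # Pessimistic default: if call_judge() raises, this is the status logged.
    status = "guardrail_failed_to_respond"
    try:
        score = call_judge()
        passed = score >= threshold
        # A failing score counts as an intervention whatever on_failure says.
        status = "success" if passed else "guardrail_intervened"
        if not passed and on_failure == "block":
            raise GuardrailBlocked(f"judge score {score} < threshold {threshold}")
        return passed
    finally:
        # Exactly one log entry per call, on every exit path:
        # normal return, block, or judge error.
        log.append({"guardrail_status": status})
```

Because the only logging call lives in `finally`, adding a second logging layer (such as a decorator) around this function would duplicate entries, which is the failure mode the decorator removal addresses.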
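The last item restores `queue.join()` in `flush()` so that flushing waits for in-flight callbacks. A minimal asyncio sketch shows why this matters. The `LoggingWorker` class, its method names, and the `Job` type are hypothetical, and LiteLLM's actual logging worker may be structured differently. A check such as `queue.empty()` returns True as soon as the worker has taken the last item, even though that callback may still be running, whereas `asyncio.Queue.join()` waits until `task_done()` has been called for every item put on the queue.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Job = Callable[[], Awaitable[None]]


class LoggingWorker:
    """Hypothetical minimal background logger; not LiteLLM's worker class."""

    def __init__(self) -> None:
        # Construct inside a running event loop (e.g. within asyncio.run).
        self._queue: "asyncio.Queue[Job]" = asyncio.Queue()
        self._task: Optional["asyncio.Task[None]"] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    def enqueue(self, job: Job) -> None:
        self._queue.put_nowait(job)

    async def _run(self) -> None:
        while True:
            job = await self._queue.get()
            try:
                await job()
            except Exception:
                pass  # a failing callback must not kill the worker loop
            finally:
                # Mark the item finished only after its callback completed.
                self._queue.task_done()

    async def flush(self) -> None:
        # Returns only once every enqueued job, including the one currently
        # being awaited in _run(), has called task_done().
        await self._queue.join()
```

In use, a caller enqueues callbacks, then awaits `flush()` before shutdown; every callback has finished by the time `flush()` returns.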