mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 22:48:35 +00:00
8a9faa81b2
* feat(guardrails): add LLM_AS_A_JUDGE to SupportedGuardrailIntegrations
* feat(types): add EvalVerdict, StandardLoggingEvalInformation; wire eval_information into SpendLogsMetadata
* feat(guardrails): add self-contained llm_as_a_judge guardrail hook
* fix(a2a): filter agent-only litellm_params from acompletion kwargs; pass agent_id into body
* feat(ui): add LLMJudgeFields criteria builder component
* feat(ui): wire LLM-as-a-Judge into add guardrail form
* feat(ui): update EvalViewer: title 'LLM Judge Results', weighted score column, summary row
* fix(ui): wire EvalViewer into LogDetailContent to show LLM judge results on logs page
* fix(guardrails-ui): route llm_as_a_judge to criteria builder step; rename to LiteLLM LLM as a Judge; add litellm logo
* fix(guardrail-viewer): stack lifecycle + eval details vertically to avoid badge overflow in narrow drawer
* fix(guardrail-create): surface config validation errors on create instead of silently orphaning guardrail in DB
* fix(guardrail-registry): hardcode llm_as_a_judge in initializer registry so it loads regardless of package install path
* fix(llm-as-a-judge): fix P1 code quality issues - validate weights/on_failure, guard pre_call, handle multimodal, move imports to module level, fix spurious finally logging
* fix(guardrail_endpoints): use correct PK field in rollback delete and log rollback failure
* fix(llm_as_a_judge): support Pydantic object in _get_litellm_param fallback chain
* fix(LLMJudgeFields): replace @tremor/react Button with antd Button
* fix(llm_as_a_judge): remove dead registry dicts, fix KeyError in prompt builder, set correct status on judge failure
* test(llm_as_a_judge): add unit tests for guardrail hook
* fix(llm_as_a_judge): remove @log_guardrail_information decorator to fix duplicate guardrail_information entries

  The decorator and the manual finally block both called add_standard_logging_guardrail_information_to_request_data, producing two entries per request.
  The decorator also misclassified HTTPException(422) blocks as guardrail_failed_to_respond (it checks for 400). The finally block correctly tracks status throughout, so removing the decorator is sufficient.

* fix(test_gcs_pub_sub): ignore metadata.eval_information in comparison
* fix(test_spend_management): ignore metadata.eval_information in payload comparison
* fix(types/guardrails): add input_type and messages to ApplyGuardrailRequest
* fix(guardrail_endpoints): pass input_type and messages through apply_guardrail endpoint
* fix(guardrail_endpoints): auto-detect post_call guardrails and use input_type=response
* fix(a2a_endpoints): merge agent litellm_params guardrails into data before post_call hooks
* fix(llm_as_a_judge): use float sum with tolerance for weight validation
* fix(guardrail_registry): split long import line for black formatting
* fix(llm_as_a_judge): guard guardrail_name Optional for mypy
* fix(llm_as_a_judge): set guardrail_status=guardrail_intervened when score fails, regardless of on_failure mode
* fix(a2a_endpoints): use try/finally so deferred spend log fires even when guardrail blocks with 422
* fix(litellm_logging): declare _defer_async_logging and _enqueue_deferred_logging on Logging class for mypy
* fix(logging_worker): restore queue.join() in flush() to wait for in-flight callbacks
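
Illustration for the note on removing @log_guardrail_information: a minimal sketch of how a logging decorator stacked on a hook that already logs in its own finally block yields two entries per request. This is not the litellm implementation. `record_entry`, `logging_decorator_sketch`, `judge_hook` and the "success" status string are hypothetical stand-ins, and the sketch is synchronous for brevity, whereas the real hooks are presumably async.

```python
import functools


def record_entry(request_data, status):
    # Hypothetical stand-in for the helper that appends one
    # guardrail_information entry to the request data.
    request_data.setdefault("guardrail_information", []).append(
        {"guardrail_status": status}
    )


def logging_decorator_sketch(func):
    # Simplified stand-in for a logging decorator: it records one
    # entry after the wrapped hook finishes, whether or not it raised.
    @functools.wraps(func)
    def wrapper(request_data):
        status = "success"  # assumed status name, for illustration only
        try:
            return func(request_data)
        except Exception:
            status = "guardrail_failed_to_respond"
            raise
        finally:
            record_entry(request_data, status)

    return wrapper


def judge_hook(request_data):
    # The hook tracks its own status and records it in a finally block.
    status = "success"
    try:
        return "ok"
    finally:
        record_entry(request_data, status)


# Decorated hook: the decorator and the finally block both record an entry.
with_decorator = {}
logging_decorator_sketch(judge_hook)(with_decorator)

# Undecorated hook: only the finally block records an entry.
without_decorator = {}
judge_hook(without_decorator)

print(len(with_decorator["guardrail_information"]))
print(len(without_decorator["guardrail_information"]))
```

Under these assumptions, dropping the decorator leaves the finally block as the only writer, which matches the reasoning given in the commit note.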