mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 14:48:44 +00:00
70492cee42
* feat(proxy): add /v1/memory CRUD endpoints with user/team scoping
New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.
Endpoints:
POST /v1/memory - create
GET /v1/memory - list (caller-scoped; admins see all)
GET /v1/memory/{key} - fetch one
PUT /v1/memory/{key} - upsert
DELETE /v1/memory/{key} - delete
Non-admin callers cannot set a user_id/team_id other than their own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy/memory): omit metadata field when None on create
Prisma's Python client rejects `metadata=None` on a `Json?` field with
"A value is required but not set" — the field must be omitted from the
`data` dict entirely to store SQL NULL. Build the create payload
conditionally in both `create_memory` and the PUT-create branch of
`upsert_memory`.
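
A minimal sketch of the conditional payload described above, assuming the `data` argument is a plain dict. The helper name `build_create_data` is hypothetical and not taken from the commit; the real code may wrap the metadata value differently.

```python
from typing import Any, Optional


def build_create_data(
    key: str,
    value: str,
    metadata: Optional[dict] = None,
    user_id: Optional[str] = None,
    team_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build the `data` dict for a create call, omitting `metadata` when it is None."""
    data: dict[str, Any] = {
        "key": key,
        "value": value,
        "user_id": user_id,
        "team_id": team_id,
    }
    if metadata is not None:
        # Only add the key when there is something to store; leaving it out
        # is what lets the optional Json column end up as SQL NULL.
        data["metadata"] = metadata
    return data
```

The same helper could serve both `create_memory` and the PUT-create branch so the two paths cannot drift.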
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ui): add Memory page to view/manage /v1/memory entries
Adds a new "Memory" sidebar item under Tools so users can see what their
agents have stored. Lists all memories visible to the caller (scoped by
the backend), with a key-search filter, preview column, scope tags, and
view/edit/delete actions. Create modal accepts optional JSON metadata.
- networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory
wired to the /v1/memory CRUD endpoints.
- MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md:
use antd for new UI, not tremor).
- page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(memory): add key_prefix filter + promote Memory to AI GATEWAY nav
Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
clause, so it can never leak rows across user/team scopes.
- New tests: prefix match, and cross-scope isolation (another user's
`user:*` rows must not appear in the caller's results).
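
The precedence rule above can be sketched as follows. This is an illustration only: `build_key_filter` is a hypothetical name, and the `startswith` filter key is assumed to be the string-prefix operator of the Prisma Python client.

```python
from typing import Optional


def build_key_filter(key: Optional[str] = None, key_prefix: Optional[str] = None) -> dict:
    """Return the key part of a where clause; `key_prefix` wins over `key`."""
    if key_prefix:
        # Namespace scan, e.g. key_prefix="user:" matches "user:1", "user:2", ...
        return {"key": {"startswith": key_prefix}}
    if key:
        return {"key": key}
    return {}
```

The result is only the key side of the query; the visibility filter is still applied on top of it, so a prefix scan cannot widen what the caller is allowed to see.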
UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
(alongside Agents, MCP Servers, Skills) — it's an API primitive,
not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
model ("type the namespace, see everything under it").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): enforce unique key per scope by using NULLS NOT DISTINCT
The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable
silently allowed duplicates when user_id or team_id was NULL, because
Postgres treats every NULL as distinct by default (ANSI semantics). A
caller with no team_id could POST the same key three times and get
three rows.
Migration:
1. Dedupe existing rows, keeping the most recent per (key, user_id,
team_id), using `IS NOT DISTINCT FROM` so NULL == NULL.
2. Drop the old unique index.
3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+).
No code change: POST already returns 409 on unique-violation error
messages — it just wasn't firing before because the constraint didn't
catch the NULL-team case.
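
The NULL behaviour is easy to reproduce. The sketch below uses SQLite as a stand-in (it also treats NULLs as distinct inside a unique index); it is not the Postgres migration itself, and the table and index names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (key TEXT, user_id TEXT, team_id TEXT)")
conn.execute("CREATE UNIQUE INDEX memory_scope ON memory (key, user_id, team_id)")

# Same key and user, team_id NULL: every insert is accepted because each
# NULL counts as a different value for the unique index.
for _ in range(3):
    conn.execute("INSERT INTO memory VALUES (?, ?, ?)", ("notes", "user-1", None))
null_team_rows = conn.execute("SELECT COUNT(*) FROM memory").fetchone()[0]

# With a non-NULL team_id the same index does reject the duplicate.
try:
    conn.execute("INSERT INTO memory VALUES (?, ?, ?)", ("notes", "user-1", "team-a"))
    conn.execute("INSERT INTO memory VALUES (?, ?, ?)", ("notes", "user-1", "team-a"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```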
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): make key globally unique, 409 on any duplicate
Switches from the compound unique `(key, user_id, team_id)` to a simple
`key @unique`. The compound form silently allowed duplicates when
user_id or team_id was NULL (Postgres treats each NULL as distinct), so
callers could POST the same key repeatedly. Globally-unique key means
one row per key, period — any duplicate create → 409.
- schema.prisma (×3): `key String @unique`, drop `@@unique(...)`.
- initial add_memory_table migration: unique index on (key) only.
- Remove the now-unused follow-up NULLS NOT DISTINCT migration.
- Endpoint error message simplified ("already exists" — no "for this scope").
- Test fake's create() now enforces global key uniqueness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): full-width layout + user/teams-style columns
- Add `w-full` to the MemoryView outer div so the page fills the
`flex-1` container (was collapsing to intrinsic width).
- Replace the combined "Scope" column with separate User ID / Team ID
columns, matching the layout of the Users / Teams pages: ID, Name,
Preview, User ID, Team ID, Updated, Actions.
- IDs render with a truncated mono label + copy-to-clipboard button,
same pattern as view_users.
- Detail drawer now shows Memory ID / User ID / Team ID as separate
fields instead of stacked color tags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): use clean MCP-style ID pill, drop copy icons
The ID / User ID / Team ID columns showed a mono text blob with a
copy-to-clipboard icon next to each value — too busy compared to the
MCP Servers page. Swap the renderer for MCP's pill style:
- Truncated mono ID inside a blue Tailwind pill
(`font-mono text-blue-600 bg-blue-50 ... rounded-md border`).
- No copy icon. Full ID surfaces via tooltip.
- ID column is a button that opens the detail drawer on click;
user/team ID pills are static (not clickable).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): address greptile review feedback
Addresses 5 greptile findings (3/5 → higher confidence target):
1. Identity-less orphan rows (P1): non-admin callers with no user_id AND
no team_id could create rows that the visibility filter would never
match again. Now rejected up front with 400 — caller must authenticate
with a scoped key or act as PROXY_ADMIN.
2. Upsert race returning 500 (P1): PUT's check-then-create isn't atomic;
a concurrent writer could slip a row in between the 404-check and the
create call. Now catch unique-violation on create, re-read, and fall
through to update — PUT stays idempotent. If the conflicting row
belongs to a different scope, surface a 409 instead of 500.
3. PUT-create scope inconsistency (P2): PUT's create branch always used
the caller's own user_id/team_id, so admins couldn't bootstrap rows
scoped elsewhere via PUT (only POST). Now PUT-create calls the shared
`_resolve_scope()` helper, matching POST semantics.
4. Stale schema comment (P2): schema said "Keyed by (key, user_id,
team_id)" but `key` is globally unique. Updated all three schema
copies to reflect the actual design.
5. UI silently truncated at 200 (P2): MemoryView fetched pageSize=200
with no load-more. Swapped to real server-side pagination driven by
`data.total`; page size is now 50 and the pager is a real AntD
control.
Also extracts a shared `_resolve_scope()` helper and `_is_unique_violation()`
from create_memory so POST and PUT don't drift on the scope/error logic.
Tests: +3 new (identity-less 400, PUT admin bootstrap, PUT race →
update), 18/18 pass.
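
A rough sketch of the race recovery from finding 2, using a toy in-memory store instead of Prisma. `UniqueViolation`, `InMemoryStore`, and `upsert` are hypothetical names; the scope check that turns a foreign row into a 409 is left out to keep the example short.

```python
class UniqueViolation(Exception):
    """Hypothetical stand-in for the database's unique-constraint error."""


class InMemoryStore:
    """Toy store; `hide_next_find` simulates a concurrent writer winning the race."""

    def __init__(self):
        self.rows = {}
        self.hide_next_find = False

    def find(self, key):
        if self.hide_next_find:
            self.hide_next_find = False
            return None
        return self.rows.get(key)

    def create(self, key, value):
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = value
        return value

    def update(self, key, value):
        self.rows[key] = value
        return value


def upsert(store, key, value):
    """Check-then-create with recovery, so PUT stays idempotent under a race."""
    if store.find(key) is None:
        try:
            return store.create(key, value)
        except UniqueViolation:
            # Another writer created the row between find and create:
            # re-read and fall through to the update path.
            if store.find(key) is None:
                raise
    return store.update(key, value)
```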
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): typed Prisma error + explicit-null metadata on PUT
Two more greptile threads from the last review:
- Unique-violation detection was string-matching "Unique"/"UniqueViolation"
in the exception message, fragile across Prisma/driver versions. Now
check the typed error `code == "P2002"` first, with string fallback.
- PUT could not distinguish "metadata omitted" from "metadata: null" —
both parsed as `None`, so callers had no way to clear stored metadata.
Switch to Pydantic v2's `model_fields_set` to tell which fields the
caller actually sent; explicit null now clears the column.
New tests:
- explicit null clears metadata
- omitted metadata preserves existing value
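
The typed-first, string-fallback check could look roughly like this. It is a sketch: the function name is illustrative, and it only assumes the exception exposes a `code` attribute as the commit text describes.

```python
def is_unique_violation(exc: Exception) -> bool:
    """Detect a unique-constraint failure: typed code first, message text second."""
    if getattr(exc, "code", None) == "P2002":
        return True
    # Fallback for errors that carry no typed code; "Unique" also covers
    # messages that contain "UniqueViolation".
    return "Unique" in str(exc)
```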
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): send explicit null when user clears metadata
Addresses the remaining P1 from the last greptile review:
When the edit modal's metadata textarea was cleared and saved,
`metadataParsed` stayed `undefined`, `JSON.stringify` dropped the key
entirely, and the backend's `model_fields_set` guard therefore left
the stored metadata untouched — UI showed success but nothing changed.
Now: empty textarea on edit → send explicit `null` so the backend
sees `metadata` in `model_fields_set` and clears the column.
Empty textarea on create still maps to `undefined` (field omitted)
to avoid Prisma's `Json? = None` quirk on insert.
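
The contract between the modal and the backend can be sketched end to end. Both functions are hypothetical; in particular, the `"metadata" in body` check is a plain-dict stand-in for the Pydantic `model_fields_set` guard the backend actually uses.

```python
import json
from typing import Any


def build_payload(value: str, metadata_text: str, is_edit: bool) -> str:
    """Client side: turn the form state into a JSON request body."""
    body: dict[str, Any] = {"value": value}
    text = metadata_text.strip()
    if text:
        body["metadata"] = json.loads(text)
    elif is_edit:
        # Cleared textarea on edit: send an explicit null so the field is "set".
        body["metadata"] = None
    # Cleared textarea on create: leave the field out entirely.
    return json.dumps(body)


def apply_update(stored: dict, raw_body: str) -> dict:
    """Server side: only touch metadata when the caller actually sent the field."""
    body = json.loads(raw_body)
    updated = dict(stored)
    updated["value"] = body["value"]
    if "metadata" in body:
        updated["metadata"] = body["metadata"]
    return updated
```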
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): preserve slashes in key path encoding
The backend route `/v1/memory/{key:path}` supports keys with slashes,
but `encodeURIComponent` encoded `/` as `%2F`. Some proxies (nginx
default, CloudFlare, AWS ALB) reject or re-decode `%2F` mid-flight,
so UI update/delete calls on slash-containing keys could fail or
silently misroute.
New helper `encodeMemoryKeyForPath` splits by `/`, URL-encodes each
segment, then rejoins with literal `/`. Every other unsafe char
(spaces, `?`, `#`, `%`) stays encoded per-segment; slashes stay as
path delimiters, matching what the `:path` converter expects.
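
For reference, a Python equivalent of the per-segment encoding. This is only a sketch of the idea: the UI helper is TypeScript, and `urllib.parse.quote` differs slightly from `encodeURIComponent` in which punctuation it leaves unescaped.

```python
from urllib.parse import quote


def encode_memory_key_for_path(key: str) -> str:
    """Percent-encode each path segment but keep `/` as a literal delimiter."""
    return "/".join(quote(segment, safe="") for segment in key.split("/"))
```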
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): drop misleading client-side column sorters
With server-side pagination, client sorters on `key` and `updated_at`
only reorder the current page while pretending to sort the full
dataset — users would see "sorted by name" but only the visible 50
rows would actually be sorted.
Remove the sorters. The backend already returns rows in
`updated_at DESC` order (sensible default for a memory view), and
users can narrow the result with the key-prefix filter.
Greptile also flagged missing `@@map` on the new model as a
"consistency" issue, but only 1 of 59 tables in this repo uses
`@@map` — the dominant pattern is to rely on Prisma's default
(model name == table name). Skipping that finding as a
false-positive on convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): compose visibility + key filters via explicit AND
Greptile P1 (filter-fragility): `where.update(vis)` was semantically
correct today, but dict-merging by key meant any future visibility
filter that grew a new top-level "OR" would silently clobber the
existing key filter.
Compose explicitly instead:
where = {"AND": [key_filter, vis]}
Applied to both `list_memory` and `_find_memory_for_caller`. When
either side is empty (admin has no visibility filter; list has no
key filter), skip the wrapper and use the non-empty side directly
to keep the generated SQL clean.
Test fake's `_matches` now understands top-level `AND` too.
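
A small sketch of the composition rule, including the skip-the-wrapper case. `compose_where` is a hypothetical name; the commit does not say whether the logic lives in a shared helper.

```python
def compose_where(key_filter: dict, vis: dict) -> dict:
    """Combine the key filter and the visibility filter without dict-merging."""
    if key_filter and vis:
        # Both sides present: keep them as separate AND operands so a
        # top-level "OR" in one cannot overwrite keys of the other.
        return {"AND": [key_filter, vis]}
    # One side empty (admin, or list without key filter): use the other as-is.
    return key_filter or vis
```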
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(ui/memory): wrap write helpers with react-query useMutation
Previously the Memory view read via `useQuery` but called the raw
create/update/delete fetch helpers directly in handlers, tracking
loading state with a local `submitting` flag and invalidating state
via `refetch()`. That mixes two concerns:
- it skips react-query's mutation state (isPending / isError / isSuccess)
- `refetch()` only refetches the currently-mounted query instance, not
other cached pages, so navigating back to an older page could show
stale rows
Switch the three write paths to `useMutation`:
- `createMutation`, `updateMutation`, `deleteMutation` — each owns
the mutation fn, success toast, and error toast.
- Success handlers invalidate the whole `["memoryList", ...]` prefix
via `queryClient.invalidateQueries`, so every cached page refetches
(pagination + filter-aware).
- Refresh button now invalidates instead of `refetch()`, keeping all
behavior consistent.
- handleSave/handleDelete become thin adapters that call `.mutateAsync`;
their errors are swallowed locally since the mutation's onError has
already surfaced the toast.
Also tightened the edit modal's key-field tooltip to reflect the
actual global-unique semantics (was "Unique per user/team scope").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): close cross-user write gap + sanitize 500 errors (Veria)
Addresses two Veria findings:
**High — cross-user memory tampering via team membership.** The
visibility filter uses an OR (`user_id` matches the caller OR `team_id` matches the caller's team)
so team members can SEE each other's team-scoped rows. That's
intentional for list/get. But because PUT/DELETE used the same filter
to find the target row, any team member could overwrite or delete a
teammate's *personal* row whenever both `user_id` and `team_id` were
stamped on it — broader visibility was being silently treated as
broader authority.
New `_assert_write_access(row, caller)` enforces ownership for
mutations. Non-admin rules:
- The row's `user_id` must match the caller (personal ownership), OR
- The row has no `user_id` and its `team_id` matches the caller's
team (a "pure team row" intended for shared writes).
Admins bypass the check. The same gate runs in PUT (both regular
and post-race-recovery branches) and DELETE.
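
The non-admin rules in this commit reduce to a small predicate. The sketch below returns a boolean instead of raising 403 and uses a hypothetical signature; it reflects the rules as stated here, before the later team-admin tightening.

```python
from typing import Optional


def can_write(
    row_user_id: Optional[str],
    row_team_id: Optional[str],
    caller_user_id: Optional[str],
    caller_team_id: Optional[str],
    is_admin: bool = False,
) -> bool:
    """Ownership check for mutations; visibility alone does not grant write access."""
    if is_admin:
        return True
    if row_user_id is not None:
        # Personal row: only the owning user may modify it, even if the
        # caller shares the row's team.
        return row_user_id == caller_user_id
    # Pure team row: no user stamped, team must match the caller's team.
    return row_team_id is not None and row_team_id == caller_team_id
```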
**Medium — DB internals leaked through 500 detail.** Every `except`
block was raising `HTTPException(500, detail=str(e))`, which surfaces
Prisma error strings (table/column names, host:port, error class
names) to API callers. New `_internal_error()` helper logs the real
exception server-side and returns a generic, caller-safe `detail`.
Applied to create, list, upsert (general fallthrough), and delete.
Also tightened the race-recovery 409 message to drop the "in a
different scope" wording — the caller never needs to know whose
scope it lives in.
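
A dependency-free sketch of the error-sanitizing idea. The real helper presumably produces an `HTTPException`; here a plain dict is returned instead, and the logger name and detail string are assumptions.

```python
import logging

logger = logging.getLogger("memory_endpoints")  # hypothetical logger name

GENERIC_DETAIL = "Internal server error"  # assumed wording, not from the commit


def internal_error(exc: Exception, operation: str) -> dict:
    """Log the real exception server-side and return a caller-safe payload."""
    logger.error("memory %s failed: %r", operation, exc)
    # Nothing derived from `exc` goes into the response body.
    return {"status_code": 500, "detail": GENERIC_DETAIL}
```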
Tests (+5):
- teammate cannot overwrite personal row → 403
- teammate cannot delete personal row → 403
- teammate CAN modify pure team row (no user_id stamped) → 200
- admin bypasses write-auth → 200
- 500 response never echoes Prisma internals (table/host/class names)
25/25 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): require team admin to modify pure team rows
Tightens the write-authorization rule for "pure team rows" (rows with
no user_id stamped, only team_id) to match the pattern used by
team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`):
- Plain team members can READ team rows via the OR visibility filter
(intentional, unchanged).
- Only PROXY_ADMIN, team admins of the row's team_id, or org admins
for the team's organization may MODIFY them. Plain members get 403.
`_assert_write_access` is now async and takes the prisma_client so it
can fetch the team and run the existing `_is_user_team_admin` /
`_is_user_org_admin_for_team` helpers from
`litellm.proxy.management_endpoints.common_utils`. The org-admin path
is best-effort: it calls `get_user_object`, which depends on the
proxy_server module being initialized, so any exception there is
treated as "not an org admin" rather than crashing the request.
Tests:
- team admin can modify pure team row → 200
- plain team member cannot modify pure team row → 403
- plain team member cannot delete pure team row → 403
Updates the test fake to add a tiny `litellm_teamtable.find_unique`
implementation and a `_make_team(team_id, admin_user_ids=[...])`
helper.
27/27 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: mypy + UI page-metadata sync for memory page
Two CI failures:
1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
`dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}`
returned `dict[str, list[...]]`, so the join site failed
`dict-item` typing. Annotate both intermediates as `dict` so mypy
widens the value type.
2. UI test (`page_utils.test.ts > should have descriptions for all
pages`): every leftnav entry must have a description in
`page_metadata.ts`, and `memory` was missing. Added a one-line
description, matching the style of neighboring entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [Feat] Day-0 support for GPT-5.5 and GPT-5.5 Pro (#26449)
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro
Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:
- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
per 1M input/output/cached input
Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.
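
As a worked example of the base rates listed above, a simplified estimator. It is a sketch only: it ignores the long-context, flex, batch, and priority tiers, and it assumes cached input tokens are counted separately from regular input tokens.

```python
# Base rates in USD per 1M tokens, copied from the list above.
PRICES_PER_MILLION = {
    "gpt-5.5": {"input": 5.00, "output": 30.00, "cached_input": 0.50},
    "gpt-5.5-pro": {"input": 60.00, "output": 360.00, "cached_input": 6.00},
}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Estimate request cost in USD at the base per-token rates."""
    rates = PRICES_PER_MILLION[model]
    total = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cached_input_tokens * rates["cached_input"]
    )
    return total / 1_000_000
```

For instance, one million tokens in each of the three buckets on `gpt-5.5` comes to 5 + 30 + 0.50 = 35.50 USD at these rates.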
No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.
Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields
* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants
gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.
Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.
Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
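
The parity assertion could be expressed along these lines. This is a sketch: `flag_mismatches` is a hypothetical helper, and the model map is passed in rather than loaded from the pricing JSON.

```python
REASONING_FLAGS = (
    "supports_none_reasoning_effort",
    "supports_minimal_reasoning_effort",
    "supports_xhigh_reasoning_effort",
)


def flag_mismatches(model_map: dict, dated: str, base: str) -> list:
    """Return the reasoning-effort flags whose values differ between two entries."""
    return [
        flag
        for flag in REASONING_FLAGS
        # A missing flag (None) compared with an explicit value counts as drift.
        if model_map[dated].get(flag) != model_map[base].get(flag)
    ]
```

A regression test would assert that the returned list is empty for every dated variant and its non-dated counterpart.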
* fix(schema): close LiteLLM_MemoryTable model brace dropped during merge
The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:
error: This line is not a valid field or attribute definition.
--> schema.prisma:1250
|
1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
1250 | model LiteLLM_AdaptiveRouterState {
Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.
`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>