review: gate strategic-log, portfolio, chat, and digest on reviewer

Extends the reviewer agent — previously only protecting indicator
summaries — to every AI-generated surface that reaches a user. The
reviewer's prompt already rejects scratchpad, truncation,
meta-commentary, and (since a6e476b) financial advice; wiring it in
turns those rules from prompt-level "asks" into structural gates.

Four call sites updated:

- ai_log_job.run() : after each tone/analysis variant is generated,
  pass through review_read. On reject, log the reason and skip the
  StrategicLog insert; the API's existing "latest StrategicLog" lookup
  falls back to the previous clean log.

- services/portfolio_analysis.analyse() : on reject, raise a clean
  RuntimeError that the /api/analyze router already maps to HTTP 502
  with a retry-able message. Portfolio analysis isn't cached
  server-side, so the user retries; the reviewer's verdict reason is
  recorded in the error column of the AICall ledger row, whose status
  is set to "leaked".

- routers/chat.chat() : on reject, instead of returning the raw
  assistant content we return a short refusal explaining the limit
  and inviting a rephrase. Adds ~1-2 s of latency per turn (one extra
  LLM call to Haiku) — the only user-facing latency tax.

- jobs/email_digest_job._generate_variants() : on reject, the variant
  is dropped for the cycle. Recipients on the rejected tone get no
  digest email this run, which is better than delivering inbox copy
  that drifts into advice (emails are unrecallable once sent).
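
The shape shared by the four gates can be sketched roughly as below.
This is an illustration only, not the code in this commit: Verdict,
fake_review, gate, and the three policy functions are hypothetical
names, and the keyword check stands in for the real reviewer, which is
an LLM call.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical stand-in for the reviewer's result."""
    clean: bool
    reason: Optional[str] = None
    cost_usd: float = 0.0


async def fake_review(text: str) -> Verdict:
    """Toy reviewer so the sketch runs offline (the real one calls an LLM)."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    hits = sorted(words & {"buy", "sell"})
    if hits:
        return Verdict(False, f"advice keyword: {hits[0]}", 0.001)
    return Verdict(True, None, 0.001)


async def gate(
    candidate: str,
    review: Callable[[str], Awaitable[Verdict]],
    on_reject: Callable[[Verdict], Optional[str]],
) -> Tuple[Optional[str], Verdict]:
    """Return the candidate if clean, else whatever the call site's policy yields."""
    verdict = await review(candidate)
    if verdict.clean:
        return candidate, verdict
    return on_reject(verdict), verdict


# Per-surface reject policies, loosely mirroring the four bullets above.
def drop(verdict: Verdict) -> None:
    """Log / digest: skip the insert or the variant for this cycle."""
    return None


def refuse(verdict: Verdict) -> str:
    """Chat: replace the raw content with a short refusal."""
    return "I can't answer that as phrased. Could you rephrase?"


def fail(verdict: Verdict) -> None:
    """Portfolio: surface a retry-able error to the caller."""
    raise RuntimeError("analysis rejected by reviewer; please retry")
```

The verdict is returned alongside the output in every branch so the
caller can still record the reviewer's cost and reason.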

In every case the AICall ledger row records the reviewer cost so
month_spend stays accurate across all paths.
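
A minimal sketch of that accounting, assuming month_spend is a plain
sum over ledger rows (the commit does not show how it is computed).
combined_cost follows the full_cost expression in the diff; the
function names and the list-of-costs input are illustrative.

```python
from typing import Iterable, Optional


def combined_cost(
    llm_cost: Optional[float], review_cost: float, llm_ok: bool
) -> Optional[float]:
    """Cost for one ledger row: generator plus reviewer.

    Returns None when the generator call itself failed, since no
    reviewer call is made in that case.
    """
    if not llm_ok:
        return None
    return (llm_cost or 0.0) + review_cost


def month_spend(row_costs: Iterable[Optional[float]]) -> float:
    """Hypothetical aggregate: sum row costs, counting missing costs as zero."""
    return sum(c or 0.0 for c in row_costs)


# One clean call, one reviewer-rejected call, one failed generator call.
rows = [
    combined_cost(0.012, 0.001, True),
    combined_cost(0.010, 0.001, True),   # rejected, but both calls were paid for
    combined_cost(None, 0.0, False),
]
total = month_spend(rows)
```

A rejected response costs as much as an accepted one, which is why the
reviewer cost is added before the row is written rather than only on
the success path.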

The reviewer system prompt is slightly generalised to cover both the
indicator-summary case and the longer-form log/digest/chat case:
- removes "short interpretive read" framing
- softens the "any question" rule so genuine rhetorical structure in
  a long-form log doesn't trigger a reject

tests/conftest.py grows an autouse fixture that stubs review_read to
clean=True in every consumer module. Tests that mock the generator
shouldn't have to also mock the safety gate behind it; tests that
specifically want the reject branch can override with their own
monkeypatch. test_output_review.py is unaffected — it imports
review_read directly.
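
The stub has to be applied per consumer module because each one does
"from ... import review_read", which binds its own reference to the
function; patching only the defining module would leave those
references untouched. A rough sketch of the helper such a fixture
could call, with hypothetical names (Verdict, _always_clean,
stub_reviewer) and the review_read(client, text) call shape taken
from the diff:

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class Verdict:
    """Hypothetical stand-in for the reviewer's result."""
    clean: bool
    reason: Optional[str] = None
    cost_usd: float = 0.0


async def _always_clean(client: Any, text: str) -> Verdict:
    """Replacement reviewer: approves everything and costs nothing."""
    return Verdict(clean=True)


def stub_reviewer(monkeypatch: Any, consumer_modules: Iterable[Any]) -> None:
    """Rebind review_read in every module that imported it by name.

    `monkeypatch` is expected to behave like pytest's fixture of the
    same name, i.e. to offer setattr(obj, name, value).
    """
    for module in consumer_modules:
        monkeypatch.setattr(module, "review_read", _always_clean)
```

In a conftest.py this would presumably be wrapped in a fixture declared
with @pytest.fixture(autouse=True) that receives pytest's monkeypatch
fixture. A test that wants the reject branch can call
monkeypatch.setattr again on the module under test with a stub that
returns clean=False; the later patch wins for that test.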

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-05-29 14:40:04 +02:00
parent a6e476b851
commit f9534f7ad6
6 changed files with 161 additions and 19 deletions


@@ -33,6 +33,7 @@ from app.logging import get_logger
 from app.models import AICall
 from app.services.i18n import LANGUAGES, respond_in_clause
 from app.services.llm_prompts import build_system_prompt
+from app.services.output_review import review_read
 from app.services.openrouter import (
     LogResult,
     active_model,
@@ -322,6 +323,8 @@ async def analyse(
     s = get_settings()
     system, user = build_prompt(req)
+    review_cost = 0.0
+    review_reason: str | None = None
     async with httpx.AsyncClient() as client:
         try:
             llm: LogResult = await call_llm(
@@ -340,15 +343,31 @@ async def analyse(
             llm = None
             log.error("portfolio_analysis.failed", error=error_msg)
+        # Reviewer gate. This is the highest-risk surface — the model is
+        # commenting on a real user's holdings, so any drift into
+        # buy/sell or allocation language is a regulatory hazard. Drop
+        # the response on a reject and surface a retry-able error to the
+        # caller; no analysis is ever persisted server-side anyway.
+        if llm is not None:
+            verdict = await review_read(client, llm.content)
+            review_cost = verdict.cost_usd or 0.0
+            if not verdict.clean:
+                status = "leaked"
+                error_msg = f"reviewer rejected: {verdict.reason}"
+                review_reason = verdict.reason
+                log.warning("portfolio_analysis.reviewer_rejected",
+                            reason=verdict.reason, preview=llm.content[:120])
+    full_cost = ((llm.cost_usd or 0.0) + review_cost) if llm else None
     # Ledger row — NO portfolio data, just metadata. Same row whether the
-    # call succeeded or failed, so cost-cap and rate-limit logic can
-    # observe the attempt.
+    # call succeeded, failed, or was rejected by the reviewer, so
+    # cost-cap and rate-limit logic can observe the attempt.
     session.add(AICall(
         called_at=utcnow(),
         model=llm.model if llm else active_model(),
         prompt_tokens=llm.prompt_tokens if llm else None,
         completion_tokens=llm.completion_tokens if llm else None,
-        cost_usd=llm.cost_usd if llm else None,
+        cost_usd=full_cost,
         status=status,
         error=error_msg,
     ))
@@ -356,19 +375,26 @@ async def analyse(
     if llm is None:
         raise RuntimeError(error_msg or "portfolio analysis failed")
+    if review_reason is not None:
+        # Reviewer rejected the candidate. Treat as a generation failure
+        # at the API layer so the user sees a retry-able error rather
+        # than potentially non-compliant advice.
+        raise RuntimeError(
+            "AI analysis couldn't be generated cleanly — please try again."
+        )
     log.info(
         "portfolio_analysis.ok",
         n_positions=len(req.positions),
         prompt_tokens=llm.prompt_tokens,
         completion_tokens=llm.completion_tokens,
-        cost_usd=llm.cost_usd,
+        cost_usd=full_cost,
     )
     return AnalysisResult(
         content=llm.content,
         model=llm.model,
         prompt_tokens=llm.prompt_tokens,
         completion_tokens=llm.completion_tokens,
-        cost_usd=llm.cost_usd,
+        cost_usd=full_cost,
         generated_at=datetime.now(timezone.utc),
     )