ai: bump reviewer max_tokens 120 → 300

A live sanity check on 50 recent IndicatorSummary rows found that 6 of 10 reviewer rejections were caused by the reviewer hitting its own max_tokens cap mid-verdict ('{"clean": false, "reason": "Truncated sent…'). The parser then rejected the cut-off response as malformed JSON and dropped the candidate, producing a false-negative verdict that would have purged legitimately clean rows. 300 tokens is well above the ~30-token verdict the prompt asks for; the extra headroom removes the artefact at ~$0.00015 per call.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
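The failure mode described above can be reproduced outside the service. What follows is a minimal sketch, not the code in output_review.py: `ParsedVerdict`, `parse_verdict`, and the `finish_reason` argument are hypothetical names chosen for illustration, and it assumes the caller has access to the provider's finish reason. It shows one way to keep "cut off by max_tokens" distinct from "reviewer said not clean", so that a truncated response is never counted as a rejection.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedVerdict:
    # None means "no usable verdict"; it is deliberately not a rejection.
    clean: Optional[bool]
    reason: str
    truncated: bool


def parse_verdict(raw: str, finish_reason: str) -> ParsedVerdict:
    """Hypothetical helper: classify a reviewer response before trusting it."""
    # A response stopped by the token cap is reported as truncated,
    # whether or not the partial text happens to parse.
    if finish_reason == "length":
        return ParsedVerdict(None, "reviewer output truncated", True)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ParsedVerdict(None, "malformed reviewer JSON", False)
    # Only an explicit boolean "clean" field counts as a verdict.
    if not isinstance(data, dict) or not isinstance(data.get("clean"), bool):
        return ParsedVerdict(None, "missing or non-boolean 'clean' field", False)
    return ParsedVerdict(data["clean"], str(data.get("reason", "")), False)
```

With a helper along these lines, a caller could retry or skip when `clean` is None instead of purging the row. Raising max_tokens, as this commit does, removes the common cause; a check like this would cover the remaining cases.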
2026-05-29 13:15:42 +02:00 · 0550063316
commit 0550063316
parent 45fa31bb2b
1 changed file with 9 additions and 1 deletion
--- a/app/services/output_review.py
+++ b/app/services/output_review.py
@@ -84,7 +84,15 @@ async def review_read(client: httpx.AsyncClient, candidate: str) -> Verdict:
    try:
        result = await call_llm(
            client, messages,
-            max_tokens=120,
+            # 300 tokens is comfortably above the 30-token JSON verdict
+            # the prompt asks for. An earlier 120-token cap was producing
+            # frequent finish_reason=length cutoffs that left the JSON
+            # half-written ('{"clean": false, "reason": "Text…'), which
+            # the parser then rejected as malformed, a false-negative
+            # in the verdict. The extra headroom costs ~$0.00015 per
+            # call (DeepSeek output rates) and removes that whole class
+            # of artefact.
+            max_tokens=300,
            response_format={"type": "json_object"},
        )
    except Exception as e: