ai: bump reviewer max_tokens 300 → 800

A live re-check on 50 recent IndicatorSummary rows, run after the previous 120 → 300 bump, still produced 4 'reviewer returned non-JSON' verdicts out of 12 rejections.

DeepSeek-V4-flash sometimes prefixes its JSON output with a short stretch of thinking even though response_format is enforced. The preamble eats into the 300-token cap, so the JSON that follows is cut off before it closes.

800 tokens is comfortably above any realistic verdict plus preamble and costs ~$0.00022/call (DeepSeek output rates), which is negligible given the hourly call volume.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 13:16:57 +02:00
commit 8b9d3c9c3e
parent 0550063316
1 changed file with 8 additions and 9 deletions
--- a/app/services/output_review.py
+++ b/app/services/output_review.py
@@ -84,15 +84,14 @@ async def review_read(client: httpx.AsyncClient, candidate: str) -> Verdict:
    try:
        result = await call_llm(
            client, messages,
-            # 300 tokens is comfortably above the 30-token JSON verdict
-            # the prompt asks for. An earlier 120-token cap was producing
-            # frequent finish_reason=length cutoffs that left the JSON
-            # half-written ('{"clean": false, "reason": "Text…'), which
-            # the parser then rejected as malformed — a false-negative
-            # in the verdict. The extra headroom costs ~$0.00015 per
-            # call (DeepSeek output rates) and removes that whole class
-            # of artefact.
-            max_tokens=300,
+            # 800 tokens is well above the ~30-token JSON verdict the
+            # prompt asks for. The reviewer model (DeepSeek-V4-flash)
+            # occasionally pads with its own thinking before the JSON
+            # even though response_format is enforced; smaller caps
+            # (120, 300) produced finish_reason=length cutoffs that
+            # left the JSON half-written and broke the parser. 800
+            # removes the artefact entirely at ~$0.00022 per call.
+            max_tokens=800,
            response_format={"type": "json_object"},
        )
    except Exception as e:
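
The failure mode described in the commit message (a thinking preamble followed by JSON that may or may not be cut off) can also be told apart on the parsing side. The following is a minimal sketch, not code from this repository: extract_json_object and classify_reply are hypothetical helpers, and the OpenAI-style choice dict with finish_reason and message.content is an assumption, since the return shape of call_llm is not shown in the diff.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first complete JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # ignores anything that follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # No valid JSON starts at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


def classify_reply(choice: dict) -> str:
    """Hypothetical helper: label a reply 'ok', 'truncated', or 'non_json'.

    Assumes an OpenAI-style choice dict with `finish_reason` and
    `message.content`; the real call_llm return shape may differ.
    """
    content = (choice.get("message") or {}).get("content") or ""
    if extract_json_object(content) is not None:
        return "ok"
    if choice.get("finish_reason") == "length":
        # The cap was hit before the JSON object closed.
        return "truncated"
    return "non_json"
```

With a flat verdict such as {"clean": ..., "reason": ...} this separates a reply that hit the token cap from one that never contained JSON at all. It is less reliable for nested objects, where a truncated outer object could still yield a complete inner one.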