ai: bump reviewer max_tokens 300 → 800

Live re-check on 50 recent IndicatorSummary rows after the previous
120 → 300 bump still produced 4 'reviewer returned non-JSON' verdicts
out of 12 rejections. DeepSeek-V4-flash sometimes prefixes its JSON
output with a short stretch of thinking even though response_format
is enforced; the preamble uses up part of the 300-token budget, so the
end of the JSON is cut off at the cap (finish_reason=length).
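To illustrate why the preamble matters, here is a minimal sketch that assumes the reply arrives as one string of model output. `extract_json_object` and the sample preamble text are hypothetical and not part of the reviewer code. The helper skips leading prose and returns the first complete JSON object. That recovers a verdict that only has thinking in front of it, but it cannot recover one the token cap has cut short. Only more headroom fixes that case.

```python
from __future__ import annotations

import json
from typing import Any


def extract_json_object(text: str) -> dict[str, Any] | None:
    """Return the first complete JSON object in `text`, skipping any preamble.

    Hypothetical helper: tries to decode a JSON object at each '{' in turn
    and returns the first one that parses. A reply cut off by the token
    cap contains no complete object, so the result is then None.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value from the start of the string
            # and ignores whatever text follows it.
            obj, _consumed = decoder.raw_decode(text[start:])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not a valid object at this '{'; try the next one
        start = text.find("{", start + 1)
    return None


# Hypothetical replies: thinking before an intact verdict, and the same
# reply truncated mid-string by the max_tokens cap.
sample = 'Let me check this first. {"clean": false, "reason": "Text is garbled"}'
print(extract_json_object(sample))  # → {'clean': False, 'reason': 'Text is garbled'}
cut_off = 'Let me check this first. {"clean": false, "reason": "Text'
print(extract_json_object(cut_off))  # → None
```

Scanning from each '{' rather than only the first one means braces inside the thinking text do not derail the parse.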

800 tokens leaves comfortable room for any realistic verdict plus
preamble and costs ~$0.00022/call at DeepSeek output rates, which is
negligible given the hourly call volume.
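The cost reasoning is an upper bound: at most max_tokens output tokens per call, times the per-token output price, times calls per hour. A small sketch of that arithmetic follows. The rate and call volume are placeholders, not DeepSeek's published pricing or this service's real traffic, and both function names are hypothetical.

```python
def max_output_cost_usd(max_tokens: int, usd_per_million_tokens: float) -> float:
    """Upper bound on one call's output cost: every token up to the cap is billed."""
    return max_tokens * usd_per_million_tokens / 1_000_000


def hourly_cost_usd(per_call_usd: float, calls_per_hour: int) -> float:
    """Scale a per-call cost by the hourly call volume."""
    return per_call_usd * calls_per_hour


# Placeholder numbers only: not real pricing and not the real call volume.
RATE_USD_PER_MILLION = 0.30
CALLS_PER_HOUR = 60

for cap in (120, 300, 800):
    per_call = max_output_cost_usd(cap, RATE_USD_PER_MILLION)
    per_hour = hourly_cost_usd(per_call, CALLS_PER_HOUR)
    print(f"max_tokens={cap}: <= ${per_call:.6f}/call, <= ${per_hour:.4f}/hour")
```

Providers typically bill the tokens actually generated rather than the cap, so a normal ~30-token verdict costs far less than this bound. The cap only limits the worst case.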

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-05-29 13:16:57 +02:00
parent 0550063316
commit 8b9d3c9c3e


@ -84,15 +84,14 @@ async def review_read(client: httpx.AsyncClient, candidate: str) -> Verdict:
     try:
         result = await call_llm(
             client, messages,
-            # 300 tokens is comfortably above the 30-token JSON verdict
-            # the prompt asks for. An earlier 120-token cap was producing
-            # frequent finish_reason=length cutoffs that left the JSON
-            # half-written ('{"clean": false, "reason": "Text…'), which
-            # the parser then rejected as malformed — a false-negative
-            # in the verdict. The extra headroom costs ~$0.00015 per
-            # call (DeepSeek output rates) and removes that whole class
-            # of artefact.
-            max_tokens=300,
+            # 800 tokens is well above the ~30-token JSON verdict the
+            # prompt asks for. The reviewer model (DeepSeek-V4-flash)
+            # occasionally pads with its own thinking before the JSON
+            # even though response_format is enforced; smaller caps
+            # (120, 300) produced finish_reason=length cutoffs that
+            # left the JSON half-written and broke the parser. 800
+            # removes the artefact entirely at ~$0.00022 per call.
+            max_tokens=800,
             response_format={"type": "json_object"},
         )
     except Exception as e:
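The cutoffs described in the new comment surface as finish_reason "length" in OpenAI-compatible chat APIs such as DeepSeek's. A minimal sketch of keeping those apart from genuinely malformed output, so a truncated reply is not counted as a content rejection, might look like this. `ParsedVerdict`, `parse_review_reply` and the choice to return clean=None are hypothetical. The real `Verdict` fields and the shape of what `call_llm` returns are not shown in this diff, so the sketch takes the reply text and finish_reason as plain arguments; only the "clean" and "reason" keys come from the JSON quoted in the old comment.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedVerdict:
    # Hypothetical stand-in for the real Verdict type.
    clean: Optional[bool]  # None means "no usable verdict"
    reason: str
    truncated: bool = False


def parse_review_reply(content: str, finish_reason: str) -> ParsedVerdict:
    """Turn a reviewer reply into a verdict, keeping truncation distinct from bad JSON."""
    if finish_reason == "length":
        # The cap was hit, so the JSON is probably half-written. Report it as
        # a budget problem rather than letting it look like a rejection.
        return ParsedVerdict(clean=None, reason="reviewer hit max_tokens", truncated=True)
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return ParsedVerdict(clean=None, reason="reviewer returned non-JSON")
    if not isinstance(data, dict) or not isinstance(data.get("clean"), bool):
        return ParsedVerdict(clean=None, reason="reviewer JSON missing 'clean'")
    return ParsedVerdict(clean=data["clean"], reason=str(data.get("reason", "")))


print(parse_review_reply('{"clean": false, "reason": "Text', "length"))
print(parse_review_reply('{"clean": true, "reason": "ok"}', "stop"))
```

Flagging truncation separately would make a recount like the 4-of-12 check above a matter of filtering on one field.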