llm: support JSON-mode + stop publishing the reasoning field

Two changes to the LLM call path that together close the chain-of-thought leakage surface:

1. _call_provider accepts an optional `response_format`, forwarded to the OpenAI-shaped API (DeepSeek and OpenRouter both honour {"type": "json_object"}). It is threaded through call_llm so callers can force structured output without monkey-patching the body. The indicator-summary job will use this next: it will require the model to emit {"read": "..."} and parse that field, so prose outside the JSON object never reaches publication.

2. Empty `content` no longer falls back to the `reasoning` field. `reasoning` is the model's internal scratchpad: "Let's see...", half-formed math, planning notes. We had a fallback that surfaced it when content was null, but the field is intended for debugging the model, not for publication. After the 2026-05-29 valuation read leaked into production, the fallback is gone: an empty content row now raises so the caller retries or skips, and the previous good row remains visible. The test is updated to assert this safer behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
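A minimal sketch of the two behaviours described above, assuming an OpenAI-shaped chat-completions payload. The helper names `build_body`, `extract_content`, and `parse_read` are hypothetical and do not come from this commit. Only the `response_format` value, the `{"read": "..."}` shape, and the `LLM returned empty content` error message are taken from the source.

```python
import json
from typing import Any, Optional


def build_body(
    messages: list[dict[str, str]],
    model: str,
    max_tokens: int,
    response_format: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a request body; hypothetical helper, not the real _call_provider."""
    body: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    # Only forward response_format when the caller asked for it, so existing
    # callers keep sending the same body as before.
    if response_format is not None:
        body["response_format"] = response_format
    return body


def extract_content(payload: dict[str, Any]) -> str:
    """Return the answer text, never the `reasoning` scratchpad."""
    message = payload["choices"][0]["message"]
    content = message.get("content")
    # Deliberately ignore message["reasoning"]: null or blank content is
    # treated as a generation failure so the caller can retry or skip.
    if not content or not content.strip():
        raise RuntimeError("LLM returned empty content")
    return content


def parse_read(content: str) -> str:
    """Parse a JSON-mode reply and return its `read` field (assumed shape)."""
    obj = json.loads(content)  # raises json.JSONDecodeError on plain prose
    read = obj.get("read") if isinstance(obj, dict) else None
    if not isinstance(read, str) or not read.strip():
        raise ValueError("expected a JSON object with a non-empty 'read' string")
    return read
```

A caller would pass `response_format={"type": "json_object"}` when building the request, then run the reply through `extract_content` and `parse_read`. Any failure leaves the previously published row untouched.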
2026-05-29 13:02:36 +02:00
commit 19d4854f50
parent 8347c90235
2 changed files with 36 additions and 19 deletions
--- a/tests/test_openrouter_transport.py
+++ b/tests/test_openrouter_transport.py
@@ -183,10 +183,12 @@ async def test_call_llm_uses_upstream_cost_when_provided(monkeypatch):


@pytest.mark.asyncio
-async def test_call_llm_falls_back_to_reasoning_field_when_content_null(monkeypatch):
-    """Thinking models sometimes return null `content` plus a populated
-    `reasoning` block — we surface the reasoning so the caller still gets
-    something usable rather than treating the row as empty."""
+async def test_call_llm_does_not_publish_reasoning_when_content_null(monkeypatch):
+    """The `reasoning` field is the model's internal chain-of-thought
+    (scratchpad: "Let's see…", planning notes, half-formed math). It is
+    never safe to surface as the user-facing answer — see the
+    2026-05-29 valuation-read leak. If `content` is null we treat the
+    row as a generation failure and raise; the caller can retry or skip."""
    _configure(monkeypatch, DEEPSEEK_API_KEY="sk-d", LLM_FALLBACK="")

    def handler(request: httpx.Request) -> httpx.Response:
@@ -199,8 +201,8 @@ async def test_call_llm_falls_back_to_reasoning_field_when_content_null(monkeypa
        })

    async with httpx.AsyncClient(transport=_mock_post(handler)) as client:
-        result = await ot.call_llm(client, [{"role": "user", "content": "hi"}])
-    assert result.content == "deep thought"
+        with pytest.raises(RuntimeError, match="LLM returned empty content"):
+            await ot.call_llm(client, [{"role": "user", "content": "hi"}])


@pytest.mark.asyncio
@@ -228,7 +230,7 @@ async def test_call_llm_falls_back_to_secondary_when_primary_raises(monkeypatch)
        prompt_tokens=1, completion_tokens=2, cost_usd=0.0,
    )

-    async def fake(_client, provider, _messages, _model, _max_tokens):
+    async def fake(_client, provider, _messages, _model, _max_tokens, response_format=None):
        calls.append(provider)
        if provider == "deepseek":
            raise RuntimeError("primary down")
@@ -247,7 +249,7 @@ async def test_call_llm_raises_last_exception_when_chain_exhausted(monkeypatch):
    _configure(monkeypatch,
               DEEPSEEK_API_KEY="sk-d", OPENROUTER_API_KEY="sk-or")

-    async def fake(_client, provider, _messages, _model, _max_tokens):
+    async def fake(_client, provider, _messages, _model, _max_tokens, response_format=None):
        raise RuntimeError(f"{provider} broken")

    with patch.object(ot, "_call_provider", fake):