Extends the reviewer agent — previously only protecting indicator
summaries — to every AI-generated surface that reaches a user. The
reviewer's prompt already rejects scratchpad, truncation,
meta-commentary, and (since a6e476b) financial advice; wiring it in
turns those rules from prompt-level "asks" into structural gates.
Four call sites updated:
- ai_log_job.run() : after each tone/analysis variant is generated,
pass through review_read. On reject, log the reason and skip the
StrategicLog insert; the API's existing "latest StrategicLog" lookup
falls back to the previous clean log.
- services/portfolio_analysis.analyse() : on reject, raise a clean
RuntimeError that the /api/analyze router already maps to HTTP 502
with a retry-able message. Portfolio analysis isn't cached server-
side, so the user retries; the reviewer's verdict reason goes into
the AICall ledger as the leaked-status row's error column.
- routers/chat.chat() : on reject, instead of returning the raw
assistant content we return a short refusal explaining the limit
and inviting a rephrase. Adds ~1-2 s of latency per turn (one extra
LLM call to Haiku) — the only user-facing latency tax.
- jobs/email_digest_job._generate_variants() : on reject, the variant
is dropped for the cycle. Recipients on the rejected tone get no
digest email this run, which is better than delivering inbox copy
that drifts into advice (emails are unrecallable once sent).
In every case the AICall ledger row records the reviewer cost so
month_spend stays accurate across all paths.
The reviewer system prompt is slightly generalised to cover both the
indicator-summary case and the longer-form log/digest/chat case:
- removes "short interpretive read" framing
- softens the "any question" rule so genuine rhetorical structure in
a long-form log doesn't trigger a reject
tests/conftest.py grows an autouse fixture that stubs review_read to
clean=True in every consumer module. Tests that mock the generator
shouldn't have to also mock the safety gate behind it; tests that
specifically want the reject branch can override with their own
monkeypatch. test_output_review.py is unaffected — it imports
review_read directly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
88 lines
2.9 KiB
Python
88 lines
2.9 KiB
Python
"""Pytest config — no DB / no network. Tests target pure functions only.
|
|
|
|
Heavy runtime deps (fastapi, httpx, sqlalchemy, pydantic-settings, tenacity)
|
|
are installed inside the container but not necessarily on the host. Tests
|
|
that need them use pytest.importorskip; the full suite runs via
|
|
`docker compose run --rm app pytest tests/`."""
|
|
from __future__ import annotations
|
|
|
|
import os
|
|
import sys
|
|
from pathlib import Path
|
|
|
|
|
|
ROOT = Path(__file__).resolve().parent.parent
|
|
sys.path.insert(0, str(ROOT))
|
|
|
|
# Sentinel env so importing app.config doesn't try to read a missing .env.
|
|
os.environ.setdefault("DATABASE_URL", "sqlite+aiosqlite:///:memory:")
|
|
os.environ.setdefault("CASSANDRA_MOCK", "1")
|
|
|
|
|
|
import pytest
|
|
|
|
|
|
@pytest.fixture(autouse=True)
|
|
def stub_reviewer(monkeypatch):
|
|
"""Replace review_read with a clean-passing stub in every consumer
|
|
module. Tests that mock the generator's call_llm shouldn't also
|
|
have to mock the reviewer that runs after it — the reviewer is a
|
|
safety gate, not behaviour under test.
|
|
|
|
Tests in test_output_review.py exercise review_read through its
|
|
own module and are unaffected. Tests that want to assert the
|
|
reviewer-rejected branch can override with their own
|
|
monkeypatch.setattr — later wins.
|
|
"""
|
|
from app.services.output_review import Verdict
|
|
|
|
async def _clean(_client, _candidate):
|
|
return Verdict(clean=True, reason="stubbed-by-conftest", cost_usd=0.0)
|
|
|
|
for mod_path in (
|
|
"app.services.portfolio_analysis",
|
|
"app.routers.chat",
|
|
"app.jobs.ai_log_job",
|
|
"app.jobs.email_digest_job",
|
|
"app.jobs.indicator_summary_job",
|
|
):
|
|
try:
|
|
mod = __import__(mod_path, fromlist=["review_read"])
|
|
except ImportError:
|
|
continue
|
|
if hasattr(mod, "review_read"):
|
|
monkeypatch.setattr(mod, "review_read", _clean)
|
|
|
|
|
|
@pytest.fixture
|
|
async def db_factory(tmp_path):
|
|
"""Per-test sqlite engine + async session factory.
|
|
|
|
Creates a fresh sqlite database file under ``tmp_path``, applies
|
|
``Base.metadata.create_all``, and rebinds ``app.db._engine`` /
|
|
``app.db._session_factory`` so module-level helpers (which look
|
|
these up at call time) see the test engine.
|
|
|
|
Yields the ``async_sessionmaker``. Tests use it like:
|
|
|
|
async def test_foo(db_factory):
|
|
async with db_factory() as session:
|
|
...
|
|
"""
|
|
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
|
|
|
|
from app import db as db_mod
|
|
from app.db import Base
|
|
import app.models # noqa: F401 — registers models on Base.metadata
|
|
|
|
engine = create_async_engine(f"sqlite+aiosqlite:///{tmp_path}/test.db")
|
|
factory = async_sessionmaker(engine, expire_on_commit=False)
|
|
db_mod._engine = engine
|
|
db_mod._session_factory = factory
|
|
|
|
async with engine.begin() as conn:
|
|
await conn.run_sync(Base.metadata.create_all)
|
|
|
|
yield factory
|
|
|
|
await engine.dispose()
|