Commit graph

9 commits

Author SHA1 Message Date
0060166d32 review: per-surface rider, loosen for portfolio commentary
Reviewer was rejecting legitimate IT portfolio analyses, citing
descriptive risk language as actionable advice:

  reason: "Allocation guidance throughout: 'concentrazione gestibile',
  'non eliminabile', 'bassa esposizione', 'va monitorato'. Treats
  portfolio construction as actionable."

These phrases describe portfolio state (manageable concentration,
non-eliminable risk, low exposure, warrants monitoring) without
directing the user to take action. They are exactly the kind of
prose a portfolio commentary surface is supposed to produce. The
reviewer's generic "no financial advice" rule is too broad here.

Add a `surface` parameter to review_read() with a per-surface rider
mechanism (_SURFACE_RIDERS). The "portfolio" rider:

- Lists DESCRIPTIVE phrasings that are EXPLICITLY permitted:
  attribute naming ("high concentration", "currency exposure"),
  thesis invalidation conditions, impersonal observations about a
  position's sensitivity.
- Tightens the reject list to EXPLICIT calls to action: imperative
  verbs aimed at the reader, "you should", "consider X-ing",
  specific allocation prescriptions, price-target predictions.

portfolio_analysis.analyse() now passes surface="portfolio". All
other reviewer call sites (indicator summary, log, chat, digest)
default to surface=None and keep the generic rules.

tests/conftest.py's autouse review_read stub picks up **_kw so
adding new keyword arguments to review_read doesn't keep breaking
the locale-integration tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 16:44:27 +02:00
d47b310898 portfolio: drop rational/irrational + system-temperature from prompt
The base build_system_prompt() bakes in two artefacts that read fine
in the daily strategic log but cause repeated reviewer rejections on
the portfolio surface:

- The "Rational vs irrational" framework, which the model translates
  into IT/ES/FR/DE variants ("Razionalmente / Irrazionalmente",
  "Razionale se / Irrazionale se", etc.). Haiku reads the parallel
  contrast lists as the author working through their reasoning on
  the page and rejects as scratchpad.
- The mandatory "System temperature: [label] — …" closing line,
  which Haiku correctly flags as meta-commentary on this surface
  (it has no narrative anchor in a portfolio read).

Both are wired into the base prompt and don't add value here. Drop
them explicitly via an "# DO NOT include in this surface" override
block in _SYSTEM_OVERRIDES. The portfolio read is just plain
declarative commentary on the holdings now — opening posture
sentence, 3-5 paragraphs on concentration / tilt / currency /
winners-losers / what would invalidate, end of story.

Reviewer's rational-vs-irrational structural-device carve-out (added
in de3a9bf) stays — strategic log, indicator summaries, and digest
emails still legitimately use that framing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 16:27:54 +02:00
8e7ea673ce analyze: bump max_tokens 2000 → 4000 for portfolio analysis
Logs (analyze.lang_resolved → portfolio_analysis.reviewer_rejected
chain on 2026-05-29) showed the lang directive was working — the
model was producing Italian — but the reviewer was rejecting every
response as truncated mid-word ("supera i mass", "INRG +8"). The
analyze endpoint then returns 502 and the frontend keeps showing
whatever stale English row was last cached in localStorage, so from
the user's POV the analysis "is still in English".

Same shape as the strategic-log translation cap we fixed earlier:
the prompt targets ~350 English words, IT runs ~25-35% longer in
tokens, and DeepSeek-V4-flash bills internal reasoning against the
same budget. At 2000 we ran out of room mid-sentence. 4000 is well
above the longest realistic Italian output; cost is bounded by
tokens actually emitted, not the cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 16:04:40 +02:00
13dd3a8330 i18n: prepend a strong language directive for portfolio + chat
Reports that portfolio AI analysis was coming back in English even
for IT-toggled users. Traced the chain (DB user.lang IS set to it,
router passes it into the payload, parse_request reads it, build_prompt
appends respond_in_clause), so the wiring is correct end-to-end. The
model was simply ignoring the single-sentence tail nudge: when the
system prompt is hundreds of lines of English and the user message
adds more English context, "Respond in Italian." at the end is easy
to drop on the floor.

Add a new services/i18n.language_directive_lead() that returns a
strong, explicit top-of-prompt block — "# LANGUAGE — write everything
in <X>" plus the verbatim-tickers-and-numbers carve-out — meant to
be PREPENDED so the model anchors on the target language before it
reads the bulk of the instructions. Combined with the existing tail
clause it's belt-and-suspenders: top + bottom of the prompt both
say "in this language".

Applied to portfolio_analysis.build_prompt() and chat.py — the two
surfaces that generate user-facing prose in real time (the strategic
log + indicator summaries get post-hoc translation via translate(),
so the directive isn't needed there).

Empty-string return for en / unknown lang means callers can wire
it in unconditionally; no extra plumbing in i18n callsites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 15:21:00 +02:00
f9534f7ad6 review: gate strategic-log, portfolio, chat, and digest on reviewer
Extends the reviewer agent — previously only protecting indicator
summaries — to every AI-generated surface that reaches a user. The
reviewer's prompt already rejects scratchpad, truncation,
meta-commentary, and (since a6e476b) financial advice; wiring it in
turns those rules from prompt-level "asks" into structural gates.

Four call sites updated:

- ai_log_job.run() : after each tone/analysis variant is generated,
  pass through review_read. On reject, log the reason and skip the
  StrategicLog insert; the API's existing "latest StrategicLog" lookup
  falls back to the previous clean log.

- services/portfolio_analysis.analyse() : on reject, raise a clean
  RuntimeError that the /api/analyze router already maps to HTTP 502
  with a retry-able message. Portfolio analysis isn't cached server-
  side, so the user retries; the reviewer's verdict reason goes into
  the AICall ledger as the leaked-status row's error column.

- routers/chat.chat() : on reject, instead of returning the raw
  assistant content we return a short refusal explaining the limit
  and inviting a rephrase. Adds ~1-2 s of latency per turn (one extra
  LLM call to Haiku) — the only user-facing latency tax.

- jobs/email_digest_job._generate_variants() : on reject, the variant
  is dropped for the cycle. Recipients on the rejected tone get no
  digest email this run, which is better than delivering inbox copy
  that drifts into advice (emails are unrecallable once sent).

In every case the AICall ledger row records the reviewer cost so
month_spend stays accurate across all paths.

The reviewer system prompt is slightly generalised to cover both the
indicator-summary case and the longer-form log/digest/chat case:
- removes "short interpretive read" framing
- softens the "any question" rule so genuine rhetorical structure in
  a long-form log doesn't trigger a reject

tests/conftest.py grows an autouse fixture that stubs review_read to
clean=True in every consumer module. Tests that mock the generator
shouldn't have to also mock the safety gate behind it; tests that
specifically want the reject branch can override with their own
monkeypatch. test_output_review.py is unaffected — it imports
review_read directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 14:40:04 +02:00
4adc8dfe82 openrouter: split into llm_prompts (prompt engineering) + transport
openrouter.py was 790 lines mixing two orthogonal concerns:
- Prompt engineering (build_system_prompt, build_summary_*,
  build_chat_*, build_daily_digest_*, etc.) — ~400 lines, changes
  weekly as PROMPT_VERSION bumps
- LLM transport (call_llm, _provider_chain, _call_provider, retry
  + fallback machinery) — ~250 lines, rarely changes

Extracted the prompt-engineering surface to app/services/llm_prompts.py.
Transport stays in openrouter.py (consistent with the filename — the
OpenRouter URL is the transport's anchor).

All import sites (jobs, routers, services, tests) split their
multi-import lines into two: prompt-things from llm_prompts, transport
from openrouter. PROMPT_VERSION constant, _TONE_ALIASES, _resolve_tone,
and SYSTEM_PROMPT moved with the prompt functions.

No behaviour change — pure relocation. Function signatures, body, and
naming all preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:27:23 +02:00
d318039ad5 analyse: thread user.lang into the system prompt
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 17:01:00 +02:00
b98d8d003c ui: aggregated read on top, hide stale rows, wire /log tone toggle; prompts v8
- dashboard grid: explicit "header" area as the first row so the
    aggregated read panel renders at the top instead of being
    auto-placed after the named areas.
  - indicators: hide rows flagged stale (older than the group's
    freshness threshold). Server still computes stale_symbols;
    rendering can be re-enabled by removing the
    `{% if not is_stale %}` wrapper in indicators.html.
  - /log: add tone-changed to #log-content's hx-trigger and include
    it in cassandraSetTone's selector list — toggling Novice /
    Intermediate on the Log page was previously a no-op.
  - prompts: bump PROMPT_VERSION 7→8. Strengthen the rational-vs-
    irrational framing in the strategic-log system prompt from
    aspirational to mandatory ("a paragraph without both lenses must
    be rewritten"). Require the same lens in the per-group summary,
    cross-asset aggregate, and portfolio commentary overrides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 19:36:04 +02:00
6e7f57c6b2 phase G: data minimisation + passwordless auth + DeepSeek-first LLM
Server no longer holds portfolios. Holdings live in the browser
(localStorage); the server publishes an anonymous ticker_universe and a
gzipped /api/universe payload identical for every authenticated user, so
access patterns can't betray which tickers a user holds. AI commentary
is generated ephemerally from the browser-supplied pie and the cost
ledger row records no positions. Migrations 0009-0011 added the
universe table and dropped positions / portfolio_snapshots /
portfolios.

Authentication is now e-mail OTP only. Migration 0010 dropped
password_hash and email_verified (every active session is by
construction proof of email control). The /signup endpoint is gone;
signup and login share a single email-entry page. Email rendering is
HTML+plain-text multipart with a shared brand palette (app/branding.py)
asserted in sync with the CSS by a drift-detection test.

LLM provider defaults to DeepSeek-direct (cheaper, api.deepseek.com)
with OpenRouter as automatic fallback if DeepSeek fails. ai_log_job and
indicator_summary_job now iterate the two tones (NOVICE, INTERMEDIATE)
per cycle so the dashboard's tone toggle is instant; PROMPT_VERSION
bumped to 6 with an educational anti-TA / anti-gambling stance baked
into _CORE. NOVICE mode renders a curated glossary inline (CBOE VIX,
yield curve, HY OAS, etc.) with JS-positioned tooltips that survive
viewport edges and sticky bars. Model name and tokens hidden from the
user UI; still recorded in StrategicLog.model and AICall for admin.

Layout adds a sticky top nav, a sticky bottom markets bar (one chip per
exchange with status LED + headline index + 1d change), and
Phase H feedback reporting is queued in tasks/todo.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:16:57 +01:00