read.markets

Author	SHA1	Message	Date
Giorgio Gilestro	45fa31bb2b	ai: structured-output + reviewer agent for indicator summaries Replaces the regex-based clean_summary / looks_like_leakage pipeline that produced the 2026-05-29 valuation-read leak. Two layers of defence in depth: 1. JSON-mode generation. The per-group and aggregate summary system prompts now require the model to emit a single object {"read": "..."}; response_format={"type":"json_object"} is passed through to the provider so the API enforces well-formed JSON. Prose outside the field is physically impossible. The "read" field is the only schema slot, so the model has nowhere to spill scratchpad into the envelope. 2. Reviewer agent. services/output_review.review_read() makes a second small LLM call that judges whether the candidate "read" string is publishable. It catches the residual failure mode — scratchpad INSIDE the field ("Let's see…", multi-question parentheticals, meta-commentary) — and returns a JSON verdict {"clean": bool, "reason": str}. Any failure (provider error, parse error, missing field) returns clean=false (fail-safe). Cost ~$0.0001/check; latency ~1-2 s in the hourly job, no user-facing latency. The old regex scaffolding (_LEAK_PATTERNS, clean_summary, looks_like_leakage, _TRAILING_QUOTE) is deleted entirely. It produced false positives (chopped legitimate "The indicators are…" leaders) and false negatives (never matched the chain-of-thought patterns the model actually emits). The reviewer agent is strictly better on both. On reviewer/parse rejection: don't persist a new IndicatorSummary; the API's existing fallback to the previous good row continues to serve the panel. Failures are logged as ind_summary.json_invalid / ind_summary.reviewer_rejected so we can measure the rejection rate. Reviewer cost is added to the row's recorded cost_usd so the monthly budget cap covers the full pipeline. Adds tests/test_output_review.py: 11 cases covering _extract_read (JSON envelope handling — invalid JSON, missing field, wrong types, empty values) and review_read (clean / unclean verdicts plus three fail-safe paths for malformed reviewer responses). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 13:10:52 +02:00
Giorgio Gilestro	c5fb4525f3	jobs: per-row savepoint + aggregate logging in translation fan-out Previously translate_log_for_active_languages and translate_summary_for_active_languages added every successful translation to the session and called session.commit() once at the end. A single bad row (DB error, constraint violation, encoding mismatch) rolled back the whole batch — losing all the languages that had succeeded. Wrap each row in session.begin_nested() so a per-row failure only loses that one row. Track succeeded/failed counts and log them at the end — escalating to error if zero succeeded out of N attempted, so total failure surfaces in monitoring instead of just N warning lines.	2026-05-28 12:37:06 +02:00
Giorgio Gilestro	4adc8dfe82	openrouter: split into llm_prompts (prompt engineering) + transport openrouter.py was 790 lines mixing two orthogonal concerns: - Prompt engineering (build_system_prompt, build_summary_*, build_chat_*, build_daily_digest_*, etc.) — ~400 lines, changes weekly as PROMPT_VERSION bumps - LLM transport (call_llm, _provider_chain, _call_provider, retry + fallback machinery) — ~250 lines, rarely changes Extracted the prompt-engineering surface to app/services/llm_prompts.py. Transport stays in openrouter.py (consistent with the filename — the OpenRouter URL is the transport's anchor). All import sites (jobs, routers, services, tests) split their multi-import lines into two: prompt-things from llm_prompts, transport from openrouter. PROMPT_VERSION constant, _TONE_ALIASES, _resolve_tone, and SYSTEM_PROMPT moved with the prompt functions. No behaviour change — pure relocation. Function signatures, body, and naming all preserved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 21:27:23 +02:00
Giorgio Gilestro	a6d686324c	models: align translation column naming + add token counts Three recently-added tables (strategic_log_translations, indicator_summary_translations, csv_format_templates) drifted from the codebase's existing naming convention: - llm_model -> model - llm_cost_usd -> cost_usd - content_md -> content (on the two translation tables; csv_format doesn't have a content field) Also added prompt_tokens and completion_tokens to the three tables; they were silently dropped at write time despite LogResult exposing them. All writer call sites (ai_log_job, indicator_summary_job, llm_csv_parser) and reader call sites (api.py localized helpers) updated to match. Tests realigned. Migration 0025 uses batch_alter_table for SQLite compatibility. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 21:18:29 +02:00
Giorgio Gilestro	664757ea8a	i18n: localize indicator summaries (per-group + aggregate read)	2026-05-27 20:19:47 +02:00
Giorgio Gilestro	b47c45e218	backend: dedupe shared logic (indicator_summary_job, CHAT_REFERENCE_LINE, call_openrouter alias) - indicator_summary_job.py imported its own copies of _month_spend and _latest_quotes_by_group; _market_context.py already exposes these. Switched to the canonical imports. Also fixed _market_context's latest_quotes_by_group to actually filter null prices (it claimed to in its docstring but lacked the WHERE clause). - api.py duplicated REFERENCE_LINE as CHAT_REFERENCE_LINE — same string, two sources of truth. Now imports REFERENCE_LINE. - Chat endpoint used the deprecated `call_openrouter` alias and passed an explicit `model=` that bypassed the provider chain. Switched to `call_llm` with default model selection, then removed the alias. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 19:30:11 +02:00
Giorgio Gilestro	6e7f57c6b2	phase G: data minimisation + passwordless auth + DeepSeek-first LLM Server no longer holds portfolios. Holdings live in the browser (localStorage); the server publishes an anonymous ticker_universe and a gzipped /api/universe payload identical for every authenticated user, so access patterns can't betray which tickers a user holds. AI commentary is generated ephemerally from the browser-supplied pie and the cost ledger row records no positions. Migrations 0009-0011 added the universe table and dropped positions / portfolio_snapshots / portfolios. Authentication is now e-mail OTP only. Migration 0010 dropped password_hash and email_verified (every active session is by construction proof of email control). The /signup endpoint is gone; signup and login share a single email-entry page. Email rendering is HTML+plain-text multipart with a shared brand palette (app/branding.py) asserted in sync with the CSS by a drift-detection test. LLM provider defaults to DeepSeek-direct (cheaper, api.deepseek.com) with OpenRouter as automatic fallback if DeepSeek fails. ai_log_job and indicator_summary_job now iterate the two tones (NOVICE, INTERMEDIATE) per cycle so the dashboard's tone toggle is instant; PROMPT_VERSION bumped to 6 with an educational anti-TA / anti-gambling stance baked into _CORE. NOVICE mode renders a curated glossary inline (CBOE VIX, yield curve, HY OAS, etc.) with JS-positioned tooltips that survive viewport edges and sticky bars. Model name and tokens hidden from the user UI; still recorded in StrategicLog.model and AICall for admin. Layout adds a sticky top nav, a sticky bottom markets bar (one chip per exchange with status LED + headline index + 1d change), and Phase H feedback reporting is queued in tasks/todo.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:16:57 +01:00
Giorgio Gilestro	40cfb50e37	market-aware AI cadence + incremental log updates Two changes that together cut OpenRouter spend ~50% and give the daily log temporal awareness. 1. CadencePolicy (app/services/cadence.py): expensive AI jobs only fire hourly during the EU/US active window (Mon-Fri 07-21 UTC). Off-hours weekdays throttle to every 4h; weekends to every 12h. ai_log_job and indicator_summary_job both consult the policy before doing real work; market/news/portfolio ingest jobs stay hourly (cheap, no API cost). Skipped runs land in job_runs with status 'skipped' and the throttle reason in error. 2. Update mode for ai_log_job: when an earlier log exists for the current UTC day, it's passed to the model as 'Earlier log from today (generated HH:MM UTC)'. The system prompt grows an Update mode section instructing the model to revise — not restart — and anchor on what has CHANGED since the earlier draft. The TL;DR leads with intra-day change when meaningful, the watch list evolves rather than restarts. PROMPT_VERSION bumped to 5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 10:17:39 +01:00
Giorgio Gilestro	2f223b75a3	strip prompt-echo leakage in indicator summaries DeepSeek occasionally regurgitates the system prompt verbatim ("Constraints: ≤60 words...", "Example good: ..."). Three-pronged fix: 1. Removed the inline good/bad example blocks from the per-group and aggregate system prompts — DeepSeek was treating them as templates to copy. The hard constraints alone are clear enough. 2. Expanded the LEAK_PATTERNS list to catch the prompt-label echoes that still occasionally slip through ("Key observations:", "The indicators are:", "Must cite ...", "Should give ...", bare "Key:"). Cleanup now runs up to 6 passes for compound leakage. 3. Added looks_like_leakage() — if the cleaned output still contains tell-tale phrases ("≤60 words", "instructions:", etc.), the summary is skipped rather than persisted. Logs a 'leakage_detected' warning and an ai_calls row with status=leaked so we can see the failure rate over time. The previous good summary stays visible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 10:10:24 +01:00
Giorgio Gilestro	1edf9cad41	add Eurostat + UK ONS sources; valuation/bubble/economy/bonds groups; aggregate read; market-open header Three new data sources hooked into the existing SOURCES registry. All open APIs, no keys: - EUROSTAT: prefix EUROSTAT:dataset?dim=val&... — current EU bond yields (Bund/OAT/BTP/EZ) and Eurozone economic indicators that FRED's OECD-mirror series stopped updating in 2022-2023. - ONS: prefix ONS:topic/cdid/dataset — current UK CPI, unemployment, GDP, industrial production. Replaces the 5+ month-stale FRED LRHUTTTTGBM156S mirror. New indicator groups in default.toml feed the strategic/fundamental lens we converged on: valuation (CAPE/Buffett anchors), bubble_watch (SKEW/VVIX/RSP vs SPY/HYG vs TLT/IPO/crypto), economy (multi-region, ALL current-or-stale-flagged), bonds (UK/EU/US/JPN sovereign yields). Indicator panel now opens with an AI "read" interpretation per group (generated hourly at :07 UTC alongside an aggregate cross-group read shown in the dashboard header). The aggregate is grounded by a markets strip — NYSE/LSE/Frankfurt/Tokyo/HK/Shanghai with open/closed LEDs and next-open countdown, computed locally from each exchange's tz. Other UX bits: indicator-row tooltips populated from TOML notes; rows whose last observation is >90 days old get a 'stale' chip; ghost symbols (in DB but no longer in TOML) filtered out of the panel; Eurostat/ONS symbols display as short codes rather than the full API path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 23:07:42 +01:00
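
Commit 45fa31bb2b describes a JSON envelope {"read": "..."} and a reviewer verdict {"clean": bool, "reason": str} in which any malformed reply counts as not clean. A minimal sketch of that parsing logic follows. It is not the repository's code: the function names extract_read and parse_verdict are hypothetical, the reason strings are invented for illustration, and no LLM call is made.

```python
import json
from typing import Optional, Tuple


def extract_read(raw: str) -> Optional[str]:
    """Return the stripped "read" string from a JSON envelope, or None if unusable."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return None  # invalid JSON (JSONDecodeError is a ValueError)
    if not isinstance(payload, dict):
        return None  # valid JSON, but not an object
    read = payload.get("read")
    if not isinstance(read, str):
        return None  # missing field or wrong type
    read = read.strip()
    return read or None  # empty / whitespace-only values are rejected


def parse_verdict(raw: str) -> Tuple[bool, str]:
    """Parse a reviewer reply; anything malformed is treated as not clean."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return False, "reviewer reply was not valid JSON"
    if not isinstance(payload, dict):
        return False, "reviewer reply was not a JSON object"
    clean = payload.get("clean")
    if not isinstance(clean, bool):
        return False, "reviewer reply lacked a boolean 'clean' field"
    reason = payload.get("reason")
    return clean, (reason if isinstance(reason, str) else "")
```

Under this sketch a caller would persist a new summary only when extract_read returns a string and parse_verdict returns True as its first element. Every other path leaves the previous good row in place, which matches the fallback the commit message describes.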

10 commits
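
Commit 40cfb50e37 states the cadence rule in words: hourly during the Mon-Fri 07-21 UTC window, every 4h on off-hours weekdays, every 12h on weekends. The sketch below is one way such a rule could be expressed. It is not the contents of app/services/cadence.py: the function names are hypothetical, and two details are assumptions, namely that the active window excludes 21:00 itself and that throttling is measured as time elapsed since the last run.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def min_interval(now: datetime) -> timedelta:
    """Minimum gap between expensive AI runs at the given timezone-aware moment."""
    now = now.astimezone(timezone.utc)
    if now.weekday() >= 5:  # Saturday=5, Sunday=6
        return timedelta(hours=12)
    if 7 <= now.hour < 21:  # assumed: 07:00 inclusive, 21:00 exclusive
        return timedelta(hours=1)
    return timedelta(hours=4)  # weekday, outside the active window


def should_run(now: datetime, last_run: Optional[datetime]) -> bool:
    """True if enough time has passed since last_run (or there was no prior run)."""
    if last_run is None:
        return True
    return now - last_run >= min_interval(now)
```

A job following this pattern would call should_run at the top of each hourly tick and record a skipped run when it returns False, which is consistent with the 'skipped' status in job_runs that the commit mentions.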