read.markets/app/jobs/news_job.py
Giorgio Gilestro 6e7f57c6b2 phase G: data minimisation + passwordless auth + DeepSeek-first LLM
Server no longer holds portfolios. Holdings live in the browser
(localStorage); the server publishes an anonymous ticker_universe and a
gzipped /api/universe payload identical for every authenticated user, so
access patterns can't betray which tickers a user holds. AI commentary
is generated ephemerally from the browser-supplied pie and the cost
ledger row records no positions. Migrations 0009-0011 added the
universe table and dropped positions / portfolio_snapshots /
portfolios.

Authentication is now e-mail OTP only. Migration 0010 dropped
password_hash and email_verified (every active session is by
construction proof of email control). The /signup endpoint is gone;
signup and login share a single email-entry page. Email rendering is
HTML+plain-text multipart with a shared brand palette (app/branding.py)
asserted in sync with the CSS by a drift-detection test.

LLM provider defaults to DeepSeek-direct (cheaper, api.deepseek.com)
with OpenRouter as automatic fallback if DeepSeek fails. ai_log_job and
indicator_summary_job now iterate the two tones (NOVICE, INTERMEDIATE)
per cycle so the dashboard's tone toggle is instant; PROMPT_VERSION
bumped to 6 with an educational anti-TA / anti-gambling stance baked
into _CORE. NOVICE mode renders a curated glossary inline (CBOE VIX,
yield curve, HY OAS, etc.) with JS-positioned tooltips that survive
viewport edges and sticky bars. Model name and tokens hidden from the
user UI; still recorded in StrategicLog.model and AICall for admin.

Layout adds a sticky top nav, a sticky bottom markets bar (one chip per
exchange with status LED + headline index + 1d change), and
Phase H feedback reporting is queued in tasks/todo.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:16:57 +01:00

99 lines
3.5 KiB
Python

"""Hourly news ingestion. Reads enabled feeds from the DB (not TOML — DB has
the authoritative enabled/failure state). Per-ticker Yahoo news is pulled for
each symbol in the anonymous ticker universe (Phase G)."""
from __future__ import annotations

import asyncio

import httpx
from sqlalchemy import select
from sqlalchemy.dialects.mysql import insert as mysql_insert

from app.db import utcnow
from app.jobs._helpers import job_lifecycle, log
from app.models import Feed, Headline, InstrumentMap, TickerUniverse
from app.services.news import dedupe, fetch_feed, fetch_yahoo_news

# Consecutive fetch failures after which a feed is disabled automatically.
AUTO_DISABLE_AT = 5


async def _process_feed(client: httpx.AsyncClient, feed: Feed) -> tuple[Feed, list]:
    """Fetch one feed and update its failure bookkeeping; never raises."""
    try:
        items = await fetch_feed(client, feed.name, feed.category, feed.url)
        feed.consecutive_failures = 0
        feed.last_success_at = utcnow()
        return feed, items
    except Exception as e:
        feed.consecutive_failures += 1
        # Disable the feed once it has failed AUTO_DISABLE_AT times in a row.
        if feed.consecutive_failures >= AUTO_DISABLE_AT:
            feed.enabled = False
        log.warning("feed.fetch_failed", name=feed.name,
                    fails=feed.consecutive_failures, error=str(e))
        return feed, []


async def run() -> None:
    async with job_lifecycle("news_job") as (session, run):
        if run.status == "skipped":
            return

        feeds = (
            await session.execute(select(Feed).where(Feed.enabled == True))
        ).scalars().all()

        # Per-ticker news: pull every Yahoo ticker in the anonymous
        # universe (Phase G), pair each with its display name from
        # instrument_map when available. No per-user attribution.
        uni_tickers = (await session.execute(
            select(TickerUniverse.yahoo_ticker)
        )).scalars().all()
        ticker_pairs: list[tuple[str, str]] = []
        if uni_tickers:
            name_rows = (await session.execute(
                select(InstrumentMap.yahoo_ticker, InstrumentMap.name)
                .where(InstrumentMap.yahoo_ticker.in_(uni_tickers))
            )).all()
            names = {y: n for y, n in name_rows if y}
            ticker_pairs = [(t, names.get(t) or t) for t in uni_tickers]

        async with httpx.AsyncClient(follow_redirects=True) as client:
            feed_results = await asyncio.gather(
                *(_process_feed(client, f) for f in feeds)
            )
            ticker_results = await asyncio.gather(
                *(fetch_yahoo_news(client, t, query_override=n)
                  for t, n in ticker_pairs)
            )

        all_headlines = []
        for _feed, items in feed_results:
            all_headlines.extend(items)
        for items in ticker_results:
            all_headlines.extend(items)
        headlines = dedupe(all_headlines)

        # Bulk INSERT IGNORE (fingerprint UNIQUE de-dupes across runs).
        if headlines:
            stmt = mysql_insert(Headline).values([
                dict(
                    source=h.source,
                    category=h.category,
                    title=h.title[:512],
                    url=h.url[:1024],
                    published_at=h.when,
                    fetched_at=utcnow(),
                    fingerprint=h.fingerprint,
                )
                for h in headlines
            ]).prefix_with("IGNORE")
            await session.execute(stmt)
        await session.commit()
        run.items_written = len(headlines)
        log.info("news_job.done", fetched=len(all_headlines), kept=len(headlines))


if __name__ == "__main__":
    asyncio.run(run())
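
The failure bookkeeping in `_process_feed` can be tried out without a database or network. The following is a minimal sketch, not part of the module: `FakeFeed`, `fake_fetch`, `process` and `simulate` are hypothetical stand-ins invented for illustration, and only the `AUTO_DISABLE_AT` threshold and the reset-on-success / count-on-failure logic are taken from the job above.

```python
import asyncio
from dataclasses import dataclass

AUTO_DISABLE_AT = 5  # same threshold as the job above


@dataclass
class FakeFeed:
    """Hypothetical in-memory stand-in for the Feed ORM row."""
    name: str
    enabled: bool = True
    consecutive_failures: int = 0


async def fake_fetch(feed: FakeFeed) -> list[str]:
    """Hypothetical fetcher: any feed whose name starts with 'bad' fails."""
    if feed.name.startswith("bad"):
        raise RuntimeError("simulated network error")
    return [f"{feed.name}: headline"]


async def process(feed: FakeFeed) -> list[str]:
    """Mirror _process_feed: reset the counter on success, count on failure."""
    try:
        items = await fake_fetch(feed)
        feed.consecutive_failures = 0
        return items
    except Exception:
        feed.consecutive_failures += 1
        if feed.consecutive_failures >= AUTO_DISABLE_AT:
            feed.enabled = False
        return []


async def simulate(cycles: int) -> tuple[FakeFeed, FakeFeed, int]:
    """Run several job cycles and return both feeds plus the headline count."""
    good, bad = FakeFeed("good"), FakeFeed("bad")
    total = 0
    for _ in range(cycles):
        # Like the job's query, only enabled feeds are attempted each cycle.
        active = [f for f in (good, bad) if f.enabled]
        results = await asyncio.gather(*(process(f) for f in active))
        total += sum(len(r) for r in results)
    return good, bad, total


if __name__ == "__main__":
    good, bad, total = asyncio.run(simulate(7))
    print(good, bad, total)
```

Because `process` swallows the exception and returns an empty list, one failing feed does not cancel the other tasks in `asyncio.gather`. Once the failing feed reaches the threshold it is skipped on later cycles, so its counter stops at `AUTO_DISABLE_AT`.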