Server no longer holds portfolios. Holdings live in the browser (localStorage); the server publishes an anonymous ticker_universe and a gzipped /api/universe payload identical for every authenticated user, so access patterns can't betray which tickers a user holds. AI commentary is generated ephemerally from the browser-supplied pie and the cost ledger row records no positions. Migrations 0009-0011 added the universe table and dropped positions / portfolio_snapshots / portfolios. Authentication is now e-mail OTP only. Migration 0010 dropped password_hash and email_verified (every active session is by construction proof of email control). The /signup endpoint is gone; signup and login share a single email-entry page. Email rendering is HTML+plain-text multipart with a shared brand palette (app/branding.py) asserted in sync with the CSS by a drift-detection test. LLM provider defaults to DeepSeek-direct (cheaper, api.deepseek.com) with OpenRouter as automatic fallback if DeepSeek fails. ai_log_job and indicator_summary_job now iterate the two tones (NOVICE, INTERMEDIATE) per cycle so the dashboard's tone toggle is instant; PROMPT_VERSION bumped to 6 with an educational anti-TA / anti-gambling stance baked into _CORE. NOVICE mode renders a curated glossary inline (CBOE VIX, yield curve, HY OAS, etc.) with JS-positioned tooltips that survive viewport edges and sticky bars. Model name and tokens hidden from the user UI; still recorded in StrategicLog.model and AICall for admin. Layout adds a sticky top nav, a sticky bottom markets bar (one chip per exchange with status LED + headline index + 1d change), and Phase H feedback reporting is queued in tasks/todo.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 KiB
Phase G — Data-minimisation refactor
Date opened: 2026-05-16 Status: Planning. No code yet — awaiting sign-off on this doc.
Goal
Drop "server holds your portfolio" from the threat model. After this phase, Cassandra at rest knows: email, password hash, billing state, AI cost ledger, a non-attributed set of tickers, and current market prices for those tickers. It does not know which user holds what, at what cost, at what quantity.
Holdings live in the browser (localStorage). The server acts as a price proxy that returns the entire ticker universe to every authenticated client, so the request itself can't betray the user's pie. AI commentary is the only path where holdings transit the server, and it does so in-memory for the duration of one LLM call, never persisted.
The shape
┌──────────────────────────────────────────────────────────┐
│ Browser (localStorage) │
│ • parsed pie: positions, qty, avg_cost │
│ • derived: P/L, sector tilt, sparkline cache │
└──────────────────────────────────────────────────────────┘
│ GET /api/universe (full payload, gzipped)
│ POST /api/portfolio/parse (CSV → parsed pie)
│ POST /api/analyze (pie + prices → AI text)
▼
┌──────────────────────────────────────────────────────────┐
│ Server │
│ • users(email, hash, tier) │
│ • ticker_universe(ticker, currency, last_referenced_at) │
│ • quotes (already exists — keyed by ticker) │
│ • strategic_logs / indicator_summaries (shared, macro) │
│ • ai_calls (cost ledger, no holdings) │
│ ✗ NO positions table │
│ ✗ NO portfolio_snapshots table │
│ ✗ NO per-user holdings, ever │
└──────────────────────────────────────────────────────────┘
Privacy properties this buys
- Holdings are not at rest. Server never writes a row that says "user X holds ticker Y". A full DB dump reveals only the union of all users' tickers, with no attribution.
- Price-refresh requests are unlinkable. Every authenticated user gets the same payload (entire universe), so access logs / breach evidence can't tell holdings from request bodies.
- AI analysis is ephemeral. Holdings transit memory only during one LLM call (~5-30s). No DB persistence, no logs of pie content.
Privacy properties this does NOT buy
- Server briefly sees the pie during
/api/portfolio/parse(CSV upload) and/api/analyze. This is "minutes-of-retention, in-memory" not "zero-knowledge". GDPR-honest framing: "shortest possible processing window, no retention." - Universe-add timing leak. If only one user is active when a new ticker enters the universe, that ticker is linkable to that user via timestamps. Mitigation in plan below.
- Email is still PII. Paddle billing requires it; nothing to do about that. Document clearly in privacy policy.
Data model changes
New tables
class TickerUniverse(Base):
"""The set of public tickers Cassandra tracks. Populated as the union
of all user holdings, *without user attribution*."""
__tablename__ = "ticker_universe"
yahoo_ticker: Mapped[str] = mapped_column(String(32), primary_key=True)
currency: Mapped[str | None] = mapped_column(String(8))
first_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
# Refreshed by any user heartbeat that contains this ticker.
# When utcnow() - last_referenced_at > UNIVERSE_EVICTION_TTL, prune.
last_referenced_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
Removed tables (migration 0009)
positionsportfolio_snapshotsportfolios
(The Portfolio model concept goes away. A user "having a portfolio" is now
purely a browser-localStorage concept.)
Kept as-is
users,email_otps— authquotes,quotes_daily— price dataheadlines,feeds— newsstrategic_logs,indicator_summaries,ai_calls— macro AI (shared)instrument_map— T212 ↔ Yahoo resolution (admin-managed, read-only to user paths)
New API surface
GET /api/universe
Auth: session/bearer required.
Returns the full universe with current prices, gzipped JSON:
{
"as_of": "2026-05-16T14:00:00Z",
"tickers": {
"AAPL": {"p": 234.56, "c": "USD", "d": {"1d": 0.5, "1m": 3.2, "1y": 18.4}},
"VWRL.L": {...},
...
}
}
Cache-Control: max-age=60. Browser refreshes once a minute.
GET /api/universe/sparkline/{ticker}
Auth required. Lazy-loaded on hover. Same shape as today.
POST /api/portfolio/parse
Auth required. multipart/form-data: file=<csv>.
Server: parses, resolves T212→Yahoo via instrument_map, adds resolved
tickers to ticker_universe (no user FK), returns parsed pie to browser.
Discards parsed pie before responding.
Response:
{
"positions": [
{"yahoo_ticker": "AAPL", "name": "Apple Inc",
"qty": 5, "avg_cost_gbp": 178.40, "currency": "USD"},
...
],
"base_currency": "GBP",
"warnings": ["3 unmapped tickers: ..."]
}
POST /api/analyze
Auth required. Body: {"positions": [...], "prices": {...}, "anchor": "..."}.
Server constructs prompt, calls LLM, returns commentary text.
No DB writes mentioning positions. ai_calls row written (no pie content).
Optional: cache commentary text keyed by sha256(positions canonical JSON)
so re-clicking is free. The hash is not reversible to holdings.
Response: {"content": "...", "model": "...", "generated_at": "..."}
POST /api/universe/heartbeat (optional, see "Open questions" below)
Browser periodically POSTs its localStorage ticker set so the server
can refresh last_referenced_at for those tickers. The "active client
bumps timestamps" pattern keeps the universe trimmed to actually-held
tickers.
Endpoints removed
POST /api/portfolios/upload(Phase B) — replaced by/api/portfolio/parseGET /api/portfolio/{name}/summary— gone; browser computes from localStorage + universe prices
Mitigation: universe-add timing leak
The naive "INSERT IGNORE on CSV parse" lets a passive observer link a
universe-row's first_seen_at to a specific user's upload time. Two
mitigations, layered:
- Batch additions. New tickers don't enter
ticker_universedirectly from the request handler. They're queued (in Redis or in an in-process buffer) and flushed at fixed 5-minute boundaries. Multiple users' uploads batch together; ordering within a flush is randomised. - Padding. On every flush, also re-touch
last_referenced_aton N random existing universe rows. This makes "row updated at flush time T" not specifically informative about new tickers.
At low user counts (alpha), the leak is mathematically unavoidable; document this in the alpha tester agreement and skip both mitigations until we have ≥10 concurrent users.
Migration sequence
- 0009_drop_portfolio_tables.py — drop
positions,portfolio_snapshots,portfolios. Upgrade extracts distinct tickers frompositionsfirst to seedticker_universe. Downgrade is one-way (irreversible drop) — document this. - 0010_ticker_universe.py — create
ticker_universetable. Could be merged into 0009; keep separate for clarity.
Implementation order
Strategy: build the new path alongside the existing one. The destructive
DROP TABLE step lands LAST, after end-to-end verification of the new
architecture. Old endpoints are removed only after the browser is updated.
Additive (non-destructive):
- 1. Add
redis:7-alpineservice to docker-compose.yml. New env varREDIS_URLin Settings. Smoke-test connectivity fromapp. - 2. Migration
0009_ticker_universe.py— creates the new table only, leaves existing portfolio tables untouched. - 3.
app/services/ticker_universe.py— add/refresh/evict logic. Batch-flush via Redis with a 5-min boundary; padding-on-flush at first stays off (toggle for when we reach ≥10 users). - 3a. Auth flip: passwordless. Drop password_hash + email_verified (migration 0010). Collapse signup into login. Every auth is OTP. Threat model after Phase G makes passwords pure liability — see memory:cassandra_data_minimisation.
- 4.
app/services/portfolio_analysis.py— ephemeral LLM prompt + call. Pie passed in via request body, held in a function-local variable, never written to DB or logs. Includes input sanitisation (prompt-injection defence, NaN/inf rejection, 200-position cap). - 5. New router
app/routers/universe.pywith: -GET /api/universe-GET /api/universe/sparkline/{ticker}-POST /api/portfolio/parse-POST /api/analyzeAddedGZipMiddleware(≥500-byte threshold). Confirmed 70% compression on a 30-ticker universe payload. Old endpoints inapp/routers/api.pystay live for now. - 6.
app/templates/partials/portfolio.html(panel shell) +static/js/portfolio.js(localStorage pie + universe fetch + P/L compute + analyze button).upload.htmlrewired to new/api/portfolio/parseendpoint. CSS additions: pf-pill, pf-actions, pf-analysis, pf-warn. - 6a. Scheduler additions for Phase G:
-
universe_flush_jobevery 5 min (flushes Redis buffer → DB) -universe_evict_jobdaily at 00:15 UTC (60-day TTL prune) -market_jobextended to fetchconfig TOML ∪ ticker_universe - 7. Tests: universe add/evict (in service), parse-shape sanitisation (21 tests), unlinkability contract (structural assertion that the universe handler signature can't take a user-identifying parameter without failing CI).
- 8. End-to-end check (USER): re-upload existing T212 CSV via
new path, confirm pie renders correctly from localStorage with
live prices, AI commentary works, no rows land in
positions/portfolio_snapshots.
Destructive (only after step 8 passes):
- 9. Migration
0011_drop_portfolio_tables.py— droppedpositions(299 rows),portfolio_snapshots(23 rows),portfolios(2 rows). Downgrade is one-way (structural only). - 10. Removed old endpoints
POST /api/portfolios/upload,GET /api/portfolios. Removedportfolio_job.pyfrom scheduler.market_jobalready fetches "config TOML ∪ ticker_universe" (step 6a).news_jobrewired to useticker_universe ∪ instrument_mapfor per-ticker news. - 11. Deleted
Portfolio/PortfolioSnapshot/Positionmodels fromapp/models.py. RemovedPortfolioSummary/PositionOutfromapp/schemas.py. Removedpersist_pie+PersistResultfromcsv_import.py(parser remains).
Polish:
- 12.
/privacypage stating exactly what's held server-side and TTLs. - 13. Update README + plan file's review section.
Out of scope (deferred)
- E2E encrypted sync of localStorage across devices. Real demand from paying users would justify this. Mechanism: user-derived key from password (PBKDF2/Argon2 → KEK), encrypted pie blob stored on server, server can't decrypt. Phase H-ish.
- True PIR for prices. Cryptographic overkill for retail SaaS.
- Anonymous billing. Paddle requires an email. Accepted.
Locked decisions (2026-05-16)
- Redis: new compose service. Stores (a) the ephemeral pie during
/api/analyzewith a 5-min TTL, (b) the batch-buffer of new tickers awaiting universe flush. Slots in later for rate limits and Paddle webhook idempotency (Phase D). - Sparklines lazy — never bundled in
/api/universe. Browser fetches/api/universe/sparkline/{ticker}on hover. - Passive aging — no heartbeat endpoint.
last_referenced_atis bumped whenever a ticker appears in/api/portfolio/parseor/api/analyze. Eviction cron prunes rows withlast_referenced_at < now - 60 days. Effect: a user who re-uploads their CSV monthly keeps their tickers alive in the universe; long-departed users' tickers age out naturally. - No data migration of existing pies —
positionsrows are dropped without backfillingticker_universe. Users re-upload their CSV once after deploy; it lands in browser localStorage.
Review section (to be filled after implementation)
TBD after sign-off + implementation.