Server no longer holds portfolios. Holdings live in the browser (localStorage); the server publishes an anonymous ticker_universe and a gzipped /api/universe payload identical for every authenticated user, so access patterns can't betray which tickers a user holds. AI commentary is generated ephemerally from the browser-supplied pie and the cost ledger row records no positions. Migrations 0009-0011 added the universe table and dropped positions / portfolio_snapshots / portfolios. Authentication is now e-mail OTP only. Migration 0010 dropped password_hash and email_verified (every active session is by construction proof of email control). The /signup endpoint is gone; signup and login share a single email-entry page. Email rendering is HTML+plain-text multipart with a shared brand palette (app/branding.py) asserted in sync with the CSS by a drift-detection test. LLM provider defaults to DeepSeek-direct (cheaper, api.deepseek.com) with OpenRouter as automatic fallback if DeepSeek fails. ai_log_job and indicator_summary_job now iterate the two tones (NOVICE, INTERMEDIATE) per cycle so the dashboard's tone toggle is instant; PROMPT_VERSION bumped to 6 with an educational anti-TA / anti-gambling stance baked into _CORE. NOVICE mode renders a curated glossary inline (CBOE VIX, yield curve, HY OAS, etc.) with JS-positioned tooltips that survive viewport edges and sticky bars. Model name and tokens hidden from the user UI; still recorded in StrategicLog.model and AICall for admin. Layout adds a sticky top nav, a sticky bottom markets bar (one chip per exchange with status LED + headline index + 1d change), and Phase H feedback reporting is queued in tasks/todo.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
281 lines
13 KiB
Markdown
281 lines
13 KiB
Markdown
# Phase G — Data-minimisation refactor
|
||
|
||
**Date opened:** 2026-05-16
|
||
**Status:** Planning. No code yet — awaiting sign-off on this doc.
|
||
|
||
## Goal
|
||
|
||
Drop "server holds your portfolio" from the threat model. After this phase,
|
||
Cassandra at rest knows: email, password hash, billing state, AI cost ledger,
|
||
a non-attributed set of tickers, and current market prices for those tickers.
|
||
It does **not** know which user holds what, at what cost, at what quantity.
|
||
|
||
Holdings live in the browser (localStorage). The server acts as a price proxy
|
||
that returns the **entire ticker universe** to every authenticated client, so
|
||
the request itself can't betray the user's pie. AI commentary is the only path
|
||
where holdings transit the server, and it does so **in-memory for the
|
||
duration of one LLM call**, never persisted.
|
||
|
||
## The shape
|
||
|
||
```
|
||
┌──────────────────────────────────────────────────────────┐
|
||
│ Browser (localStorage) │
|
||
│ • parsed pie: positions, qty, avg_cost │
|
||
│ • derived: P/L, sector tilt, sparkline cache │
|
||
└──────────────────────────────────────────────────────────┘
|
||
│ GET /api/universe (full payload, gzipped)
|
||
│ POST /api/portfolio/parse (CSV → parsed pie)
|
||
│ POST /api/analyze (pie + prices → AI text)
|
||
▼
|
||
┌──────────────────────────────────────────────────────────┐
|
||
│ Server │
|
||
│ • users(email, hash, tier) │
|
||
│ • ticker_universe(ticker, currency, last_referenced_at) │
|
||
│ • quotes (already exists — keyed by ticker) │
|
||
│ • strategic_logs / indicator_summaries (shared, macro) │
|
||
│ • ai_calls (cost ledger, no holdings) │
|
||
│ ✗ NO positions table │
|
||
│ ✗ NO portfolio_snapshots table │
|
||
│ ✗ NO per-user holdings, ever │
|
||
└──────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
## Privacy properties this buys
|
||
|
||
1. **Holdings are not at rest**. Server never writes a row that says "user X
|
||
holds ticker Y". A full DB dump reveals only the *union* of all users'
|
||
tickers, with no attribution.
|
||
2. **Price-refresh requests are unlinkable**. Every authenticated user gets
|
||
the same payload (entire universe), so access logs / breach evidence can't
|
||
tell holdings from request bodies.
|
||
3. **AI analysis is ephemeral**. Holdings transit memory only during one LLM
|
||
call (~5-30s). No DB persistence, no logs of pie content.
|
||
|
||
## Privacy properties this does NOT buy
|
||
|
||
1. **Server briefly sees the pie** during `/api/portfolio/parse` (CSV upload)
|
||
and `/api/analyze`. This is "minutes-of-retention, in-memory" not
|
||
"zero-knowledge". GDPR-honest framing: *"shortest possible processing
|
||
window, no retention."*
|
||
2. **Universe-add timing leak**. If only one user is active when a new
|
||
ticker enters the universe, that ticker is linkable to that user via
|
||
timestamps. Mitigation in plan below.
|
||
3. **Email is still PII**. Paddle billing requires it; nothing to do about
|
||
that. Document clearly in privacy policy.
|
||
|
||
## Data model changes
|
||
|
||
### New tables
|
||
|
||
```python
|
||
class TickerUniverse(Base):
|
||
"""The set of public tickers Cassandra tracks. Populated as the union
|
||
of all user holdings, *without user attribution*."""
|
||
__tablename__ = "ticker_universe"
|
||
yahoo_ticker: Mapped[str] = mapped_column(String(32), primary_key=True)
|
||
currency: Mapped[str | None] = mapped_column(String(8))
|
||
first_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
|
||
# Refreshed by any user heartbeat that contains this ticker.
|
||
# When utcnow() - last_referenced_at > UNIVERSE_EVICTION_TTL, prune.
|
||
last_referenced_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
|
||
```
|
||
|
||
### Removed tables (migration 0009)
|
||
|
||
- `positions`
|
||
- `portfolio_snapshots`
|
||
- `portfolios`
|
||
|
||
(The `Portfolio` model concept goes away. A user "having a portfolio" is now
|
||
purely a browser-localStorage concept.)
|
||
|
||
### Kept as-is
|
||
|
||
- `users`, `email_otps` — auth
|
||
- `quotes`, `quotes_daily` — price data
|
||
- `headlines`, `feeds` — news
|
||
- `strategic_logs`, `indicator_summaries`, `ai_calls` — macro AI (shared)
|
||
- `instrument_map` — T212 ↔ Yahoo resolution (admin-managed, read-only to user paths)
|
||
|
||
## New API surface
|
||
|
||
```
|
||
GET /api/universe
|
||
Auth: session/bearer required.
|
||
Returns the full universe with current prices, gzipped JSON:
|
||
{
|
||
"as_of": "2026-05-16T14:00:00Z",
|
||
"tickers": {
|
||
"AAPL": {"p": 234.56, "c": "USD", "d": {"1d": 0.5, "1m": 3.2, "1y": 18.4}},
|
||
"VWRL.L": {...},
|
||
...
|
||
}
|
||
}
|
||
Cache-Control: max-age=60. Browser refreshes once a minute.
|
||
|
||
GET /api/universe/sparkline/{ticker}
|
||
Auth required. Lazy-loaded on hover. Same shape as today.
|
||
|
||
POST /api/portfolio/parse
|
||
Auth required. multipart/form-data: file=<csv>.
|
||
Server: parses, resolves T212→Yahoo via instrument_map, adds resolved
|
||
tickers to ticker_universe (no user FK), returns parsed pie to browser.
|
||
Discards parsed pie before responding.
|
||
Response:
|
||
{
|
||
"positions": [
|
||
{"yahoo_ticker": "AAPL", "name": "Apple Inc",
|
||
"qty": 5, "avg_cost_gbp": 178.40, "currency": "USD"},
|
||
...
|
||
],
|
||
"base_currency": "GBP",
|
||
"warnings": ["3 unmapped tickers: ..."]
|
||
}
|
||
|
||
POST /api/analyze
|
||
Auth required. Body: {"positions": [...], "prices": {...}, "anchor": "..."}.
|
||
Server constructs prompt, calls LLM, returns commentary text.
|
||
No DB writes mentioning positions. ai_calls row written (no pie content).
|
||
Optional: cache commentary text keyed by sha256(positions canonical JSON)
|
||
so re-clicking is free. The hash is not reversible to holdings.
|
||
Response: {"content": "...", "model": "...", "generated_at": "..."}
|
||
|
||
POST /api/universe/heartbeat (optional, see "Open questions" below)
|
||
Browser periodically POSTs its localStorage ticker set so the server
|
||
can refresh last_referenced_at for those tickers. The "active client
|
||
bumps timestamps" pattern keeps the universe trimmed to actually-held
|
||
tickers.
|
||
```
|
||
|
||
### Endpoints removed
|
||
|
||
- `POST /api/portfolios/upload` (Phase B) — replaced by `/api/portfolio/parse`
|
||
- `GET /api/portfolio/{name}/summary` — gone; browser computes from
|
||
localStorage + universe prices
|
||
|
||
## Mitigation: universe-add timing leak
|
||
|
||
The naive "INSERT IGNORE on CSV parse" lets a passive observer link a
|
||
universe-row's `first_seen_at` to a specific user's upload time. Two
|
||
mitigations, layered:
|
||
|
||
1. **Batch additions.** New tickers don't enter `ticker_universe` directly
|
||
from the request handler. They're queued (in Redis or in an in-process
|
||
buffer) and flushed at fixed 5-minute boundaries. Multiple users' uploads
|
||
batch together; ordering within a flush is randomised.
|
||
2. **Padding.** On every flush, also re-touch `last_referenced_at` on N
|
||
random existing universe rows. This makes "row updated at flush time T"
|
||
not specifically informative about new tickers.
|
||
|
||
At low user counts (alpha), the leak is mathematically unavoidable; document
|
||
this in the alpha tester agreement and skip both mitigations until we have
|
||
≥10 concurrent users.
|
||
|
||
## Migration sequence
|
||
|
||
- [ ] **0009_drop_portfolio_tables.py** — drop `positions`,
|
||
`portfolio_snapshots`, `portfolios`. Upgrade extracts distinct tickers
|
||
from `positions` first to seed `ticker_universe`. Downgrade is
|
||
one-way (irreversible drop) — document this.
|
||
- [ ] **0010_ticker_universe.py** — create `ticker_universe` table.
|
||
Could be merged into 0009; keep separate for clarity.
|
||
|
||
## Implementation order
|
||
|
||
Strategy: build the new path alongside the existing one. The destructive
|
||
`DROP TABLE` step lands LAST, after end-to-end verification of the new
|
||
architecture. Old endpoints are removed only after the browser is updated.
|
||
|
||
**Additive (non-destructive):**
|
||
|
||
- [x] 1. Add `redis:7-alpine` service to docker-compose.yml. New env var
|
||
`REDIS_URL` in Settings. Smoke-test connectivity from `app`.
|
||
- [x] 2. Migration `0009_ticker_universe.py` — creates the new table only,
|
||
leaves existing portfolio tables untouched.
|
||
- [x] 3. `app/services/ticker_universe.py` — add/refresh/evict logic.
|
||
Batch-flush via Redis with a 5-min boundary; padding-on-flush at
|
||
first stays off (toggle for when we reach ≥10 users).
|
||
- [x] 3a. **Auth flip: passwordless.** Drop password_hash + email_verified
|
||
(migration 0010). Collapse signup into login. Every auth is OTP.
|
||
Threat model after Phase G makes passwords pure liability — see
|
||
memory:cassandra_data_minimisation.
|
||
- [x] 4. `app/services/portfolio_analysis.py` — ephemeral LLM prompt +
|
||
call. Pie passed in via request body, held in a function-local
|
||
variable, never written to DB or logs. Includes input sanitisation
|
||
(prompt-injection defence, NaN/inf rejection, 200-position cap).
|
||
- [x] 5. New router `app/routers/universe.py` with:
|
||
- `GET /api/universe`
|
||
- `GET /api/universe/sparkline/{ticker}`
|
||
- `POST /api/portfolio/parse`
|
||
- `POST /api/analyze`
|
||
Added `GZipMiddleware` (≥500-byte threshold). Confirmed 70%
|
||
compression on a 30-ticker universe payload. Old endpoints in
|
||
`app/routers/api.py` stay live for now.
|
||
- [x] 6. `app/templates/partials/portfolio.html` (panel shell) +
|
||
`static/js/portfolio.js` (localStorage pie + universe fetch +
|
||
P/L compute + analyze button). `upload.html` rewired to new
|
||
`/api/portfolio/parse` endpoint. CSS additions: pf-pill,
|
||
pf-actions, pf-analysis, pf-warn.
|
||
- [x] 6a. Scheduler additions for Phase G:
|
||
- `universe_flush_job` every 5 min (flushes Redis buffer → DB)
|
||
- `universe_evict_job` daily at 00:15 UTC (60-day TTL prune)
|
||
- `market_job` extended to fetch `config TOML ∪ ticker_universe`
|
||
- [x] 7. Tests: universe add/evict (in service), parse-shape sanitisation
|
||
(21 tests), unlinkability contract (structural assertion that
|
||
the universe handler signature can't take a user-identifying
|
||
parameter without failing CI).
|
||
- [ ] 8. **End-to-end check (USER):** re-upload existing T212 CSV via
|
||
new path, confirm pie renders correctly from localStorage with
|
||
live prices, AI commentary works, no rows land in `positions` /
|
||
`portfolio_snapshots`.
|
||
|
||
**Destructive (only after step 8 passes):**
|
||
|
||
- [x] 9. Migration `0011_drop_portfolio_tables.py` — dropped
|
||
`positions` (299 rows), `portfolio_snapshots` (23 rows),
|
||
`portfolios` (2 rows). Downgrade is one-way (structural only).
|
||
- [x] 10. Removed old endpoints `POST /api/portfolios/upload`,
|
||
`GET /api/portfolios`. Removed `portfolio_job.py` from
|
||
scheduler. `market_job` already fetches "config TOML ∪
|
||
ticker_universe" (step 6a). `news_job` rewired to use
|
||
`ticker_universe ∪ instrument_map` for per-ticker news.
|
||
- [x] 11. Deleted `Portfolio` / `PortfolioSnapshot` / `Position` models
|
||
from `app/models.py`. Removed `PortfolioSummary` / `PositionOut`
|
||
from `app/schemas.py`. Removed `persist_pie` + `PersistResult`
|
||
from `csv_import.py` (parser remains).
|
||
|
||
**Polish:**
|
||
|
||
- [ ] 12. `/privacy` page stating exactly what's held server-side and TTLs.
|
||
- [ ] 13. Update README + plan file's review section.
|
||
|
||
## Out of scope (deferred)
|
||
|
||
- **E2E encrypted sync of localStorage across devices.** Real demand from
|
||
paying users would justify this. Mechanism: user-derived key from
|
||
password (PBKDF2/Argon2 → KEK), encrypted pie blob stored on server,
|
||
server can't decrypt. Phase H-ish.
|
||
- **True PIR for prices.** Cryptographic overkill for retail SaaS.
|
||
- **Anonymous billing.** Paddle requires an email. Accepted.
|
||
|
||
## Locked decisions (2026-05-16)
|
||
|
||
1. **Redis**: new compose service. Stores (a) the ephemeral pie during
|
||
`/api/analyze` with a 5-min TTL, (b) the batch-buffer of new tickers
|
||
awaiting universe flush. Slots in later for rate limits and Paddle
|
||
webhook idempotency (Phase D).
|
||
2. **Sparklines lazy** — never bundled in `/api/universe`. Browser fetches
|
||
`/api/universe/sparkline/{ticker}` on hover.
|
||
3. **Passive aging** — no heartbeat endpoint. `last_referenced_at` is bumped
|
||
whenever a ticker appears in `/api/portfolio/parse` or `/api/analyze`.
|
||
Eviction cron prunes rows with `last_referenced_at < now - 60 days`.
|
||
Effect: a user who re-uploads their CSV monthly keeps their tickers
|
||
alive in the universe; long-departed users' tickers age out naturally.
|
||
4. **No data migration of existing pies** — `positions` rows are dropped
|
||
without backfilling `ticker_universe`. Users re-upload their CSV once
|
||
after deploy; it lands in browser localStorage.
|
||
|
||
## Review section (to be filled after implementation)
|
||
|
||
_TBD after sign-off + implementation._
|