phase G: data minimisation + passwordless auth + DeepSeek-first LLM
Server no longer holds portfolios. Holdings live in the browser (localStorage); the server publishes an anonymous ticker_universe and a gzipped /api/universe payload identical for every authenticated user, so access patterns can't betray which tickers a user holds. AI commentary is generated ephemerally from the browser-supplied pie and the cost ledger row records no positions. Migrations 0009-0011 added the universe table and dropped positions / portfolio_snapshots / portfolios.

Authentication is now e-mail OTP only. Migration 0010 dropped password_hash and email_verified (every active session is by construction proof of email control). The /signup endpoint is gone; signup and login share a single email-entry page. Email rendering is HTML+plain-text multipart with a shared brand palette (app/branding.py) asserted in sync with the CSS by a drift-detection test.

LLM provider defaults to DeepSeek-direct (cheaper, api.deepseek.com) with OpenRouter as automatic fallback if DeepSeek fails. ai_log_job and indicator_summary_job now iterate the two tones (NOVICE, INTERMEDIATE) per cycle so the dashboard's tone toggle is instant; PROMPT_VERSION bumped to 6 with an educational anti-TA / anti-gambling stance baked into _CORE. NOVICE mode renders a curated glossary inline (CBOE VIX, yield curve, HY OAS, etc.) with JS-positioned tooltips that survive viewport edges and sticky bars. Model name and tokens are hidden from the user UI but still recorded in StrategicLog.model and AICall for admin.

Layout adds a sticky top nav and a sticky bottom markets bar (one chip per exchange with status LED + headline index + 1d change). Phase H feedback reporting is queued in tasks/todo.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 480fd311c5
commit 6e7f57c6b2
54 changed files with 5005 additions and 916 deletions

tasks/todo.md
@@ -0,0 +1,281 @@

# Phase G: Data-minimisation refactor

**Date opened:** 2026-05-16
**Status:** Planning. No code yet; awaiting sign-off on this doc.

## Goal

Drop "server holds your portfolio" from the threat model. After this phase,
Cassandra at rest knows: email, billing state, AI cost ledger,
a non-attributed set of tickers, and current market prices for those tickers.
It does **not** know which user holds what, at what cost, at what quantity.

Holdings live in the browser (localStorage). The server acts as a price proxy
that returns the **entire ticker universe** to every authenticated client, so
the request itself can't betray the user's pie. CSV parsing and AI commentary
are the only paths where holdings transit the server, and they do so
**in-memory for the duration of one request**, never persisted.

## The shape

```
┌──────────────────────────────────────────────────────────┐
│  Browser (localStorage)                                  │
│  • parsed pie: positions, qty, avg_cost                  │
│  • derived: P/L, sector tilt, sparkline cache            │
└──────────────────────────────────────────────────────────┘
            │  GET /api/universe           (full payload, gzipped)
            │  POST /api/portfolio/parse   (CSV → parsed pie)
            │  POST /api/analyze           (pie + prices → AI text)
            ▼
┌──────────────────────────────────────────────────────────┐
│  Server                                                  │
│  • users(email, tier)                                    │
│  • ticker_universe(ticker, currency, last_referenced_at) │
│  • quotes (already exists, keyed by ticker)              │
│  • strategic_logs / indicator_summaries (shared, macro)  │
│  • ai_calls (cost ledger, no holdings)                   │
│  ✗ NO positions table                                    │
│  ✗ NO portfolio_snapshots table                          │
│  ✗ NO per-user holdings, ever                            │
└──────────────────────────────────────────────────────────┘
```

## Privacy properties this buys

1. **Holdings are not at rest**. Server never writes a row that says "user X
   holds ticker Y". A full DB dump reveals only the *union* of all users'
   tickers, with no attribution.
2. **Price-refresh requests are unlinkable**. Every authenticated user gets
   the same payload (entire universe), so neither access logs nor breach
   evidence can reveal holdings: the request carries no per-user ticker list.
3. **AI analysis is ephemeral**. Holdings transit memory only during one LLM
   call (~5-30s). No DB persistence, no logs of pie content.

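Property 2 can be illustrated with a minimal sketch. It assumes a hypothetical `build_universe_payload` helper (not taken from the codebase) whose only inputs are the shared quote table and a timestamp, never a user or session. The `takes_user_param` check and its `FORBIDDEN_PARAMS` list are likewise illustrative; they mirror the idea of a structural assertion on the handler signature. The quote values are made-up sample data.

```python
import gzip
import inspect
import json

# Hypothetical parameter names that would make the payload user-dependent.
FORBIDDEN_PARAMS = {"user", "user_id", "session", "email"}


def build_universe_payload(quotes: dict, as_of: str) -> bytes:
    """Serialise the whole universe; note there is no user argument."""
    body = {"as_of": as_of, "tickers": quotes}
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # mtime=0 keeps the gzip header deterministic, so equal inputs give equal bytes.
    return gzip.compress(raw, mtime=0)


def takes_user_param(func) -> bool:
    """True if the function signature names a user-identifying parameter."""
    return bool(FORBIDDEN_PARAMS & set(inspect.signature(func).parameters))


quotes = {
    "AAPL": {"p": 234.56, "c": "USD", "d": {"1d": 0.5, "1m": 3.2, "1y": 18.4}},
    "VWRL.L": {"p": 101.2, "c": "GBP", "d": {"1d": -0.1, "1m": 1.0, "1y": 9.7}},
}
# Two different users hitting the endpoint in the same cache window.
payload_for_alice = build_universe_payload(quotes, "2026-05-16T14:00:00Z")
payload_for_bob = build_universe_payload(quotes, "2026-05-16T14:00:00Z")
```

Because the builder cannot see who is asking, the two payloads are byte-identical, which is the property an observer of responses or logs would have to defeat.
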
## Privacy properties this does NOT buy

1. **Server briefly sees the pie** during `/api/portfolio/parse` (CSV upload)
   and `/api/analyze`. This is "minutes-of-retention, in-memory" not
   "zero-knowledge". GDPR-honest framing: *"shortest possible processing
   window, no retention."*
2. **Universe-add timing leak**. If only one user is active when a new
   ticker enters the universe, that ticker is linkable to that user via
   timestamps. Mitigation in plan below.
3. **Email is still PII**. Paddle billing requires it; nothing to do about
   that. Document clearly in privacy policy.

## Data model changes

### New tables

```python
from datetime import datetime

from sqlalchemy import DateTime, String
from sqlalchemy.orm import Mapped, mapped_column

# `Base` (declarative base) and `utcnow` are assumed to be the helpers
# already defined alongside the other models.


class TickerUniverse(Base):
    """The set of public tickers Cassandra tracks. Populated as the union
    of all user holdings, *without user attribution*."""
    __tablename__ = "ticker_universe"
    yahoo_ticker: Mapped[str] = mapped_column(String(32), primary_key=True)
    currency: Mapped[str | None] = mapped_column(String(8))
    first_seen_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
    # Refreshed whenever this ticker appears in /api/portfolio/parse or /api/analyze.
    # When utcnow() - last_referenced_at > UNIVERSE_EVICTION_TTL, prune.
    last_referenced_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
```

### Removed tables (migration 0011)

- `positions`
- `portfolio_snapshots`
- `portfolios`

(The `Portfolio` model concept goes away. A user "having a portfolio" is now
purely a browser-localStorage concept.)

### Kept as-is

- `users`, `email_otps`: auth
- `quotes`, `quotes_daily`: price data
- `headlines`, `feeds`: news
- `strategic_logs`, `indicator_summaries`, `ai_calls`: macro AI (shared)
- `instrument_map`: T212 ↔ Yahoo resolution (admin-managed, read-only to user paths)

## New API surface

```
GET /api/universe
  Auth: session/bearer required.
  Returns the full universe with current prices, gzipped JSON:
    {
      "as_of": "2026-05-16T14:00:00Z",
      "tickers": {
        "AAPL": {"p": 234.56, "c": "USD", "d": {"1d": 0.5, "1m": 3.2, "1y": 18.4}},
        "VWRL.L": {...},
        ...
      }
    }
  Cache-Control: max-age=60. Browser refreshes once a minute.

GET /api/universe/sparkline/{ticker}
  Auth required. Lazy-loaded on hover. Same shape as today.

POST /api/portfolio/parse
  Auth required. multipart/form-data: file=<csv>.
  Server: parses, resolves T212→Yahoo via instrument_map, adds resolved
  tickers to ticker_universe (no user FK), returns parsed pie to browser.
  Discards the parsed pie once the response is sent.
  Response:
    {
      "positions": [
        {"yahoo_ticker": "AAPL", "name": "Apple Inc",
         "qty": 5, "avg_cost_gbp": 178.40, "currency": "USD"},
        ...
      ],
      "base_currency": "GBP",
      "warnings": ["3 unmapped tickers: ..."]
    }

POST /api/analyze
  Auth required. Body: {"positions": [...], "prices": {...}, "anchor": "..."}.
  Server constructs prompt, calls LLM, returns commentary text.
  No DB writes mentioning positions. ai_calls row written (no pie content).
  Optional: cache commentary text keyed by sha256(positions canonical JSON)
  so re-clicking is free. The hash is not reversible to holdings.
  Response: {"content": "...", "model": "...", "generated_at": "..."}

POST /api/universe/heartbeat (optional; dropped, see "Locked decisions" below)
  Browser periodically POSTs its localStorage ticker set so the server
  can refresh last_referenced_at for those tickers. The "active client
  bumps timestamps" pattern keeps the universe trimmed to actually-held
  tickers.
```

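The optional commentary cache in `/api/analyze` only works if the same pie always serialises to the same bytes, whatever order the browser sends positions or keys in. A minimal sketch of such a key, assuming a hypothetical `pie_cache_key` helper and that positions can be ordered by `yahoo_ticker` (the sample holdings are illustrative):

```python
import hashlib
import json


def pie_cache_key(positions: list) -> str:
    """SHA-256 over a canonical JSON form of the positions."""
    # Sort positions by ticker and keys alphabetically so ordering never matters.
    ordered = sorted(positions, key=lambda p: p["yahoo_ticker"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


pie_a = [
    {"yahoo_ticker": "AAPL", "qty": 5, "avg_cost_gbp": 178.40},
    {"yahoo_ticker": "VWRL.L", "qty": 12, "avg_cost_gbp": 95.10},
]
# Same holdings, different list order and key order.
pie_b = [
    {"qty": 12, "avg_cost_gbp": 95.10, "yahoo_ticker": "VWRL.L"},
    {"avg_cost_gbp": 178.40, "yahoo_ticker": "AAPL", "qty": 5},
]
key_a = pie_cache_key(pie_a)
key_b = pie_cache_key(pie_b)
```

One caveat worth weighing: a hash of a small, guessable pie can still be confirmed by anyone who guesses the exact positions, so a short TTL on cached entries may be prudent.
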
### Endpoints removed

- `POST /api/portfolios/upload` (Phase B): replaced by `/api/portfolio/parse`
- `GET /api/portfolio/{name}/summary`: gone; browser computes from
  localStorage + universe prices

## Mitigation: universe-add timing leak

The naive "INSERT IGNORE on CSV parse" lets a passive observer link a
universe-row's `first_seen_at` to a specific user's upload time. Two
mitigations, layered:

1. **Batch additions.** New tickers don't enter `ticker_universe` directly
   from the request handler. They're queued (in Redis or in an in-process
   buffer) and flushed at fixed 5-minute boundaries. Multiple users' uploads
   batch together; ordering within a flush is randomised.
2. **Padding.** On every flush, also re-touch `last_referenced_at` on N
   random existing universe rows. This makes "row updated at flush time T"
   not specifically informative about new tickers.

At low user counts (alpha), the leak is mathematically unavoidable; document
this in the alpha tester agreement and skip both mitigations until we have
≥10 concurrent users.

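The two mitigations can be sketched together with an in-process buffer. This is an illustration under stated assumptions, not the service code: `UniverseBuffer` is a hypothetical name, the universe is modelled as a plain dict from ticker to a single timestamp (the real table has both `first_seen_at` and `last_referenced_at`), and the scheduler that calls `flush` every five minutes is left out.

```python
import random
from datetime import datetime, timezone
from typing import Optional


class UniverseBuffer:
    """Queues new tickers and writes them only at flush time (in-process sketch)."""

    def __init__(self, padding: int = 0, rng: Optional[random.Random] = None):
        self.pending = set()
        self.padding = padding            # N existing rows to re-touch per flush
        self.rng = rng or random.Random()

    def add(self, ticker: str) -> None:
        # Request handlers only enqueue; nothing is timestamped here.
        self.pending.add(ticker)

    def flush(self, universe: dict, now: datetime) -> list:
        """Apply queued tickers in random order, then touch `padding` old rows."""
        existing = list(universe)         # rows present before this flush
        batch = list(self.pending)
        self.rng.shuffle(batch)           # hide arrival order within the batch
        self.pending.clear()
        decoys = self.rng.sample(existing, min(self.padding, len(existing)))
        for ticker in batch + decoys:
            universe[ticker] = now        # every touched row shares one timestamp
        return batch


universe = {
    "AAPL": datetime(2026, 5, 1, tzinfo=timezone.utc),
    "MSFT": datetime(2026, 5, 1, tzinfo=timezone.utc),
}
buffer = UniverseBuffer(padding=1, rng=random.Random(0))
buffer.add("VWRL.L")
buffer.add("TSLA")
flush_time = datetime(2026, 5, 16, 14, 5, tzinfo=timezone.utc)
flushed = buffer.flush(universe, flush_time)
```

After the flush, the two new tickers and one pre-existing decoy row all carry the same timestamp, so the timestamp alone no longer separates new rows from old ones.
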
## Migration sequence

- [x] **0009_ticker_universe.py**: create the `ticker_universe` table only;
  existing portfolio tables are left untouched.
- [x] **0010**: drop `password_hash` and `email_verified` (passwordless auth).
- [x] **0011_drop_portfolio_tables.py**: drop `positions`,
  `portfolio_snapshots`, `portfolios`, without backfilling
  `ticker_universe`. The drop is one-way (downgrade restores structure
  only, not rows); document this.

## Implementation order

Strategy: build the new path alongside the existing one. The destructive
`DROP TABLE` step lands LAST, after end-to-end verification of the new
architecture. Old endpoints are removed only after the browser is updated.

**Additive (non-destructive):**

- [x] 1. Add `redis:7-alpine` service to docker-compose.yml. New env var
  `REDIS_URL` in Settings. Smoke-test connectivity from `app`.
- [x] 2. Migration `0009_ticker_universe.py`: creates the new table only,
  leaves existing portfolio tables untouched.
- [x] 3. `app/services/ticker_universe.py`: add/refresh/evict logic.
  Batch-flush via Redis with a 5-min boundary; padding-on-flush stays
  off at first (toggle for when we reach ≥10 users).
- [x] 3a. **Auth flip: passwordless.** Drop password_hash + email_verified
  (migration 0010). Collapse signup into login. Every auth is OTP.
  Threat model after Phase G makes passwords pure liability; see
  memory:cassandra_data_minimisation.
- [x] 4. `app/services/portfolio_analysis.py`: ephemeral LLM prompt +
  call. Pie passed in via request body, held in a function-local
  variable, never written to DB or logs. Includes input sanitisation
  (prompt-injection defence, NaN/inf rejection, 200-position cap).
- [x] 5. New router `app/routers/universe.py` with:
  - `GET /api/universe`
  - `GET /api/universe/sparkline/{ticker}`
  - `POST /api/portfolio/parse`
  - `POST /api/analyze`

  Added `GZipMiddleware` (≥500-byte threshold). Confirmed 70%
  compression on a 30-ticker universe payload. Old endpoints in
  `app/routers/api.py` stay live for now.
- [x] 6. `app/templates/partials/portfolio.html` (panel shell) +
  `static/js/portfolio.js` (localStorage pie + universe fetch +
  P/L compute + analyze button). `upload.html` rewired to new
  `/api/portfolio/parse` endpoint. CSS additions: pf-pill,
  pf-actions, pf-analysis, pf-warn.
- [x] 6a. Scheduler additions for Phase G:
  - `universe_flush_job` every 5 min (flushes Redis buffer → DB)
  - `universe_evict_job` daily at 00:15 UTC (60-day TTL prune)
  - `market_job` extended to fetch `config TOML ∪ ticker_universe`
- [x] 7. Tests: universe add/evict (in service), parse-shape sanitisation
  (21 tests), unlinkability contract (structural assertion that
  the universe handler signature can't take a user-identifying
  parameter without failing CI).
- [ ] 8. **End-to-end check (USER):** re-upload existing T212 CSV via
  new path, confirm pie renders correctly from localStorage with
  live prices, AI commentary works, no rows land in `positions` /
  `portfolio_snapshots`.

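The input sanitisation mentioned in step 4 might look roughly like the sketch below. Only the 200-position cap and the NaN/inf rejection come from this plan; the function name `sanitise_positions`, the ticker pattern, and the choice to drop every free-text field (so that strings such as a position's `name` never reach the prompt) are assumptions made for illustration.

```python
import math
import re

MAX_POSITIONS = 200  # cap taken from the plan above
# Hypothetical allow-list: ASCII tickers up to 32 chars, e.g. "AAPL", "VWRL.L".
TICKER_RE = re.compile(r"[A-Za-z0-9.\-^=]{1,32}")


def sanitise_positions(positions: list) -> list:
    """Validate a browser-supplied pie; raise ValueError on bad input."""
    if len(positions) > MAX_POSITIONS:
        raise ValueError(f"too many positions: {len(positions)} > {MAX_POSITIONS}")
    clean = []
    for pos in positions:
        ticker = str(pos.get("yahoo_ticker", "")).strip()
        if not TICKER_RE.fullmatch(ticker):
            raise ValueError(f"invalid ticker: {ticker!r}")
        numbers = {}
        for field in ("qty", "avg_cost_gbp"):
            value = pos.get(field)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{field} must be a number")
            if not math.isfinite(value):
                raise ValueError(f"{field} must be finite")
            numbers[field] = float(value)
        # Only whitelisted fields survive, so free text never reaches the prompt.
        clean.append({"yahoo_ticker": ticker, **numbers})
    return clean


cleaned = sanitise_positions([
    {"yahoo_ticker": "AAPL", "qty": 5, "avg_cost_gbp": 178.40,
     "name": "ignore previous instructions"},
])
```

Rebuilding each position from a fixed set of fields, rather than filtering out known-bad content, keeps the defence independent of whatever an attacker puts in the extra keys.
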
**Destructive (only after step 8 passes):**

- [x] 9. Migration `0011_drop_portfolio_tables.py`: dropped
  `positions` (299 rows), `portfolio_snapshots` (23 rows),
  `portfolios` (2 rows). The drop is one-way: downgrade restores
  structure only, not the rows.
- [x] 10. Removed old endpoints `POST /api/portfolios/upload`,
  `GET /api/portfolios`. Removed `portfolio_job.py` from
  scheduler. `market_job` already fetches "config TOML ∪
  ticker_universe" (step 6a). `news_job` rewired to use
  `ticker_universe ∪ instrument_map` for per-ticker news.
- [x] 11. Deleted `Portfolio` / `PortfolioSnapshot` / `Position` models
  from `app/models.py`. Removed `PortfolioSummary` / `PositionOut`
  from `app/schemas.py`. Removed `persist_pie` + `PersistResult`
  from `csv_import.py` (parser remains).

**Polish:**

- [ ] 12. `/privacy` page stating exactly what's held server-side and TTLs.
- [ ] 13. Update README + plan file's review section.

## Out of scope (deferred)

- **E2E encrypted sync of localStorage across devices.** Real demand from
  paying users would justify this. Mechanism: user-derived key from
  password (PBKDF2/Argon2 → KEK), encrypted pie blob stored on server,
  server can't decrypt. Phase H-ish.
- **True PIR for prices.** Cryptographic overkill for retail SaaS.
- **Anonymous billing.** Paddle requires an email. Accepted.

## Locked decisions (2026-05-16)

1. **Redis**: new compose service. Stores (a) the ephemeral pie during
   `/api/analyze` with a 5-min TTL, (b) the batch-buffer of new tickers
   awaiting universe flush. Slots in later for rate limits and Paddle
   webhook idempotency (Phase D).
2. **Sparklines lazy**: never bundled in `/api/universe`. Browser fetches
   `/api/universe/sparkline/{ticker}` on hover.
3. **Passive aging**: no heartbeat endpoint. `last_referenced_at` is bumped
   whenever a ticker appears in `/api/portfolio/parse` or `/api/analyze`.
   Eviction cron prunes rows with `last_referenced_at < now - 60 days`.
   Effect: a user who re-uploads their CSV monthly keeps their tickers
   alive in the universe; long-departed users' tickers age out naturally.
4. **No data migration of existing pies**: `positions` rows are dropped
   without backfilling `ticker_universe`. Users re-upload their CSV once
   after deploy; it lands in browser localStorage.

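Passive aging (decision 3) reduces to two small operations: bump a timestamp when a ticker is referenced, and prune anything older than the TTL. A minimal sketch, again modelling the table as a dict from ticker to `last_referenced_at`; the helper names `touch` and `evict_stale` are hypothetical, while the 60-day TTL is the value locked above.

```python
from datetime import datetime, timedelta, timezone

UNIVERSE_EVICTION_TTL = timedelta(days=60)


def touch(universe: dict, tickers: list, now: datetime) -> None:
    """Bump last_referenced_at for tickers seen in a parse/analyze request."""
    for ticker in tickers:
        universe[ticker] = now


def evict_stale(universe: dict, now: datetime) -> list:
    """Remove rows whose last reference is older than the TTL; return them."""
    cutoff = now - UNIVERSE_EVICTION_TTL
    stale = sorted(t for t, seen in universe.items() if seen < cutoff)
    for ticker in stale:
        del universe[ticker]
    return stale


now = datetime(2026, 5, 16, tzinfo=timezone.utc)
universe = {
    "AAPL": now - timedelta(days=90),    # old, but re-referenced below
    "TSLA": now - timedelta(days=61),    # old and never re-referenced
    "VWRL.L": now - timedelta(days=10),  # recent
}
touch(universe, ["AAPL"], now)           # a re-upload keeps AAPL alive
evicted = evict_stale(universe, now)
```

In this run only `TSLA` is evicted: `AAPL` was older but survives because a re-upload referenced it before the prune.
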
## Review section (to be filled after implementation)

_TBD after sign-off + implementation._