Translate for any user with lang='it' regardless of paid/free status. Italian + UK are the first markets, so IT availability is part of the public-facing experience — a free-tier visitor needs to see the AI in Italian to convert. At ~$0.005/day total cost the gating isn't worth the savings. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
382 lines
16 KiB
Markdown
382 lines
16 KiB
Markdown
# Localization (Italian active, ES/FR/DE WIP) — Design Spec
|
|
|
|
**Date:** 2026-05-27
|
|
**Status:** Draft — pending implementation plan
|
|
|
|
## Context
|
|
|
|
All AI-generated content (strategic log, daily email digest, portfolio
|
|
analysis, follow-up chat) is English-only today. The operator wants to
|
|
add Italian translation as the first localization, with Spanish,
|
|
French, and German listed as "coming soon" in the settings UI but not
|
|
yet functional. Italian must work end-to-end from settings dropdown to
|
|
rendered output; the other three exist as commitments and design
|
|
placeholders so adding them later is a flag flip.
|
|
|
|
This is foundational plumbing: it touches every LLM call site we ship
|
|
today and shapes how every future AI feature handles language. Doing it
|
|
first means later features (qty/cost edit narratives, P/L summaries,
|
|
alert text, etc.) inherit the i18n wiring for free instead of needing a
|
|
retrofit.
|
|
|
|
## Goals
|
|
|
|
- A user can pick `Italiano` from a settings dropdown and immediately
|
|
see every AI-generated surface in Italian.
|
|
- Adding `es`, `fr`, or `de` later is a one-line change to a constant
|
|
plus optionally validating the dropdown's enabled set.
|
|
- Translation cost stays in the "noise" range — we use the same
|
|
DeepSeek-4-flash model the rest of the system uses (~$0.28/M output
|
|
tokens). No separate "cheap translation" plumbing.
|
|
- Strategic-log reads stay instant for non-English users — no
|
|
read-time translation latency.
|
|
|
|
## Non-goals
|
|
|
|
- UI label translation. The dashboard buttons, settings labels,
|
|
headings, and other chrome remain English. Only the AI's own output
|
|
is localized.
|
|
- Translation of indicator summaries. The same pattern will apply when
|
|
those become user-facing prose, but they aren't surfaced today.
|
|
- Backfilling translations for historical strategic logs. Translation
|
|
only happens going forward, at the moment a new English log is written.
|
|
- Activation of Spanish/French/German. They appear in the dropdown as
|
|
"coming soon" with disabled options; the value-validation layer in
|
|
the settings POST refuses them.
|
|
|
|
## Two distinct translation paths
|
|
|
|
The system has two categories of AI-generated content, with different
|
|
generation patterns:
|
|
|
|
### Per-user content (analyse, digest, chat)
|
|
|
|
Each call already produces output for exactly one user. The fix is
|
|
trivial: the user's `lang` threads into the prompt assembly, and the
|
|
system prompt gains a `"Respond in Italian."` clause when `lang != 'en'`.
|
|
One LLM call, no extra cost, no extra latency.
|
|
|
|
### Shared content (strategic log)
|
|
|
|
The hourly `ai_log_job` writes a single English log row used by every
|
|
user. To serve non-English users, we generate the English log as today,
|
|
then translate it to each active non-English language via a separate
|
|
LLM call and store the result in a new `strategic_log_translations`
|
|
table. Translations are fanned out in parallel with `asyncio.gather` so
|
|
total translation time is max(single call), not sum. The `/log`
|
|
endpoint serves the translation matching the requester's `lang`,
|
|
falling back to English if none exists.
|
|
|
|
Why translate-after rather than generate-N-times: the strategic log
|
|
includes live market data, headlines, and references that are
|
|
expensive to assemble. Re-running the full generation in each language
|
|
duplicates that work; translating the rendered output preserves a
|
|
single source of truth (the English original) and only spends LLM
|
|
tokens on the actual prose conversion.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ User has user.lang preference │
|
|
│ Values: 'en' (default) | 'it' (active) | 'es'/'fr'/'de' (WIP) │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
│
|
|
├─ Per-user surfaces (portfolio analyse, daily digest, chat)
|
|
│ └─ prompt assembly threads user.lang to
|
|
│ respond_in_clause() → appended to system prompt
|
|
│ when lang != 'en'. Single call_llm, no extra cost.
|
|
│
|
|
└─ Shared surfaces (strategic log)
|
|
├─ ai_log_job writes the English row as today
|
|
├─ Then SELECTs distinct users.lang where lang != 'en'
|
|
│ AND user has active paid access
|
|
├─ asyncio.gather of one translate() call per language
|
|
└─ Each result → INSERT into strategic_log_translations
|
|
keyed by (log_id, lang) UNIQUE
|
|
```
|
|
|
|
## Data model
|
|
|
|
### `users.lang` (new column)
|
|
|
|
```sql
|
|
ALTER TABLE users
|
|
ADD COLUMN lang VARCHAR(8) NOT NULL DEFAULT 'en';
|
|
```
|
|
|
|
Existing rows pick up the `en` default. Application-level validation
|
|
restricts writes to the `ACTIVE_LANGUAGES` set; the database column
|
|
accepts anything in `VARCHAR(8)` (no CHECK constraint — we want to
|
|
add new languages without a migration).
|
|
|
|
### `strategic_log_translations` (new table)
|
|
|
|
```sql
|
|
CREATE TABLE strategic_log_translations (
|
|
id BIGINT PRIMARY KEY AUTO_INCREMENT,
|
|
log_id BIGINT NOT NULL,
|
|
lang VARCHAR(8) NOT NULL,
|
|
content_md TEXT NOT NULL,
|
|
generated_at DATETIME(6) NOT NULL,
|
|
llm_model VARCHAR(64),
|
|
llm_cost_usd FLOAT,
|
|
CONSTRAINT fk_slt_log
|
|
FOREIGN KEY (log_id) REFERENCES strategic_logs(id) ON DELETE CASCADE,
|
|
CONSTRAINT uq_slt_log_lang UNIQUE (log_id, lang)
|
|
);
|
|
```
|
|
|
|
ON DELETE CASCADE means evicting an old strategic log row also drops
|
|
its translations. The UNIQUE constraint prevents duplicate translations
|
|
for the same log/lang combo.
|
|
|
|
## Components
|
|
|
|
### `app/services/i18n.py` (new)
|
|
|
|
```python
|
|
LANGUAGES = {
|
|
"en": "English",
|
|
"it": "Italian",
|
|
"es": "Spanish",
|
|
"fr": "French",
|
|
"de": "German",
|
|
}
|
|
|
|
# Set of language codes that users can actually pick from the settings
|
|
# dropdown. ES/FR/DE remain in LANGUAGES so their labels render, but
|
|
# the settings POST validator and the strategic-log translation fan-out
|
|
# both consult this set.
|
|
ACTIVE_LANGUAGES = {"en", "it"}
|
|
|
|
|
|
def respond_in_clause(lang: str) -> str:
|
|
"""Suffix appended to per-user LLM system prompts.
|
|
|
|
Returns an empty string for 'en' (the default everywhere already).
|
|
Otherwise returns "\n\nRespond in <Language>." so the model knows
|
|
to write its output in the user's language.
|
|
"""
|
|
if not lang or lang == "en":
|
|
return ""
|
|
name = LANGUAGES.get(lang, "English")
|
|
return f"\n\nRespond in {name}."
|
|
```
|
|
|
|
### `app/services/translation.py` (new)
|
|
|
|
```python
|
|
async def translate(
|
|
client: httpx.AsyncClient,
|
|
text: str,
|
|
target_lang: str,
|
|
) -> tuple[str, LogResult]:
|
|
"""Translate ``text`` (markdown) to ``target_lang``.
|
|
|
|
Uses the default ``call_llm`` provider chain — DeepSeek-4-flash via
|
|
the OG API is already cheap enough ($0.28/M output) that a separate
|
|
'translation model' setting would be over-engineering.
|
|
|
|
Returns ``(translated_markdown, LogResult)`` so the caller can
|
|
persist provenance (model + cost) alongside the translation.
|
|
Raises on provider failure; caller decides whether to surface or
|
|
swallow.
|
|
"""
|
|
```
|
|
|
|
System prompt: *"Translate the following markdown to {language}. Preserve all formatting (headings, lists, links, emphasis). Do NOT translate ticker symbols, company names, numbers, percentages, or dates. Output ONLY the translated markdown — no preamble, no commentary."*
|
|
|
|
### `app/models.py` (modified)
|
|
|
|
- `User`: add `lang: Mapped[str] = mapped_column(String(8), nullable=False, default="en", server_default="en")`
|
|
- New class `StrategicLogTranslation` matching the table above
|
|
|
|
### `app/jobs/ai_log_job.py` (modified)
|
|
|
|
After the existing English log row is persisted, add a translation
|
|
fan-out:
|
|
|
|
```python
|
|
# Select distinct active non-English languages.
|
|
async with session_factory() as session:
|
|
rows = (await session.execute(
|
|
select(User.lang).distinct()
|
|
.where(User.lang.in_(ACTIVE_LANGUAGES - {"en"}))
|
|
)).scalars().all()
|
|
active_langs = list(rows)
|
|
|
|
if active_langs:
|
|
async with httpx.AsyncClient(...) as client:
|
|
results = await asyncio.gather(*[
|
|
translate(client, log_row.content_md, lang)
|
|
for lang in active_langs
|
|
], return_exceptions=True)
|
|
for lang, result in zip(active_langs, results):
|
|
if isinstance(result, Exception):
|
|
log.warning("log.translate.failed", lang=lang, error=str(result)[:200])
|
|
continue
|
|
translated_md, llm_log = result
|
|
session.add(StrategicLogTranslation(
|
|
log_id=log_row.id, lang=lang,
|
|
content_md=translated_md,
|
|
generated_at=utcnow(),
|
|
llm_model=llm_log.model,
|
|
llm_cost_usd=llm_log.cost_usd,
|
|
))
|
|
await session.commit()
|
|
```
|
|
|
|
Errors in individual language translations are logged but do not fail
|
|
the job. Missing translations get rendered as the English fallback at
|
|
read time.
|
|
|
|
### `app/jobs/email_digest_job.py` (modified)
|
|
|
|
The digest is already per-user and assembles its own prompt. Thread
|
|
`user.lang` through:
|
|
|
|
- `_generate_variants(...)` accepts a `target_lang` param
|
|
- The system prompt assembly appends `respond_in_clause(target_lang)`
|
|
- Subject-line generation runs in the same call, so it's localized too
|
|
|
|
### `app/services/portfolio_analysis.py` (modified)
|
|
|
|
- `AnalysisRequest` gains a `lang: str = "en"` field, populated by the
|
|
route from `principal.user.lang`
|
|
- `analyse(...)` appends `respond_in_clause(req.lang)` to its system prompt
|
|
|
|
### `app/routers/universe.py` (modified — the `/api/analyze` route)
|
|
|
|
Read the current user's `lang` and put it in the payload before calling
|
|
`analyse(...)`. (The current route gets the principal via Depends.)
|
|
|
|
### `app/routers/pages.py` / the `/log` resolution (modified)
|
|
|
|
When rendering `/log` (and the `/log/{day}` historical variant), look
|
|
up the user's `lang`. If `lang != 'en'`, attempt to fetch the matching
|
|
`StrategicLogTranslation`; if present, render that. If absent, fall
|
|
back to the English `StrategicLog.content_md`. No silent error — the
|
|
fallback is the intended graceful path.
|
|
|
|
### Settings UI (`app/templates/settings.html` modified)
|
|
|
|
New section under existing user preferences (alongside the digest-tone
|
|
toggle):
|
|
|
|
```html
|
|
<details class="settings-section">
|
|
<summary class="settings-section__head">Language</summary>
|
|
<p class="settings-section__lede">
|
|
The language the AI uses for the strategic log, your daily digest,
|
|
and portfolio commentary. UI labels stay in English for now.
|
|
</p>
|
|
<form method="post" action="/settings/language" class="settings-row">
|
|
<select name="lang" id="lang-select">
|
|
<option value="en" {% if user.lang == 'en' %}selected{% endif %}>English</option>
|
|
<option value="it" {% if user.lang == 'it' %}selected{% endif %}>Italiano</option>
|
|
<option value="es" disabled>Español (coming soon)</option>
|
|
<option value="fr" disabled>Français (coming soon)</option>
|
|
<option value="de" disabled>Deutsch (coming soon)</option>
|
|
</select>
|
|
<button type="submit" class="settings-btn">Save</button>
|
|
</form>
|
|
</details>
|
|
```
|
|
|
|
### Settings POST endpoint (new)
|
|
|
|
```python
|
|
@router.post("/settings/language")
|
|
async def set_language(
|
|
lang: str = Form(...),
|
|
cu: CurrentUser = Depends(require_auth),
|
|
session: AsyncSession = Depends(get_session),
|
|
):
|
|
if lang not in ACTIVE_LANGUAGES:
|
|
raise HTTPException(status_code=400, detail="unsupported language")
|
|
if cu.user is None:
|
|
raise HTTPException(status_code=403, detail="user required")
|
|
cu.user.lang = lang
|
|
await session.commit()
|
|
return RedirectResponse(url="/settings#language", status_code=303)
|
|
```
|
|
|
|
Server-side validation against `ACTIVE_LANGUAGES` is the gate that
|
|
keeps ES/FR/DE non-functional even if someone POSTs them by hand.
|
|
|
|
## Error handling
|
|
|
|
| Case | Behaviour |
|
|
|---|---|
|
|
| Translation provider down at ai_log_job time | English row still written. Translation row missing for that hour and language. Next hour retries. No retroactive backfill in v1. |
|
|
| Translation returns malformed markdown | Stored anyway (we trust DeepSeek output enough that this is rare). Operator can delete a bad row by hand. |
|
|
| User has `lang=it` but no IT translation for the latest log | Fall back to English silently. Better than an empty pane. |
|
|
| User saves an unsupported lang (`es`/`fr`/`de`/`xx`) via raw POST | 400 — validated against `ACTIVE_LANGUAGES`. |
|
|
| Migrating an existing user with no `lang` column | The `DEFAULT 'en'` clause on the migration handles it; no application code change needed. |
|
|
| User picks Italian, then logs change reaches them mid-hour | The next ai_log_job tick generates and translates a fresh log; users see the IT version on the next refresh. |
|
|
|
|
## Tests
|
|
|
|
Backend (`tests/test_i18n.py`, `tests/test_translation.py`,
|
|
`tests/test_localization_integration.py`):
|
|
|
|
- `respond_in_clause('en')` returns empty string
|
|
- `respond_in_clause('it')` includes the word "Italian"
|
|
- `respond_in_clause('xx')` falls back to "English" (defensive)
|
|
- `translate()` mocked happy path returns the translated text + LogResult
|
|
- `translate()` provider failure raises
|
|
- ai_log_job: with no non-en users, no translation calls happen (mock asserts call_count=0)
|
|
- ai_log_job: with one user at `lang='it'`, one translation row written with the right `lang` and `log_id`
|
|
- ai_log_job: translation failure on one lang doesn't fail the job; the other lang's row still writes
|
|
- `/log` serves IT row when `user.lang='it'` and an IT translation exists
|
|
- `/log` falls back to English when `user.lang='it'` but no IT translation exists
|
|
- `/settings/language` POST: accepts `en`/`it`, rejects `es`/`fr`/`de`/`xx` with 400
|
|
- `analyse()` system prompt contains `"Respond in Italian."` when `lang='it'` (assert on the messages list passed to call_llm)
|
|
- digest job system prompt likewise contains the clause when the user is Italian
|
|
|
|
## Verification
|
|
|
|
End-to-end manual check after deploy:
|
|
|
|
1. **Switch a paid test user to Italian via the settings dropdown.** Confirm `users.lang='it'` in the DB.
|
|
2. **Wait for the next hourly log generation** (or trigger manually via cron/admin). Confirm a new `strategic_log_translations` row exists with `lang='it'` and `content_md` clearly Italian.
|
|
3. **Open the dashboard as that user.** Strategic log renders in Italian.
|
|
4. **Trigger the daily digest send for that user** (CLI: `python -m app.cli send-test-digest user@x daily`). Confirm the received email is in Italian.
|
|
5. **Click "Analyse my portfolio"** on the dashboard. Confirm the AI commentary is in Italian.
|
|
6. **Switch the same user back to English.** Confirm the next dashboard refresh shows the English log. The IT translation row stays in the DB (other IT users still benefit).
|
|
7. **Inspect the dropdown.** Verify ES/FR/DE appear with "(coming soon)" suffix and the option is disabled.
|
|
8. **Attempt `curl -X POST /settings/language -d lang=es`** with a valid session cookie. Expect 400.
|
|
|
|
## Migration / rollout
|
|
|
|
- Alembic migration `0022_localization` adds `users.lang` and creates
|
|
`strategic_log_translations`. Existing rows pick up `en` default.
|
|
- App restart picks up the new code paths. Pre-existing English logs
|
|
stay as-is. The first ai_log_job tick after deploy generates the
|
|
first Italian translation for whatever active IT users exist (likely
|
|
zero on day one until someone opts in).
|
|
- Removing localization later (if needed) is harmless: setting any
|
|
user's `lang` back to `en` makes their experience identical to the
|
|
pre-localization state.
|
|
|
|
## Out-of-scope clarifications
|
|
|
|
- We do not translate UI labels. Italian users see English buttons,
|
|
headings, and tooltips. Future scope.
|
|
- We do not translate user-generated content (chat questions the user
|
|
types). Only the AI's output is localized; user-supplied input flows
|
|
through unchanged.
|
|
- We do not translate the email subject line independently. The same
|
|
per-user LLM call that generates the digest body also generates the
|
|
subject in the target language.
|
|
- We do not surface translation cost in any user-visible UI. Cost is
|
|
recorded in `strategic_log_translations.llm_cost_usd` and the existing
|
|
`ai_calls` ledger picks up per-user calls as today.
|
|
- We do **not** gate strategic-log translation on user tier. Any user
|
|
with `lang='it'` triggers Italian translation for that hour's log,
|
|
regardless of whether they are paid, on credit, or free. Rationale:
|
|
Italian + UK are the first markets the operator is targeting, so
|
|
Italian availability is part of the public-facing experience — a
|
|
free-tier visitor needs to see the AI in Italian to convert. At
|
|
~$0.005/day total cost the gating overhead is not worth the savings.
|