docs: spec for Italian localization (ES/FR/DE as WIP)
Hybrid model: per-user surfaces (analyse, digest, chat) generated directly in the target language via a "Respond in Italian" clause appended to the system prompt. Shared content (strategic log) generated in English as today, then post-translated and cached per language in a new strategic_log_translations table. Translation calls fan out in parallel with asyncio.gather so total job latency stays bounded by max(single call). No separate translation-model setting — DeepSeek-4-flash at $0.28/M output is cheap enough that the routine cost is noise (~$0.005/day with Italian only at 24 logs/day). Users.lang VARCHAR(8) DEFAULT 'en'. Settings dropdown lists all four options but ES/FR/DE are disabled UI-side and rejected server-side against an ACTIVE_LANGUAGES allowlist — flipping them on later is a one-line constant change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
1ecc527118
commit
76f81648e5
1 changed files with 375 additions and 0 deletions
375
docs/superpowers/specs/2026-05-27-localization-italian-design.md
Normal file
375
docs/superpowers/specs/2026-05-27-localization-italian-design.md
Normal file
|
|
@ -0,0 +1,375 @@
|
||||||
|
# Localization (Italian active, ES/FR/DE WIP) — Design Spec
|
||||||
|
|
||||||
|
**Date:** 2026-05-27
|
||||||
|
**Status:** Draft — pending implementation plan
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
All AI-generated content (strategic log, daily email digest, portfolio
|
||||||
|
analysis, follow-up chat) is English-only today. The operator wants to
|
||||||
|
add Italian translation as the first localization, with Spanish,
|
||||||
|
French, and German listed as "coming soon" in the settings UI but not
|
||||||
|
yet functional. Italian must work end-to-end from settings dropdown to
|
||||||
|
rendered output; the other three exist as commitments and design
|
||||||
|
placeholders so adding them later is a flag flip.
|
||||||
|
|
||||||
|
This is foundational plumbing: it touches every LLM call site we ship
|
||||||
|
today and shapes how every future AI feature handles language. Doing it
|
||||||
|
first means later features (qty/cost edit narratives, P/L summaries,
|
||||||
|
alert text, etc.) inherit the i18n wiring for free instead of needing a
|
||||||
|
retrofit.
|
||||||
|
|
||||||
|
## Goals
|
||||||
|
|
||||||
|
- A user can pick `Italiano` from a settings dropdown and immediately
|
||||||
|
see every AI-generated surface in Italian.
|
||||||
|
- Adding `es`, `fr`, or `de` later is a one-line change to a constant
|
||||||
|
plus optionally validating the dropdown's enabled set.
|
||||||
|
- Translation cost stays in the "noise" range — we use the same
|
||||||
|
DeepSeek-4-flash model the rest of the system uses (~$0.28/M output
|
||||||
|
tokens). No separate "cheap translation" plumbing.
|
||||||
|
- Strategic-log reads stay instant for non-English users — no
|
||||||
|
read-time translation latency.
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
- UI label translation. The dashboard buttons, settings labels,
|
||||||
|
headings, and other chrome remain English. Only the AI's own output
|
||||||
|
is localized.
|
||||||
|
- Translation of indicator summaries. The same pattern will apply when
|
||||||
|
those become user-facing prose, but they aren't surfaced today.
|
||||||
|
- Backfilling translations for historical strategic logs. Translation
|
||||||
|
only happens going forward, at the moment a new English log is written.
|
||||||
|
- Activation of Spanish/French/German. They appear in the dropdown as
|
||||||
|
"coming soon" with disabled options; the value-validation layer in
|
||||||
|
the settings POST refuses them.
|
||||||
|
|
||||||
|
## Two distinct translation paths
|
||||||
|
|
||||||
|
The system has two categories of AI-generated content, with different
|
||||||
|
generation patterns:
|
||||||
|
|
||||||
|
### Per-user content (analyse, digest, chat)
|
||||||
|
|
||||||
|
Each call already produces output for exactly one user. The fix is
|
||||||
|
trivial: the user's `lang` threads into the prompt assembly, and the
|
||||||
|
system prompt gains a `"Respond in Italian."` clause when `lang != 'en'`.
|
||||||
|
One LLM call, no extra cost, no extra latency.
|
||||||
|
|
||||||
|
### Shared content (strategic log)
|
||||||
|
|
||||||
|
The hourly `ai_log_job` writes a single English log row used by every
|
||||||
|
user. To serve non-English users, we generate the English log as today,
|
||||||
|
then translate it to each active non-English language via a separate
|
||||||
|
LLM call and store the result in a new `strategic_log_translations`
|
||||||
|
table. Translations are fanned out in parallel with `asyncio.gather` so
|
||||||
|
total translation time is max(single call), not sum. The `/log`
|
||||||
|
endpoint serves the translation matching the requester's `lang`,
|
||||||
|
falling back to English if none exists.
|
||||||
|
|
||||||
|
Why translate-after rather than generate-N-times: the strategic log
|
||||||
|
includes live market data, headlines, and references that are
|
||||||
|
expensive to assemble. Re-running the full generation in each language
|
||||||
|
duplicates that work; translating the rendered output preserves a
|
||||||
|
single source of truth (the English original) and only spends LLM
|
||||||
|
tokens on the actual prose conversion.
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ User has user.lang preference │
|
||||||
|
│ Values: 'en' (default) | 'it' (active) | 'es'/'fr'/'de' (WIP) │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
├─ Per-user surfaces (portfolio analyse, daily digest, chat)
|
||||||
|
│ └─ prompt assembly threads user.lang to
|
||||||
|
│ respond_in_clause() → appended to system prompt
|
||||||
|
│ when lang != 'en'. Single call_llm, no extra cost.
|
||||||
|
│
|
||||||
|
└─ Shared surfaces (strategic log)
|
||||||
|
├─ ai_log_job writes the English row as today
|
||||||
|
├─ Then SELECTs distinct users.lang where lang != 'en'
|
||||||
|
│ AND user has active paid access
|
||||||
|
├─ asyncio.gather of one translate() call per language
|
||||||
|
└─ Each result → INSERT into strategic_log_translations
|
||||||
|
keyed by (log_id, lang) UNIQUE
|
||||||
|
```
|
||||||
|
|
||||||
|
## Data model
|
||||||
|
|
||||||
|
### `users.lang` (new column)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
ALTER TABLE users
|
||||||
|
ADD COLUMN lang VARCHAR(8) NOT NULL DEFAULT 'en';
|
||||||
|
```
|
||||||
|
|
||||||
|
Existing rows pick up the `en` default. Application-level validation
|
||||||
|
restricts writes to the `ACTIVE_LANGUAGES` set; the database column
|
||||||
|
accepts anything in `VARCHAR(8)` (no CHECK constraint — we want to
|
||||||
|
add new languages without a migration).
|
||||||
|
|
||||||
|
### `strategic_log_translations` (new table)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE strategic_log_translations (
|
||||||
|
id BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||||
|
log_id BIGINT NOT NULL,
|
||||||
|
lang VARCHAR(8) NOT NULL,
|
||||||
|
content_md TEXT NOT NULL,
|
||||||
|
generated_at DATETIME(6) NOT NULL,
|
||||||
|
llm_model VARCHAR(64),
|
||||||
|
llm_cost_usd FLOAT,
|
||||||
|
CONSTRAINT fk_slt_log
|
||||||
|
FOREIGN KEY (log_id) REFERENCES strategic_logs(id) ON DELETE CASCADE,
|
||||||
|
CONSTRAINT uq_slt_log_lang UNIQUE (log_id, lang)
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
ON DELETE CASCADE means evicting an old strategic log row also drops
|
||||||
|
its translations. The UNIQUE constraint prevents duplicate translations
|
||||||
|
for the same log/lang combo.
|
||||||
|
|
||||||
|
## Components
|
||||||
|
|
||||||
|
### `app/services/i18n.py` (new)
|
||||||
|
|
||||||
|
```python
|
||||||
|
LANGUAGES = {
|
||||||
|
"en": "English",
|
||||||
|
"it": "Italian",
|
||||||
|
"es": "Spanish",
|
||||||
|
"fr": "French",
|
||||||
|
"de": "German",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Set of language codes that users can actually pick from the settings
|
||||||
|
# dropdown. ES/FR/DE remain in LANGUAGES so their labels render, but
|
||||||
|
# the settings POST validator and the strategic-log translation fan-out
|
||||||
|
# both consult this set.
|
||||||
|
ACTIVE_LANGUAGES = {"en", "it"}
|
||||||
|
|
||||||
|
|
||||||
|
def respond_in_clause(lang: str) -> str:
|
||||||
|
"""Suffix appended to per-user LLM system prompts.
|
||||||
|
|
||||||
|
Returns an empty string for 'en' (the default everywhere already).
|
||||||
|
Otherwise returns "\n\nRespond in <Language>." so the model knows
|
||||||
|
to write its output in the user's language.
|
||||||
|
"""
|
||||||
|
if not lang or lang == "en":
|
||||||
|
return ""
|
||||||
|
name = LANGUAGES.get(lang, "English")
|
||||||
|
return f"\n\nRespond in {name}."
|
||||||
|
```
|
||||||
|
|
||||||
|
### `app/services/translation.py` (new)
|
||||||
|
|
||||||
|
```python
|
||||||
|
async def translate(
|
||||||
|
client: httpx.AsyncClient,
|
||||||
|
text: str,
|
||||||
|
target_lang: str,
|
||||||
|
) -> tuple[str, LogResult]:
|
||||||
|
"""Translate ``text`` (markdown) to ``target_lang``.
|
||||||
|
|
||||||
|
Uses the default ``call_llm`` provider chain — DeepSeek-4-flash via
|
||||||
|
the OG API is already cheap enough ($0.28/M output) that a separate
|
||||||
|
'translation model' setting would be over-engineering.
|
||||||
|
|
||||||
|
Returns ``(translated_markdown, LogResult)`` so the caller can
|
||||||
|
persist provenance (model + cost) alongside the translation.
|
||||||
|
Raises on provider failure; caller decides whether to surface or
|
||||||
|
swallow.
|
||||||
|
"""
|
||||||
|
```
|
||||||
|
|
||||||
|
System prompt: *"Translate the following markdown to {language}. Preserve all formatting (headings, lists, links, emphasis). Do NOT translate ticker symbols, company names, numbers, percentages, or dates. Output ONLY the translated markdown — no preamble, no commentary."*
|
||||||
|
|
||||||
|
### `app/models.py` (modified)
|
||||||
|
|
||||||
|
- `User`: add `lang: Mapped[str] = mapped_column(String(8), nullable=False, default="en", server_default="en")`
|
||||||
|
- New class `StrategicLogTranslation` matching the table above
|
||||||
|
|
||||||
|
### `app/jobs/ai_log_job.py` (modified)
|
||||||
|
|
||||||
|
After the existing English log row is persisted, add a translation
|
||||||
|
fan-out:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Select distinct active non-English languages.
|
||||||
|
async with session_factory() as session:
|
||||||
|
rows = (await session.execute(
|
||||||
|
select(User.lang).distinct()
|
||||||
|
.where(User.lang.in_(ACTIVE_LANGUAGES - {"en"}))
|
||||||
|
)).scalars().all()
|
||||||
|
active_langs = list(rows)
|
||||||
|
|
||||||
|
if active_langs:
|
||||||
|
async with httpx.AsyncClient(...) as client:
|
||||||
|
results = await asyncio.gather(*[
|
||||||
|
translate(client, log_row.content_md, lang)
|
||||||
|
for lang in active_langs
|
||||||
|
], return_exceptions=True)
|
||||||
|
for lang, result in zip(active_langs, results):
|
||||||
|
if isinstance(result, Exception):
|
||||||
|
log.warning("log.translate.failed", lang=lang, error=str(result)[:200])
|
||||||
|
continue
|
||||||
|
translated_md, llm_log = result
|
||||||
|
session.add(StrategicLogTranslation(
|
||||||
|
log_id=log_row.id, lang=lang,
|
||||||
|
content_md=translated_md,
|
||||||
|
generated_at=utcnow(),
|
||||||
|
llm_model=llm_log.model,
|
||||||
|
llm_cost_usd=llm_log.cost_usd,
|
||||||
|
))
|
||||||
|
await session.commit()
|
||||||
|
```
|
||||||
|
|
||||||
|
Errors in individual language translations are logged but do not fail
|
||||||
|
the job. Missing translations get rendered as the English fallback at
|
||||||
|
read time.
|
||||||
|
|
||||||
|
### `app/jobs/email_digest_job.py` (modified)
|
||||||
|
|
||||||
|
The digest is already per-user and assembles its own prompt. Thread
|
||||||
|
`user.lang` through:
|
||||||
|
|
||||||
|
- `_generate_variants(...)` accepts a `target_lang` param
|
||||||
|
- The system prompt assembly appends `respond_in_clause(target_lang)`
|
||||||
|
- Subject-line generation runs in the same call, so it's localized too
|
||||||
|
|
||||||
|
### `app/services/portfolio_analysis.py` (modified)
|
||||||
|
|
||||||
|
- `AnalysisRequest` gains a `lang: str = "en"` field, populated by the
|
||||||
|
route from `principal.user.lang`
|
||||||
|
- `analyse(...)` appends `respond_in_clause(req.lang)` to its system prompt
|
||||||
|
|
||||||
|
### `app/routers/universe.py` (modified — the `/api/analyze` route)
|
||||||
|
|
||||||
|
Read the current user's `lang` and put it in the payload before calling
|
||||||
|
`analyse(...)`. (The current route gets the principal via Depends.)
|
||||||
|
|
||||||
|
### `app/routers/pages.py` / the `/log` resolution (modified)
|
||||||
|
|
||||||
|
When rendering `/log` (and the `/log/{day}` historical variant), look
|
||||||
|
up the user's `lang`. If `lang != 'en'`, attempt to fetch the matching
|
||||||
|
`StrategicLogTranslation`; if present, render that. If absent, fall
|
||||||
|
back to the English `StrategicLog.content_md`. No silent error — the
|
||||||
|
fallback is the intended graceful path.
|
||||||
|
|
||||||
|
### Settings UI (`app/templates/settings.html` modified)
|
||||||
|
|
||||||
|
New section under existing user preferences (alongside the digest-tone
|
||||||
|
toggle):
|
||||||
|
|
||||||
|
```html
|
||||||
|
<details class="settings-section">
|
||||||
|
<summary class="settings-section__head">Language</summary>
|
||||||
|
<p class="settings-section__lede">
|
||||||
|
The language the AI uses for the strategic log, your daily digest,
|
||||||
|
and portfolio commentary. UI labels stay in English for now.
|
||||||
|
</p>
|
||||||
|
<form method="post" action="/settings/language" class="settings-row">
|
||||||
|
<select name="lang" id="lang-select">
|
||||||
|
<option value="en" {% if user.lang == 'en' %}selected{% endif %}>English</option>
|
||||||
|
<option value="it" {% if user.lang == 'it' %}selected{% endif %}>Italiano</option>
|
||||||
|
<option value="es" disabled>Español (coming soon)</option>
|
||||||
|
<option value="fr" disabled>Français (coming soon)</option>
|
||||||
|
<option value="de" disabled>Deutsch (coming soon)</option>
|
||||||
|
</select>
|
||||||
|
<button type="submit" class="settings-btn">Save</button>
|
||||||
|
</form>
|
||||||
|
</details>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Settings POST endpoint (new)
|
||||||
|
|
||||||
|
```python
|
||||||
|
@router.post("/settings/language")
|
||||||
|
async def set_language(
|
||||||
|
lang: str = Form(...),
|
||||||
|
cu: CurrentUser = Depends(require_auth),
|
||||||
|
session: AsyncSession = Depends(get_session),
|
||||||
|
):
|
||||||
|
if lang not in ACTIVE_LANGUAGES:
|
||||||
|
raise HTTPException(status_code=400, detail="unsupported language")
|
||||||
|
if cu.user is None:
|
||||||
|
raise HTTPException(status_code=403, detail="user required")
|
||||||
|
cu.user.lang = lang
|
||||||
|
await session.commit()
|
||||||
|
return RedirectResponse(url="/settings#language", status_code=303)
|
||||||
|
```
|
||||||
|
|
||||||
|
Server-side validation against `ACTIVE_LANGUAGES` is the gate that
|
||||||
|
keeps ES/FR/DE non-functional even if someone POSTs them by hand.
|
||||||
|
|
||||||
|
## Error handling
|
||||||
|
|
||||||
|
| Case | Behaviour |
|
||||||
|
|---|---|
|
||||||
|
| Translation provider down at ai_log_job time | English row still written. Translation row missing for that hour and language. Next hour retries. No retroactive backfill in v1. |
|
||||||
|
| Translation returns malformed markdown | Stored anyway (we trust DeepSeek output enough that this is rare). Operator can delete a bad row by hand. |
|
||||||
|
| User has `lang=it` but no IT translation for the latest log | Fall back to English silently. Better than an empty pane. |
|
||||||
|
| User saves an unsupported lang (`es`/`fr`/`de`/`xx`) via raw POST | 400 — validated against `ACTIVE_LANGUAGES`. |
|
||||||
|
| Migrating an existing user with no `lang` column | The `DEFAULT 'en'` clause on the migration handles it; no application code change needed. |
|
||||||
|
| User picks Italian, then logs change reaches them mid-hour | The next ai_log_job tick generates and translates a fresh log; users see the IT version on the next refresh. |
|
||||||
|
|
||||||
|
## Tests
|
||||||
|
|
||||||
|
Backend (`tests/test_i18n.py`, `tests/test_translation.py`,
|
||||||
|
`tests/test_localization_integration.py`):
|
||||||
|
|
||||||
|
- `respond_in_clause('en')` returns empty string
|
||||||
|
- `respond_in_clause('it')` includes the word "Italian"
|
||||||
|
- `respond_in_clause('xx')` falls back to "English" (defensive)
|
||||||
|
- `translate()` mocked happy path returns the translated text + LogResult
|
||||||
|
- `translate()` provider failure raises
|
||||||
|
- ai_log_job: with no non-en users, no translation calls happen (mock asserts call_count=0)
|
||||||
|
- ai_log_job: with one user at `lang='it'`, one translation row written with the right `lang` and `log_id`
|
||||||
|
- ai_log_job: translation failure on one lang doesn't fail the job; the other lang's row still writes
|
||||||
|
- `/log` serves IT row when `user.lang='it'` and an IT translation exists
|
||||||
|
- `/log` falls back to English when `user.lang='it'` but no IT translation exists
|
||||||
|
- `/settings/language` POST: accepts `en`/`it`, rejects `es`/`fr`/`de`/`xx` with 400
|
||||||
|
- `analyse()` system prompt contains `"Respond in Italian."` when `lang='it'` (assert on the messages list passed to call_llm)
|
||||||
|
- digest job system prompt likewise contains the clause when the user is Italian
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
End-to-end manual check after deploy:
|
||||||
|
|
||||||
|
1. **Switch a paid test user to Italian via the settings dropdown.** Confirm `users.lang='it'` in the DB.
|
||||||
|
2. **Wait for the next hourly log generation** (or trigger manually via cron/admin). Confirm a new `strategic_log_translations` row exists with `lang='it'` and `content_md` clearly Italian.
|
||||||
|
3. **Open the dashboard as that user.** Strategic log renders in Italian.
|
||||||
|
4. **Trigger the daily digest send for that user** (CLI: `python -m app.cli send-test-digest user@x daily`). Confirm the received email is in Italian.
|
||||||
|
5. **Click "Analyse my portfolio"** on the dashboard. Confirm the AI commentary is in Italian.
|
||||||
|
6. **Switch the same user back to English.** Confirm the next dashboard refresh shows the English log. The IT translation row stays in the DB (other IT users still benefit).
|
||||||
|
7. **Inspect the dropdown.** Verify ES/FR/DE appear with "(coming soon)" suffix and the option is disabled.
|
||||||
|
8. **Attempt `curl -X POST /settings/language -d lang=es`** with a valid session cookie. Expect 400.
|
||||||
|
|
||||||
|
## Migration / rollout
|
||||||
|
|
||||||
|
- Alembic migration `0022_localization` adds `users.lang` and creates
|
||||||
|
`strategic_log_translations`. Existing rows pick up `en` default.
|
||||||
|
- App restart picks up the new code paths. Pre-existing English logs
|
||||||
|
stay as-is. The first ai_log_job tick after deploy generates the
|
||||||
|
first Italian translation for whatever active IT users exist (likely
|
||||||
|
zero on day one until someone opts in).
|
||||||
|
- Removing localization later (if needed) is harmless: setting any
|
||||||
|
user's `lang` back to `en` makes their experience identical to the
|
||||||
|
pre-localization state.
|
||||||
|
|
||||||
|
## Out-of-scope clarifications
|
||||||
|
|
||||||
|
- We do not translate UI labels. Italian users see English buttons,
|
||||||
|
headings, and tooltips. Future scope.
|
||||||
|
- We do not translate user-generated content (chat questions the user
|
||||||
|
types). Only the AI's output is localized; user-supplied input flows
|
||||||
|
through unchanged.
|
||||||
|
- We do not translate the email subject line independently. The same
|
||||||
|
per-user LLM call that generates the digest body also generates the
|
||||||
|
subject in the target language.
|
||||||
|
- We do not surface translation cost in any user-visible UI. Cost is
|
||||||
|
recorded in `strategic_log_translations.llm_cost_usd` and the existing
|
||||||
|
`ai_calls` ledger picks up per-user calls as today.
|
||||||
Loading…
Add table
Add a link
Reference in a new issue