read.markets/app/services/translation.py
Giorgio Gilestro 48f022b71b i18n: stop truncating IT translations + localise the chat sidebar
Three connected fixes after the user spotted the 2026-05-28 IT log
cutting off mid-sentence:

1. translation: bump max_tokens 4000 → 8000.
   call_llm()'s default cap was 4000, which is what the English log
   generator itself uses as its ceiling. Italian expands roughly 15-25 %
   over English in tokens, so any near-cap English source produced an
   IT translation that hit finish_reason=length and returned a
   truncated body — silently, because _call_provider() only raises when
   content is fully empty. The strategic_log_translations table has
   dozens of rows where completion_tokens landed at exactly 4000 with
   content well under half the source length. 8000 gives ample
   headroom for any of the five LANGUAGES we ship (en/it/es/fr/de).

2. log.html: localise the chat sidebar strings.
   user_lang was already passed into the template by pages.py, so an
   inline {% if user_lang == 'it' %} keeps it simple. Covers the
   "Ask Cassandra" title, the "grounded on…" hint, the helper lede,
   the textarea placeholder, and the Send button label.

3. chat endpoint: append respond_in_clause(user.lang) to the system
   prompt. The chat conversation can now happen in IT — the model's
   first reply lands in the right language even when the user's first
   turn is short.
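
The arithmetic behind fix 1 can be checked with a short sketch. The 15-25 %
expansion range and the 4000 / 8000 caps are taken from the message above;
the helper name `worst_case_tokens` is hypothetical and the estimate is a
rough upper bound, not a measurement.

```python
OLD_CAP = 4000  # previous call_llm() default, per the commit message
NEW_CAP = 8000  # cap now passed by the translation service


def worst_case_tokens(source_tokens: int, expansion_pct: int) -> int:
    """Estimate translated size, rounding up, using integer arithmetic."""
    return -(-source_tokens * (100 + expansion_pct) // 100)


# An English log that used its whole 4000-token budget.
near_cap_source = 4000

for pct in (15, 25):
    needed = worst_case_tokens(near_cap_source, pct)
    print(f"+{pct}%: ~{needed} tokens, "
          f"fits old cap: {needed <= OLD_CAP}, fits new cap: {needed <= NEW_CAP}")
```

Under these assumptions a near-cap source needs roughly 4600 to 5000 tokens
once translated, which overflows the old cap and fits comfortably in the new one.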

scripts/backfill_truncated_translations.py: one-off cleanup utility.
Scans strategic_log_translations for rows whose translated content is
< 70 % of the English source (the truncation signal — IT *expands*
beyond English, so a shorter translation is always suspect), deletes
them, and re-translates via the service with its raised cap. Supports --date,
--since, --all and --dry-run. The 2026-05-28 fan-out has already been
re-translated (13/13 rows). Other historical dates still hold older
truncations; the user can decide whether to backfill those (the script
is idempotent).
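
The truncation signal described above could look roughly like this. It is a
minimal sketch, not the script's actual code: the function name
`looks_truncated` is hypothetical, and measuring length in characters is an
assumption (the message does not say whether the script compares characters
or tokens).

```python
def looks_truncated(source: str, translated: str, threshold: float = 0.70) -> bool:
    """Return True when the translation is suspiciously short.

    Translations are expected to be at least as long as the English
    source, so anything below `threshold` of the source length is flagged.
    Length is counted in characters here (an assumption).
    """
    if not source:
        # Nothing to compare against; do not flag empty sources.
        return False
    return len(translated) / len(source) < threshold


english = "x" * 1000
print(looks_truncated(english, "y" * 450))   # cut off mid-sentence
print(looks_truncated(english, "y" * 1180))  # normal expansion
```

Because a row that passes the check is left alone, re-running such a scan
after a successful re-translation finds nothing further to delete, which is
consistent with the idempotence claimed above.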

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 11:44:41 +02:00

"""Markdown translation via the existing LLM provider chain.

DeepSeek-4-flash at ~$0.28/M output tokens is cheap enough that we
don't bother with a separate translation-only model. ``call_llm``'s
provider chain (DeepSeek primary, OpenRouter fallback) handles this
path identically to any other LLM call.

The translator is content-aware in one important way: it instructs the
model to preserve markdown structure, ticker symbols, numbers, dates,
and percentages verbatim. This keeps generated artefacts (tables of
quotes, embedded percentages, dated references) intact across the
translation boundary.
"""
from __future__ import annotations

import httpx

from app.services.i18n import LANGUAGES
from app.services.openrouter import LogResult, call_llm

_SYSTEM_PROMPT_TMPL = """\
You are an expert translator working on financial-markets commentary.
Translate the following markdown text to {language}.
Strict rules:
- Preserve ALL markdown formatting (headings, lists, emphasis, links,
tables, code spans).
- Do NOT translate ticker symbols (AAPL, MSFT, VOD.L, ASML.AS, etc.),
company legal names, percentages, dates, ISO currency codes, or any
numbers.
- Do NOT add commentary, preambles, or apologies. Output ONLY the
translated markdown.
"""
async def translate(
    client: httpx.AsyncClient,
    text: str,
    target_lang: str,
) -> tuple[str, LogResult]:
    """Translate markdown ``text`` to ``target_lang``.

    Returns ``(translated_markdown, LogResult)``. Caller persists the
    cost/model provenance from LogResult next to the cached row.

    Short-circuits without calling the LLM when ``target_lang`` is
    ``'en'``, unknown, or empty; it returns the source unchanged with a
    zero-cost stub LogResult. This lets fan-out callers iterate over
    all languages without per-call gating.

    Raises on provider failure (HTTP error, all chain providers down).
    Callers in fan-out paths should catch and log per-language.
    """
    if not target_lang or target_lang == "en" or target_lang not in LANGUAGES:
        # No-op fast path. Returning a fake LogResult keeps the call
        # signature stable for callers who unpack the tuple.
        return text, LogResult(
            content=text, model="noop",
            prompt_tokens=0, completion_tokens=0, cost_usd=0.0,
        )

    system_prompt = _SYSTEM_PROMPT_TMPL.format(language=LANGUAGES[target_lang])
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]

    # Italian / Spanish / French / German typically expand the token count
    # 15-25 % over English (longer words, more sub-word splits). Our
    # strategic-log generator runs up to its own 4000-token cap, so a 4000
    # cap here would silently truncate any near-cap source. 8000 gives
    # ample headroom for every language we currently support and costs
    # nothing extra unless the model actually emits more tokens.
    result = await call_llm(client, messages, max_tokens=8000)
    content = (result.content or "").strip()

    # Strip code fences if the model wrapped its output despite the system rule.
    if content.startswith("```"):
        # Drop the opening fence (with optional language tag).
        first_nl = content.find("\n")
        if first_nl != -1:
            content = content[first_nl + 1:]
        # Drop the closing fence.
        if content.rstrip().endswith("```"):
            content = content.rstrip()[:-3].rstrip()
        content = content.strip()

    return content, result
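
The fence-stripping step at the end of ``translate`` can be exercised in
isolation. The sketch below mirrors that logic in a standalone function; the
name ``strip_code_fence`` is hypothetical and does not exist in the module,
and the sample strings are made up for illustration.

```python
def strip_code_fence(content: str) -> str:
    """Remove a wrapping markdown code fence, if the model added one."""
    content = content.strip()
    if content.startswith("```"):
        # Drop the opening fence line, including any language tag.
        first_nl = content.find("\n")
        if first_nl != -1:
            content = content[first_nl + 1:]
        # Drop the closing fence when present (a truncated reply may lack it).
        if content.rstrip().endswith("```"):
            content = content.rstrip()[:-3].rstrip()
        content = content.strip()
    return content


wrapped = "```markdown\n# Titolo\n\nTesto del log.\n```"
print(strip_code_fence(wrapped))
print(strip_code_fence("# Titolo"))
```

Unfenced input passes through with only surrounding whitespace removed, so
applying the function to an already-clean translation should be harmless.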