Three connected fixes after the user spotted the 2026-05-28 IT log
cutting off mid-sentence:
1. translation: bump max_tokens 4000 → 8000.
call_llm()'s default cap was 4000, the same ceiling the English log
generator itself uses. Italian expands roughly 15-25 % over English
in tokens, so any near-cap English source produced an IT translation
that hit finish_reason=length and returned a truncated body. The
failure was silent because _call_provider() only raises when content
is fully empty. The strategic_log_translations table has dozens of
rows where completion_tokens landed at exactly 4000 with content well
under half the source length. A 4000-token source expanding by 25 %
needs about 5000 tokens, so 8000 leaves ample headroom for every
target language in LANGUAGES (it/es/fr/de; en is passed through
untranslated).
2. log.html: localise the chat sidebar strings.
user_lang was already passed into the template by pages.py, so an
inline {% if user_lang == 'it' %} keeps it simple. Covers the
"Ask Cassandra" title, the "grounded on…" hint, the helper lede,
the textarea placeholder, and the Send button label.
3. chat endpoint: append respond_in_clause(user.lang) to the system
prompt. The chat conversation can now happen in IT — the model's
first reply lands in the right language even when the user's first
turn is short.
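The silent truncation described in item 1 could also be caught at the point where the provider response is unpacked. The following is a minimal sketch, assuming an OpenAI-style chat-completions payload; `extract_content` and `TruncatedCompletionError` are hypothetical names for illustration and are not the project's `_call_provider()`.

```python
class TruncatedCompletionError(RuntimeError):
    """The provider stopped generating because it hit max_tokens."""


def extract_content(response: dict) -> str:
    """Return the message text, refusing empty or length-truncated output.

    Assumes an OpenAI-style chat-completions payload:
    {"choices": [{"finish_reason": ..., "message": {"content": ...}}]}
    """
    choice = response["choices"][0]
    content = ((choice.get("message") or {}).get("content") or "").strip()
    if not content:
        raise ValueError("provider returned empty content")
    if choice.get("finish_reason") == "length":
        # Partial output looks valid but may be cut off mid-sentence.
        raise TruncatedCompletionError(
            f"completion truncated after {len(content)} characters"
        )
    return content
```

Raising on `finish_reason == "length"` turns a quiet data-quality problem into a visible per-language failure that a fan-out caller can log and retry.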
scripts/backfill_truncated_translations.py: one-off cleanup utility.
Scans strategic_log_translations for rows whose translated content is
< 70 % of the English source length (the truncation signal: IT
*expands* beyond English, so a markedly shorter translation is
suspect), deletes them, and re-translates via the service with its
raised 8000-token cap. Supports --date, --since, --all and --dry-run.
The 2026-05-28 fan-out has already been re-translated (13/13 rows).
Other historical dates still hold older truncations; the user can
decide whether to backfill those (the script is idempotent, so
re-running it is safe).
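The length-ratio check the backfill script relies on might look roughly like this. It is a sketch only: it assumes the ratio is measured in characters, and `TranslationRow`, `is_truncated`, and `suspect_rows` are hypothetical names, not the script's actual code or the table's real schema.

```python
from typing import NamedTuple


class TranslationRow(NamedTuple):
    # Hypothetical row shape; the real table's columns may differ.
    log_date: str
    lang: str
    content: str


def is_truncated(source: str, translated: str, threshold: float = 0.70) -> bool:
    """True when the translation is shorter than threshold x the source."""
    if not source:
        return False  # nothing to compare against
    return len(translated) < threshold * len(source)


def suspect_rows(rows, english_by_date):
    """Return the rows whose content fails the length-ratio check."""
    return [
        row for row in rows
        if row.log_date in english_by_date
        and is_truncated(english_by_date[row.log_date], row.content)
    ]
```

Because a correctly re-translated row no longer falls under the threshold, a second run selects nothing for that date, which is consistent with the idempotency noted above.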
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
88 lines
3.5 KiB
Python
"""Markdown translation via the existing LLM provider chain.

DeepSeek-4-flash at ~$0.28/M output tokens is cheap enough that we
don't bother with a separate translation-only model. ``call_llm``'s
provider chain (DeepSeek primary, OpenRouter fallback) handles this
path identically to any other LLM call.

The translator is content-aware in one important way: it instructs the
model to preserve markdown structure, ticker symbols, numbers, dates,
and percentages verbatim. This keeps generated artefacts (tables of
quotes, embedded percentages, dated references) intact across the
translation boundary.
"""
from __future__ import annotations

import httpx

from app.services.i18n import LANGUAGES
from app.services.openrouter import LogResult, call_llm


_SYSTEM_PROMPT_TMPL = """\
You are an expert translator working on financial-markets commentary.
Translate the following markdown text to {language}.

Strict rules:
- Preserve ALL markdown formatting (headings, lists, emphasis, links,
  tables, code spans).
- Do NOT translate ticker symbols (AAPL, MSFT, VOD.L, ASML.AS, etc.),
  company legal names, percentages, dates, ISO currency codes, or any
  numbers.
- Do NOT add commentary, preambles, or apologies. Output ONLY the
  translated markdown.
"""


async def translate(
    client: httpx.AsyncClient,
    text: str,
    target_lang: str,
) -> tuple[str, LogResult]:
    """Translate markdown ``text`` to ``target_lang``.

    Returns ``(translated_markdown, LogResult)``. Caller persists the
    cost/model provenance from LogResult next to the cached row.

    Short-circuits without calling the LLM when ``target_lang`` is
    ``'en'``, unknown, or empty; it returns the source unchanged with a
    zero-cost stub LogResult. This lets fan-out callers iterate over
    all languages without per-call gating.

    Raises on provider failure (HTTP error, all chain providers down).
    Callers in fan-out paths should catch and log per-language.
    """
    if not target_lang or target_lang == "en" or target_lang not in LANGUAGES:
        # No-op fast path. Returning a fake LogResult keeps the call
        # signature stable for callers who unpack the tuple.
        return text, LogResult(
            content=text, model="noop",
            prompt_tokens=0, completion_tokens=0, cost_usd=0.0,
        )

    system_prompt = _SYSTEM_PROMPT_TMPL.format(language=LANGUAGES[target_lang])
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]
    # Italian / Spanish / French / German typically expand the token count
    # 15-25 % over English (longer words, more sub-word splits). Our
    # strategic-log generator runs up to its own 4000-token cap, so a 4000
    # cap here would silently truncate any near-cap source. 8000 gives
    # ample headroom for every language we currently support and costs
    # nothing extra unless the model actually emits more tokens.
    result = await call_llm(client, messages, max_tokens=8000)

    content = (result.content or "").strip()
    # Strip code fences if the model wrapped its output despite the system rule.
    if content.startswith("```"):
        # Drop the opening fence (with optional language tag).
        first_nl = content.find("\n")
        if first_nl != -1:
            content = content[first_nl + 1:]
        # Drop the closing fence.
        if content.rstrip().endswith("```"):
            content = content.rstrip()[:-3].rstrip()
        content = content.strip()

    return content, result
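The docstring's advice that fan-out callers catch and log per language could be followed along these lines. This is a sketch under assumptions: `fan_out` and the simplified `translate_fn(text, lang)` signature are hypothetical, and a real caller would bind the `httpx.AsyncClient` and unpack the `(content, LogResult)` tuple returned by `translate()`.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable, Optional

logger = logging.getLogger(__name__)

# Simplified stand-in for translate(); takes (text, lang), returns the text.
TranslateFn = Callable[[str, str], Awaitable[str]]


async def fan_out(
    translate_fn: TranslateFn, text: str, langs: Iterable[str]
) -> dict[str, str]:
    """Translate into every language concurrently, skipping failures."""

    async def one(lang: str) -> tuple[str, Optional[str]]:
        try:
            return lang, await translate_fn(text, lang)
        except Exception:
            # One failing language must not sink the others.
            logger.exception("translation to %s failed", lang)
            return lang, None

    pairs = await asyncio.gather(*(one(lang) for lang in langs))
    return {lang: out for lang, out in pairs if out is not None}
```

Catching inside the per-language coroutine, instead of around `asyncio.gather`, keeps the successful translations even when one provider call raises.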