Automated translation metrics summarized for 335 language directions.
Sources — Segments come from curated, public-facing parallel corpora and benchmark-style collections, mixed for broad coverage. They are not model-generated: originals and references are assembled upstream as parallel text only; this dashboard does not redistribute raw corpora.
Mixed parallel corpora — Parallel segments spanning diverse genres and corpus scales. Each item includes a certified, human-verified reference translation, not one produced by the models being benchmarked.
| Model | fluency_v2.0 rank | Cost rank | Provider | fluency_v2.0 | chrF | BLEU | COMET | sacreBLEU | Len | Cost/1k | Badges |
|---|---|---|---|---|---|---|---|---|---|---|---|
| algebras_router_agentic | 1 | 1 | Algebras | 5.00 | 32.3 | 0.0 | 0.863 | 0.0 | 0.02 | $0.001 (algebras.ai/pricing) | ✨ Best fluency, 💰 Cheapest |
| Gemini 3 Flash (thinking-minimal) `google/gemini-3-flash-preview` | 1 | 2 | Google | 5.00 | 28.3 | 2.4 | 0.868 | 9.2 | 0.06 | $0.005 | ✨ Best fluency |
| Gemini 3 Flash `google/gemini-3-flash-preview` | 1 | 2 | Google | 5.00 | 28.4 | 0.0 | 0.863 | 0.0 | 0.02 | $0.005 | ✨ Best fluency |
| Seed 2.0 (ByteDance) `bytedance-seed/seed-2.0-lite` | 10 | 4 | ByteDance | 4.87 | 22.3 | 0.0 | 0.870 | 0.0 | 0.02 | $0.010 | |
| Qwen 3.5+ (max class) `qwen/qwen3.5-plus-02-15` | 12 | 5 | Alibaba | 4.78 | 28.7 | 2.4 | 0.860 | 9.2 | 0.06 | $0.012 | |
This release reports translation quality by language pair: medians and spread of automated scores (fluency, chrF, COMET, BLEU, sacreBLEU, length ratio) aggregated across evaluated directions. Run metadata describes recipes, metrics, and segment counts.
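The sketch below illustrates how per-direction corpus scores and the cross-direction median/spread aggregation could be computed. It is a minimal example assuming the open-source `sacrebleu` package and toy data; the direction labels, segment strings, and helper function names are illustrative, and COMET and the fluency_v2.0 metric (computed separately for this dashboard) are omitted.

```python
# Minimal sketch: per-direction corpus metrics and cross-direction aggregation.
# Assumes the `sacrebleu` package; COMET and fluency_v2.0 scoring are out of scope.
# Direction labels, segments, and function names below are illustrative only.
from statistics import median, pstdev

import sacrebleu


def score_direction(hypotheses: list[str], references: list[str]) -> dict:
    """Compute chrF, BLEU, and a simple length ratio for one language direction."""
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    # Length ratio: total hypothesis tokens over total reference tokens.
    len_ratio = sum(len(h.split()) for h in hypotheses) / max(
        1, sum(len(r.split()) for r in references)
    )
    return {"BLEU": bleu.score, "chrF": chrf.score, "Len": len_ratio}


def aggregate(per_direction: dict[str, dict]) -> dict:
    """Median and spread (population std dev) of each metric across directions."""
    out = {}
    for metric in ("BLEU", "chrF", "Len"):
        values = [scores[metric] for scores in per_direction.values()]
        out[metric] = {"median": median(values), "spread": pstdev(values)}
    return out


# Toy example with two hypothetical directions.
per_dir = {
    "en-de": score_direction(["Hallo Welt"], ["Hallo Welt"]),
    "en-fr": score_direction(["Bonjour le monde"], ["Bonjour tout le monde"]),
}
print(aggregate(per_dir))
```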