Translation quality benchmarks by language pair

Automated translation metrics summarized for 334 language directions.


Benchmark run metadata

Apr 05, 2026, 07:03 PM

Dataset: mixed-parallel-corpora

334 language pairs


Run details

Sources: Source segments come from curated, public-facing parallel corpora and benchmark-style collections, mixed for broad coverage. None of the text is model-generated: source segments and reference translations are assembled upstream as parallel text only. This dashboard does not redistribute the raw corpora.

Mixed parallel corpora: Parallel segments spanning diverse genres and corpus scales. Each item includes a certified, human-verified reference translation that was not produced by any of the models being benchmarked.


Language pair (8 source languages × 51 targets)

EN·English

ZH·Mandarin Chinese·普通话


Model	Model ID	Fluency rank	Cost rank	Provider	Fluency	BLEU	COMET	sacreBLEU	Len	Cost/1k	Badges
Gemini 3 Flash (thinking-minimal)	google/gemini-3-flash-preview	1	1	Google	2.86	2.4	0.868	9.2	0.06	$0.005	✨ Best fluency, 💰 Cheapest
Gemini 3 Flash	google/gemini-3-flash-preview	1	1	Google	2.86	0.0	0.863	0.0	0.02	$0.005	✨ Best fluency, 💰 Cheapest
Seed 2.0 (ByteDance)	bytedance-seed/seed-2.0-lite	9	3	ByteDance	2.75	0.0	0.870	0.0	0.02	$0.010
Qwen 3.5+ (max class)	qwen/qwen3.5-plus-02-15	11	4	Alibaba	2.68	2.4	0.860	9.2	0.06	$0.012
Gemini 3.1 Pro Preview	google/gemini-3.1-pro-preview	1	5	Google	2.86	0.0	0.863	0.0	0.02	$0.018	✨ Best fluency
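
The rank columns in the table above appear to use competition-style ranking, where tied models share the best rank and the next rank skips ahead (for example, cost ranks 1, 1, 3, 4, 5). The dashboard does not document its ranking rule, so the sketch below is an assumption that happens to match the displayed rows. The function name `competition_rank` is hypothetical, and the values are copied from the five rows shown.

```python
def competition_rank(values, reverse=False):
    """Rank values so that ties share the best rank and later ranks skip ahead."""
    ordered = sorted(values, reverse=reverse)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        # Keep only the first (best) position at which each value appears.
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


# Values copied from the five displayed rows, in table order.
cost_per_1k = [0.005, 0.005, 0.010, 0.012, 0.018]
fluency = [2.86, 2.86, 2.75, 2.68, 2.86]

# Assumption: lower cost is better, higher fluency is better.
cost_ranks = competition_rank(cost_per_1k)
fluency_ranks = competition_rank(fluency, reverse=True)

print(cost_ranks)
print(fluency_ranks)
```

Applied to these five rows alone, the cost ranks come out identical to the Cost rank column. The fluency ranks do not reproduce the displayed 9 and 11, which suggests the published fluency ranks are computed over the full model grid rather than over the rows shown here.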

About these benchmarks