Automated translation metrics summarized for 334 language directions.
Sources — Source segments come from curated, public-facing parallel corpora and benchmark-style collections mixed for broad coverage. They are not model-generated: originals and references are assembled upstream as parallel text only; this dashboard does not redistribute raw corpora.
Mixed parallel corpora — Parallel segments spanning diverse genres and corpus scales. Each item includes a reference translation that is certified and human-verified—not produced by the models being benchmarked.
| Model | Fluency rank | Cost rank | Provider | Fluency | BLEU | COMET | sacreBLEU | Len | Cost/1k | Badges |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3 Flash (thinking-minimal) `google/gemini-3-flash-preview` | 1 | 1 | Google | 2.86 | 2.4 | 0.868 | 9.2 | 0.06 | $0.005 | ✨ Best fluency, 💰 Cheapest |
| Gemini 3 Flash `google/gemini-3-flash-preview` | 1 | 1 | Google | 2.86 | 0.0 | 0.863 | 0.0 | 0.02 | $0.005 | ✨ Best fluency, 💰 Cheapest |
| Seed 2.0 (ByteDance) `bytedance-seed/seed-2.0-lite` | 9 | 3 | ByteDance | 2.75 | 0.0 | 0.870 | 0.0 | 0.02 | $0.010 | |
| Qwen 3.5+ (max class) `qwen/qwen3.5-plus-02-15` | 11 | 4 | Alibaba | 2.68 | 2.4 | 0.860 | 9.2 | 0.06 | $0.012 | |
| Gemini 3.1 Pro Preview `google/gemini-3.1-pro-preview` | 1 | 5 | Google | 2.86 | 0.0 | 0.863 | 0.0 | 0.02 | $0.018 | ✨ Best fluency |
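
The cost ranks listed for these five models (1, 1, 3, 4, 5) are consistent with standard competition ranking, where tied entries share a rank and the next rank skips ahead. The dashboard does not state its ranking rule, so the Python sketch below is only an illustration under that assumption; `competition_rank` is a hypothetical helper, not part of any published tooling.

```python
def competition_rank(values, reverse=False):
    """Return 1-based ranks where ties share the lowest rank (1, 1, 3, ...).

    reverse=False ranks the smallest value first (suits cost);
    reverse=True ranks the largest value first (suits a score like fluency).
    """
    ordered = sorted(values, reverse=reverse)
    # The rank of a value is the position of its first occurrence
    # in the sorted order, so tied values receive the same rank.
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


# Cost/1k values copied from the five rows listed for these models.
cost_per_1k = [0.005, 0.005, 0.010, 0.012, 0.018]
cost_ranks = competition_rank(cost_per_1k)
print(cost_ranks)  # [1, 1, 3, 4, 5]
```

Applying the same helper to the fluency column of these five rows alone would not reproduce the listed fluency ranks of 9 and 11, which suggests those ranks are computed over a larger set of models than the rows shown here.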
This release reports translation quality by language pair: medians and spread of automated scores (fluency, COMET, BLEU, sacreBLEU, length ratio) aggregated across evaluated directions. Run metadata describes recipes, metrics, and segment counts.
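
As a rough illustration of what "medians and spread" across directions could look like, the Python sketch below summarizes one metric for one model. The release does not say which spread statistic it uses, so the interquartile range here is an assumption, and the direction codes and scores are made-up placeholder values rather than results from this dashboard.

```python
from statistics import median, quantiles


def summarize(scores):
    """Summarize per-direction scores as median and interquartile range."""
    # quantiles(..., n=4) returns the three quartile cut points;
    # "inclusive" treats the data as the full population of directions.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"median": median(scores), "iqr": q3 - q1, "n": len(scores)}


# Hypothetical COMET scores for five directions of a single model.
comet_by_direction = {
    "en-de": 0.88,
    "en-fr": 0.87,
    "en-ja": 0.85,
    "de-en": 0.89,
    "en-sw": 0.80,
}

summary = summarize(list(comet_by_direction.values()))
print(summary["median"], summary["n"])
```

A median with an interquartile range is less sensitive to a handful of very weak or very strong directions than a mean with a standard deviation, which is one reason a dashboard covering hundreds of directions might prefer it.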