Ray ASR Benchmark: 98 Languages, Full Results

We tested Ray Max and Ray Turbo across 98 languages. Here’s every result.

Two Engines, One Goal

Ray ships with two speech recognition engines.

Ray Max is built for accuracy. It takes more time and delivers the lowest error rates across the widest range of languages.

Ray Turbo is built for speed. It runs up to 16x faster than Ray Max with a minimal accuracy tradeoff, and it is designed for real-time playback and large batch processing.

We benchmarked both engines across 98 languages, alongside three widely used open-source ASR models for reference. Every result is published below. Nothing is excluded.

How to Read the Results

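All results use Word Error Rate (WER): the number of word-level errors (substitutions, deletions, and insertions) divided by the number of words in the reference transcript, expressed as a percentage. Lower is better. A WER of 5% means near-perfect transcription. A WER above 100% means the model made more errors than there are words in the reference, which happens when it inserts many words that were never spoken.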

We also present the same data as Accuracy, calculated as max(0, 100 − WER), where higher is better. Any WER of 100% or more therefore shows as 0% Accuracy.
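As a rough illustration of how WER and the derived Accuracy figure behave, here is a minimal pure-Python sketch. It is not the benchmark code: the benchmark computes WER with the jiwer library (see Methodology), and the function names below are hypothetical, chosen for this example only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists.

    Counts the minimum number of substitutions, deletions, and
    insertions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer_percent(reference, hypothesis):
    """Word Error Rate as a percentage of the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


def accuracy_percent(wer):
    """Accuracy as defined in this post: max(0, 100 - WER)."""
    return max(0.0, 100.0 - wer)


# One substitution in a four-word reference.
print(wer_percent("A B C D", "A X C D"))

# Three inserted words against a two-word reference push WER past 100%.
print(wer_percent("HELLO WORLD", "HELLO THERE BIG WIDE WORLD"))
```

The second call shows why WER has no upper bound: inserted words count as errors, but the denominator is always the length of the reference.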

Overall Performance

WER distribution across all 98 languages. Lower is better.

Same data as Accuracy. Higher is better.

Full Results: All 98 Languages

Complete WER heatmap. All models, all 98 languages. Sorted by Ray Max performance.

High-Performing Languages

Languages where Ray Max achieves below 8% WER.

Most Challenging Languages

Languages where at least one model exceeds 100% WER. Red dashed line marks the 100% threshold.

Ray Max vs Ray Turbo: When to Use Which

	Ray Max	Ray Turbo
Priority	Accuracy	Speed
Best for	Professional subtitling, low-resource languages, archival work, published content	Real-time playback, live viewing, batch processing, high-resource languages
Speed	Baseline	Up to 16x faster

Raw Data

Full benchmark data (FLEURS) and the chart generation script are available for anyone to verify or reproduce.

Methodology

Dataset: FLEURS multilingual benchmark across 98 languages
Metric: Word Error Rate (WER) via the jiwer library — lower is better
Accuracy calculated as max(0, 100 − WER) — higher is better
Text normalization applied before evaluation: case normalization (all uppercase), punctuation removal, abbreviation expansion (Mr → Mister, Dr → Doctor, etc.), and stripping of special tokens and markup
CJK and Southeast Asian languages evaluated at character level — characters are spaced so WER effectively measures Character Error Rate (CER) for these scripts
Compound word normalization applied — if a reference compound word appears split across candidate outputs, the reference is split accordingly for fair comparison
WER above 100% indicates hallucinated output: the model's errors, typically inserted words that were never spoken, outnumber the words in the source reference
All models evaluated under identical conditions using the same normalized references
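The normalization steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's actual code: the abbreviation table contains only the two examples named in the list, the markup patterns (angle-bracket and square-bracket tokens) are a guess at what "special tokens and markup" covers, and the function names are hypothetical. Compound word normalization is not shown.

```python
import re
import unicodedata

# Hypothetical table: the post names only Mr and Dr as examples.
ABBREVIATIONS = {"MR": "MISTER", "DR": "DOCTOR"}


def normalize(text):
    """Apply case, punctuation, abbreviation, and markup normalization."""
    # Strip tokens such as <unk> or [noise] (assumed markup formats).
    text = re.sub(r"<[^>]*>|\[[^\]]*\]", " ", text)
    # Case normalization: everything uppercase.
    text = text.upper()
    # Replace every Unicode punctuation character with a space.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    # Expand abbreviations word by word and collapse whitespace.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def space_characters(text):
    """Put a space between every character so WER acts as CER.

    Splits by Unicode code point, which is a simplification for
    scripts that use combining marks.
    """
    return " ".join("".join(text.split()))


print(normalize("Mr. Smith met Dr. Jones, <unk> today!"))
print(space_characters("你好 世界"))
```

Applying the same function to both the reference and every model's output is what keeps the comparison consistent: a model is not penalized for casing or punctuation choices that the metric is not meant to measure.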

Ray Max and Ray Turbo are available in Ray Media Player.

rayplayer.com
