Ray ASR Benchmark: 98 Languages, Full Results


We tested Ray Max and Ray Turbo across 98 languages. Here’s every result.

Two Engines, One Goal

Ray ships with two speech recognition engines.

Ray Max is built for accuracy. It takes more time and delivers the lowest error rates across the widest range of languages.

Ray Turbo is built for speed. Up to 16x faster than Ray Max with minimal accuracy tradeoff. Designed for real-time playback and large batch processing.

We benchmarked both engines across 98 languages alongside three widely used open-source ASR models for reference. Every result is published below. Nothing excluded.


How to Read the Results

All results use Word Error Rate (WER): the percentage of words the model gets wrong. Lower is better. A WER of 5% means near-perfect transcription. A WER above 100% means the model produced more errors than words in the source audio.

We also present the same data as Accuracy (100 − WER, floored at 0), where higher is better.
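To make the two metrics concrete, here is a minimal sketch of how WER and Accuracy relate. It is a plain-Python word-level edit distance written for illustration, not the jiwer code used in the benchmark; the function names `word_error_rate` and `accuracy` are our own, and the floor at 0 follows the formula given in the Methodology section.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (substitutions + deletions + insertions)
    divided by the number of reference words, times 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def accuracy(wer_percent: float) -> float:
    """Accuracy as 100 - WER, floored at 0 so it never goes negative."""
    return max(0.0, 100.0 - wer_percent)


# One wrong word out of six: WER is about 16.7%.
print(word_error_rate("THE CAT SAT ON THE MAT", "THE CAT SAT ON A MAT"))

# Four inserted words against a two-word reference: WER is 200%, Accuracy is 0.
hallucinated = word_error_rate("HELLO WORLD", "HELLO THERE WORLD AND EVERYONE ELSE")
print(hallucinated, accuracy(hallucinated))
```

The second example shows why WER can exceed 100%: insertions count as errors, but the denominator is always the length of the reference.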


Overall Performance

WER distribution across all 98 languages. Lower is better.


Same data as Accuracy. Higher is better.

Full Results: All 98 Languages

Complete WER heatmap. All models, all 98 languages. Sorted by Ray Max performance.
Same data as Accuracy.


High-Performing Languages

Languages where Ray Max achieves below 8% WER.
Same languages as Accuracy.



Most Challenging Languages

Languages where at least one model exceeds 100% WER. Red dashed line marks the 100% threshold.
Same languages as Accuracy.

Ray Max vs Ray Turbo: When to Use Which

|          | Ray Max | Ray Turbo |
|----------|---------|-----------|
| Priority | Accuracy | Speed |
| Best for | Professional subtitling, low-resource languages, archival work, published content | Real-time playback, live viewing, batch processing, high-resource languages |
| Speed    | Baseline | Up to 16x faster |

Raw Data

Full benchmark data (FLEURS) and the chart generation script are available for anyone to verify or reproduce.


Methodology

  • Dataset: FLEURS multilingual benchmark across 98 languages
  • Metric: Word Error Rate (WER) via the jiwer library (lower is better)
  • Accuracy calculated as max(0, 100 − WER) (higher is better)
  • Text normalization applied before evaluation:
      • Case normalization (all uppercase)
      • Punctuation removed
      • Abbreviation expansion (Mr → Mister, Dr → Doctor, etc.)
      • Special tokens and markup stripped
  • CJK and Southeast Asian languages evaluated at character level: characters are spaced so WER effectively measures Character Error Rate (CER) for these scripts
  • Compound word normalization applied: if a reference compound word appears split across candidate outputs, the reference is split accordingly for fair comparison
  • WER above 100% indicates hallucinated output where the model inserted more erroneous words than exist in the source reference
  • All models evaluated under identical conditions using the same normalized references
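
The normalization steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's actual script: the `ABBREVIATIONS` table contains only the two examples named above, the `SPECIAL_TOKENS` pattern is a guess at what "special tokens and markup" might look like, and the `normalize` function and its `char_level` flag are hypothetical names. Compound word normalization is not covered here.

```python
import re
import unicodedata

# Hypothetical abbreviation table; only Mr and Dr are named in the methodology.
ABBREVIATIONS = {"MR": "MISTER", "DR": "DOCTOR"}

# Assumed markup shapes: <|token|>, <tag>, and [NOISE]-style annotations.
SPECIAL_TOKENS = re.compile(r"<\|[^|]*\|>|<[^>]*>|\[[^\]]*\]")


def normalize(text: str, char_level: bool = False) -> str:
    """Apply the listed normalization steps to one transcript line."""
    # Strip special tokens first, while their brackets are still present.
    text = SPECIAL_TOKENS.sub(" ", text)
    # Case normalization (all uppercase).
    text = text.upper()
    # Remove every character in a Unicode punctuation category (P*).
    text = "".join(c for c in text if not unicodedata.category(c).startswith("P"))
    # Expand abbreviations word by word.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    if char_level:
        # Space out every character so a word-level metric scores characters.
        return " ".join("".join(words))
    return " ".join(words)


print(normalize("Dr. Smith met Mr. Jones <|endoftext|>"))
# DOCTOR SMITH MET MISTER JONES

print(normalize("你好,世界", char_level=True))
# 你 好 世 界
```

Splitting by code point is a simplification: scripts that use combining marks, such as Thai, may need grapheme-aware splitting, and the methodology does not say which approach was used.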

Ray Max and Ray Turbo are available in Ray Media Player.

rayplayer.com
