We tested Ray Max and Ray Turbo across 98 languages. Here’s every result.
Two Engines, One Goal
Ray ships with two speech recognition engines.
Ray Max is built for accuracy. It takes more time and delivers the lowest error rates across the widest range of languages.
Ray Turbo is built for speed. Up to 16x faster than Ray Max with minimal accuracy tradeoff. Designed for real-time playback and large batch processing.
We benchmarked both engines across 98 languages alongside three widely used open-source ASR models for reference. Every result is published below. Nothing excluded.
How to Read the Results
All results use Word Error Rate (WER): the number of word errors (substitutions, deletions, and insertions) as a percentage of the words in the reference transcript. Lower is better. A WER of 5% means near-perfect transcription. A WER above 100% means the model produced more errors than there are words in the reference.
We also present the same data as Accuracy (100 − WER, floored at 0), where higher is better.
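To make the two numbers concrete, here is a minimal sketch of how WER and Accuracy relate. It is a plain-Python word-level edit distance written for illustration, not the benchmark's own scoring code (the Methodology section names the jiwer library for that); the function names `word_error_rate` and `accuracy` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def accuracy(wer_percent: float) -> float:
    """Accuracy as described in this article: 100 - WER, floored at 0."""
    return max(0.0, 100.0 - wer_percent)


if __name__ == "__main__":
    # One substitution in a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # Four inserted words against a one-word reference push WER past 100%.
    wer = word_error_rate("hello", "hello there my old friend")
    print(wer, accuracy(wer))
```

Because insertions count as errors but do not lengthen the reference, an output much longer than the reference is what drives WER above 100%, and the floor keeps Accuracy from going negative in that case.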
Overall Performance


Full Results: All 98 Languages


High-Performing Languages


Most Challenging Languages


Ray Max vs Ray Turbo: When to Use Which
| | Ray Max | Ray Turbo |
|---|---|---|
| Priority | Accuracy | Speed |
| Best for | Professional subtitling, low-resource languages, archival work, published content | Real-time playback, live viewing, batch processing, high-resource languages |
| Speed | Baseline | Up to 16x faster |
Raw Data
Full benchmark data (FLEURS) and the chart generation script are available for anyone to verify or reproduce.
Methodology
- Dataset: FLEURS multilingual benchmark across 98 languages
- Metric: Word Error Rate (WER) via the jiwer library; lower is better
- Accuracy calculated as max(0, 100 − WER); higher is better
- Text normalization applied before evaluation:
  - Case normalization (all uppercase)
  - Punctuation removed
  - Abbreviation expansion (Mr → Mister, Dr → Doctor, etc.)
  - Special tokens and markup stripped
- CJK and Southeast Asian languages evaluated at character level: characters are spaced so WER effectively measures Character Error Rate (CER) for these scripts
- Compound word normalization applied: if a reference compound word appears split across candidate outputs, the reference is split accordingly for fair comparison
- WER above 100% indicates hallucinated output where the model inserted more erroneous words than exist in the source reference
- All models evaluated under identical conditions using the same normalized references
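The normalization steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's actual pipeline: the `normalize` function, the two-entry abbreviation table, and the token shapes matched by the regular expression (`<...>` and `[...]`) are all hypothetical, and compound word normalization is left out.

```python
import re
import unicodedata

# Illustrative table only; the benchmark's full abbreviation list is not given here.
ABBREVIATIONS = {"MR": "MISTER", "DR": "DOCTOR"}


def normalize(text: str, char_level: bool = False) -> str:
    # Strip special tokens and markup (assumed to look like <unk> or [noise]).
    text = re.sub(r"<[^>]*>|\[[^\]]*\]", " ", text)
    # Remove every character in a Unicode punctuation category.
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Case normalization: all uppercase.
    text = text.upper()
    # Expand abbreviations word by word.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    if char_level:
        # Space out every character so a word-level metric scores characters.
        return " ".join("".join(words))
    return " ".join(words)


if __name__ == "__main__":
    print(normalize("Dr. Smith met Mr. Jones, <unk> today!"))
    print(normalize("今日は、晴れ。", char_level=True))
```

Applying the same function to both the reference and every model's output keeps the comparison on equal footing, which is the point of evaluating all models against the same normalized references.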
Ray Max and Ray Turbo are available in Ray Media Player.


