# Japanese-ASR Style Benchmark

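- Method: follows the dataset selection and metric conventions of the `japanese-asr` Hugging Face page.
- Datasets: Common Voice 8.0 Japanese test (`common_voice_8_0`), JSUT Basic5000 (`jsut_basic5000`), ReazonSpeech held-out test (`reazonspeech_test`).
- Metrics: character and word error rates on raw text (`cer_raw`, `wer_raw`) and on normalized text (`cer_norm`, `wer_norm`), reported in percent. Normalization applies `BasicTextNormalizer()` and then removes all spaces.
- Ranking: models are ordered by mean `cer_norm` across the datasets they completed; lower is better.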
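
The normalized metrics described above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the benchmark's actual evaluation script: `normalize` is a hypothetical stand-in that only approximates `BasicTextNormalizer()` (NFKC folding, lowercasing, punctuation and symbol removal) before dropping spaces, and corpus-level aggregation (total edits divided by total reference length) is an assumption.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical stand-in for BasicTextNormalizer() followed by space removal."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace punctuation (P*) and symbols (S*) with spaces, then drop all whitespace.
    text = "".join(" " if unicodedata.category(c)[0] in "PS" else c for c in text)
    return re.sub(r"\s+", "", text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char") -> float:
    """Error rate in percent, aggregated over the corpus (an assumption)."""
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        # unit="char" compares characters; unit="word" compares whitespace tokens.
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        edits += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("reference text is empty")
    return 100.0 * edits / total


# Toy example (invented sentences, not benchmark data).
refs = ["今日は、いい天気ですね。"]
hyps = ["今日は いい天気です"]

cer_raw = error_rate(refs, hyps)
cer_norm = error_rate([normalize(r) for r in refs], [normalize(h) for h in hyps])
print(f"cer_raw={cer_raw:.2f} cer_norm={cer_norm:.2f}")
```

Punctuation and spacing differences count as errors in the raw metrics but disappear after normalization, which is consistent with the gap between the raw and normalized columns in the tables below. How the benchmark tokenizes text for WER is not stated here, so the `unit="word"` branch is only a placeholder that splits on whitespace.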

## Ranking

| Rank | Model | Datasets | Avg CER norm (%) | Avg WER norm (%) | Avg speed | Avg latency |
|---:|---|---:|---:|---:|---:|---:|
| 1 | CohereLabs/cohere-transcribe-03-2026 | 3 | 6.41 | 43.80 | 21.40x | 0.251s |
| 2 | nvidia/parakeet-tdt_ctc-0.6b-ja | 3 | 8.11 | 51.45 | 64.58x | 0.082s |
| 3 | faster-whisper-large-v3-turbo | 3 | 10.52 | 59.15 | 12.77x | 0.417s |
| 4 | Qwen3-ASR-1.7B | 3 | 14.70 | 67.77 | 6.99x | 0.772s |
| 5 | nvidia/nemotron-3.5-asr-streaming-0.6b | 3 | 20.93 | 83.44 | 11.09x | 0.480s |

## Per Dataset

| Model | Dataset | Samples | CER norm (%) | WER norm (%) | CER raw (%) | WER raw (%) | Speed | Avg latency |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| CohereLabs/cohere-transcribe-03-2026 | common_voice_8_0 | 4483 | 4.07 | 30.89 | 9.34 | 100.00 | 21.85x | 0.238s |
| CohereLabs/cohere-transcribe-03-2026 | jsut_basic5000 | 5000 | 8.28 | 62.28 | 11.69 | 92.58 | 17.73x | 0.275s |
| CohereLabs/cohere-transcribe-03-2026 | reazonspeech_test | 5263 | 6.89 | 38.23 | 10.06 | 66.06 | 24.63x | 0.239s |
| Qwen3-ASR-1.7B | common_voice_8_0 | 4483 | 9.46 | 58.55 | 14.80 | 100.00 | 7.95x | 0.655s |
| Qwen3-ASR-1.7B | jsut_basic5000 | 5000 | 8.93 | 66.18 | 11.55 | 90.68 | 5.67x | 0.861s |
| Qwen3-ASR-1.7B | reazonspeech_test | 5263 | 25.71 | 78.57 | 29.94 | 89.06 | 7.35x | 0.800s |
| faster-whisper-large-v3-turbo | common_voice_8_0 | 4483 | 9.16 | 56.50 | 15.73 | 100.00 | 12.64x | 0.412s |
| faster-whisper-large-v3-turbo | jsut_basic5000 | 5000 | 7.27 | 60.78 | 12.11 | 95.46 | 11.55x | 0.422s |
| faster-whisper-large-v3-turbo | reazonspeech_test | 5263 | 15.14 | 60.17 | 20.16 | 91.24 | 14.11x | 0.416s |
| nvidia/nemotron-3.5-asr-streaming-0.6b | common_voice_8_0 | 4483 | 18.79 | 78.67 | 44.85 | 100.00 | 10.75x | 0.484s |
| nvidia/nemotron-3.5-asr-streaming-0.6b | jsut_basic5000 | 5000 | 13.47 | 81.88 | 37.85 | 100.00 | 10.18x | 0.479s |
| nvidia/nemotron-3.5-asr-streaming-0.6b | reazonspeech_test | 5263 | 30.52 | 89.76 | 48.26 | 99.85 | 12.34x | 0.476s |
| nvidia/parakeet-tdt_ctc-0.6b-ja | common_voice_8_0 | 4483 | 7.48 | 52.09 | 13.46 | 100.00 | 63.48x | 0.082s |
| nvidia/parakeet-tdt_ctc-0.6b-ja | jsut_basic5000 | 5000 | 6.63 | 57.90 | 10.59 | 92.80 | 58.61x | 0.083s |
| nvidia/parakeet-tdt_ctc-0.6b-ja | reazonspeech_test | 5263 | 10.20 | 44.37 | 14.13 | 70.99 | 71.67x | 0.082s |
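
As a cross-check, the ranking can be recomputed from the per-dataset `CER norm` column. The sketch below assumes a plain unweighted mean over datasets, which matches the stated ranking rule; the `rank_models` helper and the `CER_NORM` dictionary are illustrative names, not part of the benchmark code.

```python
from statistics import mean

# CER norm values copied from the per-dataset table
# (order: common_voice_8_0, jsut_basic5000, reazonspeech_test).
CER_NORM = {
    "CohereLabs/cohere-transcribe-03-2026": [4.07, 8.28, 6.89],
    "Qwen3-ASR-1.7B": [9.46, 8.93, 25.71],
    "faster-whisper-large-v3-turbo": [9.16, 7.27, 15.14],
    "nvidia/nemotron-3.5-asr-streaming-0.6b": [18.79, 13.47, 30.52],
    "nvidia/parakeet-tdt_ctc-0.6b-ja": [7.48, 6.63, 10.20],
}


def rank_models(scores):
    """Return (model, mean CER norm, dataset count) sorted from best to worst."""
    averaged = [(model, mean(vals), len(vals)) for model, vals in scores.items() if vals]
    return sorted(averaged, key=lambda row: row[1])


for rank, (model, avg, n) in enumerate(rank_models(CER_NORM), start=1):
    print(f"{rank} | {model} | {n} | {avg:.2f}")
```

Averages recomputed from the two-decimal table values can differ from the ranking table in the last digit, presumably because the published averages were taken over unrounded per-dataset scores.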
