Is OpenAI Whisper free for commercial use?

Yes. Whisper is released under the MIT license, which permits commercial use, modification, and distribution. Because you run it on your own hardware, there are no per-minute API fees; your main cost is compute (local GPUs or cloud instances).

What is the difference between Whisper and faster-whisper?

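faster-whisper is a reimplementation of Whisper on top of CTranslate2, a C++ inference engine for Transformer models. It delivers a several-fold speedup (the project reports up to 4x at the same accuracy), adds INT8 quantization, and includes built-in VAD (voice activity detection) filtering. Transcripts closely match the original implementation but are not guaranteed to be identical. Use faster-whisper for production and OpenAI Whisper for research and experimentation.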
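A minimal sketch of a production-style faster-whisper setup: the hypothetical helpers below pick a CTranslate2 `compute_type` for the target device and collect the arguments you would pass to `WhisperModel` and its `transcribe()` method. The parameter names (`model_size_or_path`, `device`, `compute_type`, `beam_size`, `vad_filter`) follow faster-whisper's documented interface as far as we know, but check them against your installed version; `float16` on CUDA and `int8` on CPU are common choices, not the only ones.

```python
def pick_compute_type(device: str) -> str:
    """Return a common CTranslate2 compute type for the device (assumes "cpu" or "cuda")."""
    if device == "cuda":
        return "float16"  # half precision on NVIDIA GPUs
    if device == "cpu":
        return "int8"  # INT8 quantization keeps CPU inference fast and memory-light
    raise ValueError(f"unsupported device: {device!r}")


def faster_whisper_kwargs(model_size: str, device: str) -> tuple[dict, dict]:
    """Build (constructor kwargs, transcribe kwargs) for faster-whisper. Hypothetical helper."""
    model_kwargs = {
        "model_size_or_path": model_size,
        "device": device,
        "compute_type": pick_compute_type(device),
    }
    transcribe_kwargs = {
        "beam_size": 5,
        "vad_filter": True,  # filter out non-speech audio before decoding
    }
    return model_kwargs, transcribe_kwargs


if __name__ == "__main__":
    model_kwargs, transcribe_kwargs = faster_whisper_kwargs("small", "cpu")
    print(model_kwargs, transcribe_kwargs)
    # With faster-whisper installed, these would be used roughly as:
    #   from faster_whisper import WhisperModel
    #   model = WhisperModel(**model_kwargs)
    #   segments, info = model.transcribe("audio.mp3", **transcribe_kwargs)
```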

Can OpenAI Whisper run on a CPU?

Yes. Every model size runs on a modern CPU, though only the smaller ones run comfortably: use the tiny model for near real-time transcription on laptops, or the medium model with INT8 quantization (for example through faster-whisper) for batch processing. large-v3 also runs, but slowly. Expect roughly 3-5x slower inference than on a GPU.
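Whether CPU inference counts as near real-time comes down to the real-time factor (RTF): processing time divided by audio duration. Below 1.0, transcription keeps up with the audio. The sketch below applies the 3-5x CPU slowdown quoted above to a hypothetical GPU timing; the numbers are illustrative assumptions, not benchmarks, so measure on your own hardware.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimate_cpu_seconds(gpu_seconds: float, slowdown: float) -> float:
    """Scale a measured GPU time by an assumed CPU slowdown factor."""
    return gpu_seconds * slowdown


if __name__ == "__main__":
    audio_seconds = 60.0  # hypothetical one-minute clip
    gpu_seconds = 12.0    # hypothetical GPU processing time
    for slowdown in (3.0, 5.0):
        cpu_seconds = estimate_cpu_seconds(gpu_seconds, slowdown)
        rtf = real_time_factor(cpu_seconds, audio_seconds)
        verdict = "keeps up" if rtf < 1.0 else "falls behind"
        print(f"{slowdown:.0f}x slower: {cpu_seconds:.0f}s, RTF {rtf:.2f} ({verdict})")
```

With these assumed numbers, a 3x slowdown still keeps up (RTF 0.60), while a 5x slowdown lands exactly at real time (RTF 1.00) and leaves no headroom, which is why small models are the safer choice for live transcription on a laptop.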

Which Whisper model size should I choose?

Use base (or its English-only variant, base.en) for quick English-only tasks, small for everyday multilingual use, medium for professional accuracy, and large-v3 when maximum accuracy is non-negotiable. The turbo model is the sweet spot for latency-sensitive production workloads, but it was not trained for translation, so use medium or large-v3 when you need speech translated into English.
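The rules of thumb above can be encoded as a small helper. `choose_model` is hypothetical and simply mirrors this FAQ's guidance; the returned strings are model names that `whisper.load_model()` accepts in recent openai-whisper releases. Translation requests fall back to `medium` because turbo was not trained for translation.

```python
def choose_model(
    *,
    english_only: bool = False,
    needs_translation: bool = False,
    latency_sensitive: bool = False,
    professional: bool = False,
    max_accuracy: bool = False,
) -> str:
    """Map workload requirements to a Whisper model name (hypothetical helper)."""
    if max_accuracy:
        return "large-v3"  # accuracy is non-negotiable
    if latency_sensitive and not needs_translation:
        return "turbo"  # fast, but not trained for translation
    if professional or needs_translation:
        return "medium"  # multilingual model that also handles translation
    if english_only:
        return "base.en"  # quick English-only jobs
    return "small"  # everyday multilingual default


if __name__ == "__main__":
    print(choose_model(english_only=True))
    print(choose_model(latency_sensitive=True, needs_translation=True))
    # The result would typically be passed to whisper.load_model(name).
```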

Does Whisper support real-time streaming and speaker diarization?

No. Whisper processes audio in 30-second windows and is not designed for true real-time (<200 ms latency) streaming, and the model itself cannot identify who is speaking. For speaker labels, use WhisperX or a separate diarization pipeline; for streaming ASR, consider alternatives such as NVIDIA Parakeet or Moonshine v2.
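Pairing Whisper with a separate diarization pipeline usually means aligning two timelines: Whisper's timestamped segments and the diarizer's speaker turns. A minimal sketch, assuming both arrive as plain start/end times in seconds, gives each segment the speaker whose turns overlap it the most. The `Segment` and `Turn` types and the helper functions are hypothetical; this is a simplified illustration of the idea, not WhisperX's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:  # one Whisper transcript segment (hypothetical type)
    start: float
    end: float
    text: str


@dataclass
class Turn:  # one diarization speaker turn (hypothetical type)
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each segment with the speaker that has the most total overlap, or None."""
    labeled = []
    for seg in segments:
        totals: dict = {}
        for turn in turns:
            o = overlap(seg.start, seg.end, turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append((speaker, seg.text))
    return labeled


if __name__ == "__main__":
    segments = [Segment(0.0, 4.0, "Hello"), Segment(4.0, 9.0, "Hi there")]
    turns = [Turn(0.0, 4.5, "SPEAKER_00"), Turn(4.5, 9.0, "SPEAKER_01")]
    print(assign_speakers(segments, turns))
```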

OpenAI Whisper: 99.8K+ Stars

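## Introduction

Speech recognition is the bridge between human conversation and machine-readable data, yet many developers have wrestled with APIs that charge per minute, miss domain terminology, or fail outright on accented speech. In September 2022, OpenAI released Whisper as an MIT-licensed open-source alternative, and it was adopted almost immediately: with 99,800 stars on GitHub, it has become one of the most widely adopted open-source ASR systems in production. This guide walks through a complete Whisper setup, compares it with WhisperX, faster-whisper, and DeepSpeech, and gives you production-hardened configurations you can deploy right away.

## What Is OpenAI Whisper?

OpenAI Whisper is a general-purpose automatic speech recognition (ASR) model trained on 680,000 hours of multilingual, multitask supervised data. It can transcribe speech in 99 languages, translate speech into English, identify the spoken language, and align transcripts to timestamped segments. Unlike cloud-only APIs, Whisper runs fully offline on consumer hardware, which makes it a common backbone for transcription pipelines in healthcare, media, call centers, and accessibility tools.

## How Whisper Works

Whisper uses an encoder-decoder Transformer architecture. Input audio is converted into a log-Mel spectrogram and passed through the encoder. The decoder then predicts text tokens autoregressively, conditioned on special task tokens that tell the model whether to transcribe, translate, or detect the language.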
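The front end and the task tokens can be made concrete with a little arithmetic. The constants below (16 kHz sample rate, 160-sample hop, 30-second windows) match openai-whisper's audio preprocessing to the best of our knowledge, and the special-token spellings follow its tokenizer; the helper functions themselves are hypothetical illustrations, not library APIs. Note that large-v3 uses 128 Mel bins instead of 80, which changes the spectrogram's height but not its frame count.

```python
from typing import List, Optional

SAMPLE_RATE = 16_000  # Whisper resamples all audio to 16 kHz
CHUNK_LENGTH = 30     # seconds of audio per encoder window
HOP_LENGTH = 160      # samples between spectrogram frames (10 ms)


def mel_frames(seconds: float = CHUNK_LENGTH) -> int:
    """Number of log-Mel frames produced for `seconds` of 16 kHz audio."""
    samples = int(seconds * SAMPLE_RATE)
    return samples // HOP_LENGTH


def encoder_positions(frames: int) -> int:
    """The encoder's second convolution has stride 2, halving the time axis."""
    return frames // 2


def decoder_prompt(language: Optional[str] = None, task: str = "transcribe",
                   timestamps: bool = True) -> List[str]:
    """Build the special-token prefix that tells the decoder what to do."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    tokens = ["<|startoftranscript|>"]
    if language is None:
        # Language ID: the model's next prediction is a language token such as <|en|>.
        return tokens
    tokens += [f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    frames = mel_frames()
    print(frames, encoder_positions(frames))  # 3000 1500
    print(decoder_prompt("fr", task="translate"))
```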

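Core design decisions:

- Large-scale weak supervision: training on diverse, web-scale audio with noisy labels instead of small, curated datasets
- Multitask training: a single model handles transcription, translation, and language identification through task tokens
- Chunked processing: long audio is split into 30-second windows that are transcribed one at a time and then stitched back together
- Conditioning on previous text: the decoder receives the preceding segment's tokens, keeping formatting consistent across window boundaries

| Model | Parameters | English WER | Multilingual WER | VRAM (GPU) | Relative speed |
|---|---|---|---|---|---|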
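Chunked processing and conditioning on previous text can be sketched as a simple loop: split the signal into 30-second windows, transcribe each window with the previous window's text as a prompt, and shift its timestamps by the window's offset. `transcribe_window` is a hypothetical callback standing in for the model; openai-whisper's real `transcribe()` advances its position based on predicted timestamps rather than fixed windows, so treat this as a simplification.

```python
from typing import Callable, List, Tuple

SAMPLE_RATE = 16_000
WINDOW_SECONDS = 30

Segment = Tuple[float, float, str]  # (start, end, text), times in seconds
# Hypothetical model interface: (window samples, prompt text) -> segments
# with timestamps relative to the start of that window.
WindowTranscriber = Callable[[List[float], str], List[Segment]]


def transcribe_long(
    samples: List[float],
    transcribe_window: WindowTranscriber,
    condition_on_previous_text: bool = True,
) -> List[Segment]:
    """Split audio into 30 s windows, transcribe each, and stitch results on one timeline."""
    window = SAMPLE_RATE * WINDOW_SECONDS
    results: List[Segment] = []
    prompt = ""
    for offset in range(0, len(samples), window):
        chunk = samples[offset:offset + window]
        segments = transcribe_window(chunk, prompt)
        offset_seconds = offset / SAMPLE_RATE
        # Shift window-relative timestamps onto the global timeline.
        results.extend((s + offset_seconds, e + offset_seconds, t) for s, e, t in segments)
        if condition_on_previous_text:
            # Prompt the next window with this window's text for consistent formatting.
            prompt = " ".join(t for _, _, t in segments)
    return results


if __name__ == "__main__":
    def fake_window(chunk: List[float], prompt: str) -> List[Segment]:
        # Stand-in for a model call: one segment spanning the whole window.
        return [(0.0, len(chunk) / SAMPLE_RATE, f"[{len(prompt)} chars of context]")]

    for segment in transcribe_long([0.0] * (SAMPLE_RATE * 70), fake_window):
        print(segment)
```

openai-whisper exposes the same idea through the `condition_on_previous_text` argument of `transcribe()`; disabling it can make text less consistent across windows but also makes the model less prone to repetition loops.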
