Speech & Audio
2022
INTERMEDIATE
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever · 2022
Whisper. An encoder-decoder Transformer trained on 680k hours of weakly supervised multilingual audio, approaching human-level accuracy and robustness in zero-shot speech recognition.
What you'll get
- Outline: a plain-English breakdown of the paper's core idea, prerequisites, and the concepts you'll need to implement it.
- Exercises: five to ten hands-on tasks, each with a concept card, a prompt, a starter code stub, and a collapsible reference solution.
- Runnable notebook: a single .ipynb you can download and open in Jupyter or VS Code to work through every exercise.
- Extensions: suggested follow-up experiments so you don't stop at a faithful reimplementation.