WAVLab | Speech Recognition and Understanding (11-751/18-781)

Course Logistics

Instructor: Shinji Watanabe
TAs: Xuankai Chang, Yifan Peng, Jiatong Shi
Time: MW 4:40PM – 6:00PM
Location: GHC 4307
Discussion: Piazza

Grading

Grading policies
- Class Participation (30%)
- Assignments (30%)
- Mid-term exam (20%)
- Term Project (20%)
We will use Gradescope.

Syllabus

This is a tentative schedule.
Slides will be uploaded shortly before each lecture.
Videos will be uploaded at irregular intervals after each lecture because they need to be edited first.

Date	Lecture	Topics	Slides/Videos
8/29	Course overview	Course explanation and introduction	https://youtu.be/DsYDmg72K1k
8/31	Introduction to speech recognition	- Evaluation metric - How to transcribe speech - Databases	https://youtu.be/HCNqxmOwEH4
9/7	Overview of speech recognition systems	- HMM-based systems vs. End-to-End systems - Output Units - Pronunciation lexicon	No video due to technical issues.
9/12	Speech recognition formulations	- Probabilistic rules - From Bayes decision theory to HMM + n-gram, CTC, RNN-T, and attention	https://youtu.be/9QPiMoJJAXg
9/14	Feature extraction	- Basic pipeline - Some advances in feature extraction	https://youtu.be/isABhD2ym80
9/19	ESPnet hands-on tutorial I	- Introduction to the toolkit - How to make a new recipe	https://youtu.be/YDN8cVjxSik
9/21	ESPnet hands-on tutorial II	- How to make a new task	https://youtu.be/Css3XAes7SU
9/26	Alignment	- 3-state left-to-right HMM - CTC - Transducer	https://youtu.be/ZFvtCaXs3aA
9/28	Hidden Markov model (Part I)	- Emission probability - Single Gaussian model - Gaussian mixture model - Expectation-Maximization algorithm	https://youtu.be/hJi5quunTLY
10/3	Hidden Markov model (Part II)	Hidden Markov model with the Expectation-Maximization algorithm	https://youtu.be/6k7q9ggIfYI
10/5	Hidden Markov model (Part III)	- Baum-Welch algorithm - Viterbi algorithm	https://youtu.be/YmRnIphseyw
10/10	Advanced acoustic modeling	- Phonetic decision tree - Adaptation	https://youtu.be/GTaqSmQSHBs
10/12	Language model	N-gram language model	https://youtu.be/VqySbRgHlPc
10/24	Deep learning for speech recognition	- Introduction - Deep neural networks for acoustic modeling	https://youtu.be/IWvCFd91JPg
10/26	Mid-term exam
10/31	Advanced neural network architectures for acoustic modeling	- Convolutional neural networks - Recurrent neural networks - Self-attention	https://youtu.be/3YTuHQfaLgA
11/2	Neural network language model		https://youtu.be/uRk79NJD1cA
11/7	End-to-end ASR: Attention		https://youtu.be/6955aj5hlwk
11/9	End-to-end ASR: CTC		https://youtu.be/X2Jjx1icXsE
11/14	End-to-end ASR: RNN-T		https://youtu.be/lVc46-aBnzM
11/16	Advanced topics on end-to-end ASR	- Data augmentation - Joint CTC/attention - Transformer - Streaming	https://youtu.be/S2rSm11lX80
11/21	Search	- Time-synchronous beam search - Label-synchronous beam search - N-best and lattice - Rescoring	https://youtu.be/GcfIdxj1s8M
11/28	Guest lecture, Zhong-Qiu Wang	- Robust training methods - Speech enhancement and separation - Multichannel processing	No video
11/30	Guest lecture, Thomas Shaf at 3M | MModal		No video
12/5	Project event
12/7	Project event

Assignments

Assignments will be announced during the course.