Course Logistics

  • Instructor: Shinji Watanabe
  • TAs: Xuankai Chang, Yifan Peng, Jiatong Shi
  • Time: MW 4:40PM – 6:00PM
  • Location: GHC 4307
  • Discussion: Piazza

Grading

  • Grading policies
    • Class Participation (30%)
    • Assignments (30%)
    • Mid-term exam (20%)
    • Term Project (20%)
  • We will use Gradescope.

Syllabus

  • This is a tentative schedule.
  • The slides will be uploaded right before the lecture.
  • The videos will be uploaded irregularly after the lecture because of the editing process.
Date Lecture Topics Slides/Videos
8/29 Course overview Course explanation and introduction https://youtu.be/DsYDmg72K1k
8/31 Introduction of speech recognition - Evaluation metric
- How to transcribe speech
- Databases
https://youtu.be/HCNqxmOwEH4
9/7 Overview of speech recognition systems - HMM-based systems vs. End-to-End systems
- Output Units
- Pronunciation lexicon
No video due to technical issues.
9/12 Speech recognition formulations - Probabilistic rules
- From Bayes decision theory to HMM + n-gram, CTC, RNN-T, and attention
https://youtu.be/9QPiMoJJAXg
9/14 Feature extraction - Basic pipeline
- Some advances in feature extraction
https://youtu.be/isABhD2ym80
9/19 ESPnet hands-on tutorial I - Introduction of toolkit
- How to make a new recipe
https://youtu.be/YDN8cVjxSik
9/21 ESPnet hands-on tutorial II - How to make a new task https://youtu.be/Css3XAes7SU
9/26 Alignment - 3 state left-to-right HMM
- CTC
- Transducer
https://youtu.be/ZFvtCaXs3aA
9/28 Hidden Markov model (Part I) - Emission probability
- Single Gaussian model
- Gaussian mixture model
  - Expectation Maximization Algorithm
https://youtu.be/hJi5quunTLY
10/3 Hidden Markov model (Part II) Hidden Markov model with Expectation Maximization Algorithm https://youtu.be/6k7q9ggIfYI
10/5 Hidden Markov model (Part III) - Baum-Welch algorithm
- Viterbi algorithm
https://youtu.be/YmRnIphseyw
10/10 Advanced acoustic modeling - Phonetic decision tree
- Adaptation
https://youtu.be/GTaqSmQSHBs
10/12 Language model N-gram language model https://youtu.be/VqySbRgHlPc
10/24 Deep learning for speech recognition - Introduction
- Deep neural networks for acoustic modeling
https://youtu.be/IWvCFd91JPg
10/26 Mid-term exam
10/31 Advanced neural network architectures for acoustic model - Convolutional neural networks
- Recurrent neural networks
- Self-attention
https://youtu.be/3YTuHQfaLgA
11/2 Neural network language model https://youtu.be/uRk79NJD1cA
11/7 End-to-end ASR: Attention https://youtu.be/6955aj5hlwk
11/9 End-to-end ASR: CTC https://youtu.be/X2Jjx1icXsE
11/14 End-to-end ASR: RNN-T https://youtu.be/lVc46-aBnzM
11/16 Advanced topics on end-to-end ASR - Data augmentation
- Joint CTC/attention
- Transformer
- Streaming
https://youtu.be/S2rSm11lX80
11/21 Search - Time-synchronous beam search
- Label-synchronous beam search
- N-best and lattice
- Rescoring
https://youtu.be/GcfIdxj1s8M
11/28 Guest Lecture, Zhong-Qiu Wang - Robust training method
- Speech enhancement and separation
- Multichannel processing
No Video
11/30 Guest lecture, Thomas Shaf at 3M | MModal No Video
12/5 Project event
12/7 Project event

Assignments

Assignments will be announced during the course.