Course Logistics

  • Instructor: Shinji Watanabe
  • TAs: Xuankai Chang, Yifan Peng, Brian Yan
  • Time: MW 3:30PM – 4:50PM
  • Location: GHC 4307
  • Discussion: Piazza

Grading

  • Grading policies
    • Class Participation (25%)
    • Assignments (30%)
    • Mid-term exam (20%)
    • Term Project (25%)
  • We will use Gradescope.

Syllabus

  • This is a tentative schedule.
  • The slides will be uploaded right before each lecture.
  • The videos will be uploaded after each lecture on an irregular schedule because they need to be edited first.

| Date | Lecture | Topics | Slides/Videos |
| 8/28 | Course overview | Course explanation and introduction | |
| 8/30 | Introduction of speech recognition | Evaluation metric; How to transcribe speech; Databases | |
| 9/6 | Speech recognition formulations | Probabilistic rules; From Bayes decision theory to HMM + n-gram, CTC, RNN-T, and attention | |
| 9/11 | Feature extraction | Basic pipeline; Some advances in feature extraction | |
| 9/13 | Acoustic model overview | | |
| 9/18 | Alignment problems | 3-state left-to-right HMM; CTC; Transducer | |
| 9/20 | K-means, GMM, EM algorithm | | |
| 9/25 | Forward-backward algorithm for HMM | | |
| 9/27 | Forward-backward algorithm for HMM | | |
| 10/2 | Forward-backward algorithm for CTC and Viterbi algorithm | | |
| 10/2 | N-gram language model | | |
| 10/9 | Midterm exam | | |
| 10/11 | Search | Time-synchronous beam search; Label-synchronous beam search; N-best and lattice; Rescoring | |
| 10/23 | ESPnet hands-on tutorial I | Introduction of toolkit; How to make a new recipe | |
| 10/25 | ESPnet hands-on tutorial II | How to make a new task | |
| 10/30 | Deep neural network for acoustic modeling | | |
| 11/1 | Neural network language model | | |
| 11/6 | End-to-End ASR: Attention | | |
| 11/8 | End-to-End ASR: CTC | | |
| 11/13 | End-to-End ASR: RNN-T | | |
| 11/15 | Advanced topics on end-to-end ASR I | | |
| 11/20 | Advanced topics on end-to-end ASR II | | |
| 11/27 | Guest Lecture | | |
| 11/29 | Guest Lecture | | |
| 12/4 | Project Event | | |
| 12/6 | Project Event | | |

Assignments

Assignments will be announced during the course.