Speech Processing (11-492/11-692/18-495)
Course Logistics
- Instructor: Shinji Watanabe
- TAs: Jiatong Shi, Siddhant Arora
- Time: MW 3:30PM – 4:50PM
- Location: GHC 4211
- Discussion: Piazza
Grading
- Grading policies
- Student presentation
- Assignments
- Term Project
- We will use Gradescope.
Syllabus
- This is a tentative schedule.
- The slides will be uploaded to Piazza right before each lecture.
- The videos will be uploaded to Piazza at irregular intervals after each lecture because of the editing process.
Date | Lecture | Topics | Slides/Videos |
---|---|---|---|
1/16 | Course overview | Course explanation and introduction | |
1/23 | Speech processing overview | ||
1/25 | Speech recognition part I | ||
1/30 | ESPnet tutorial I | ||
2/1 | ESPnet tutorial II | ||
2/6 | Speech recognition part II | ||
2/8 | SSL models for speech recognition | ||
2/13 | Speaker recognition | ||
2/15 | Speaker diarization | ||
2/20 | Language model | ||
2/22 | Database, Data preparation | ||
2/27 | Multi-speaker ASR | ||
3/1 | Midterm project event | ||
3/13 | Multilingual speech recognition | ||
3/15 | Speech translation | ||
3/20 | Speech/audio classification | ||
3/22 | Spoken language understanding | ||
3/27 | Single-channel speech enhancement | ||
3/29 | Multi-channel speech enhancement | ||
4/3 | Text to speech (text2mel) | ||
4/5 | Text to speech (vocoder, joint model) | ||
4/10 | System I: speech-to-speech translation | ||
4/12 | System II: spoken dialog system | ||
4/17 | Guest lecture | ||
4/19 | Guest lecture | ||
4/24 | Project event I | ||
4/26 | Project event II | ||
Assignments
Assignments will be announced during the course.