WAVLab

This is Watanabe’s Audio and Voice (WAV) Lab at the Language Technologies Institute of Carnegie Mellon University. Our research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing.

The end-of-semester presentation, 05.07.2025

selected publications

ASR Interspeech

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, and Shinji Watanabe

In Proceedings of Interspeech 2025
SSL EMNLP

Towards Robust Speech Representation Learning for Thousands of Languages

William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe

In Proceedings of EMNLP 2024
ASR Interspeech

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios

Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe

In Proceedings of Interspeech 2024
SS TASLP

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
ASR TASLP

End-to-End Speech Recognition: A Survey

Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, and Shinji Watanabe

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
ASR&SSL Interspeech

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR&SLU&MT ICML

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Yifan Peng, Siddharth Dalmia, Ian Lane, and Shinji Watanabe

In Proceedings of the International Conference on Machine Learning (ICML) 2022
SD CSL

A review of speaker diarization: Recent advances with deep learning

Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J Han, Shinji Watanabe, and Shrikanth Narayanan

Computer Speech & Language 2022
SE ICASSP

Conditional Diffusion Probabilistic Model for Speech Enhancement

Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, and Yu Tsao

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SLU ICASSP

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR&SD&SLU&ER Interspeech

SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee

In Proceedings of Interspeech 2021
