WAVLab

Affiliated with LTI/CMU.

This is Watanabe’s Audio and Voice (WAV) Lab at the Language Technologies Institute of Carnegie Mellon University. Our research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing.

The end-of-semester presentation, 05.07.2025

selected publications

  1. ASR Interspeech
    OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
    Yifan Peng, Shakeel Muhammad, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  2. SSL EMNLP
    Towards Robust Speech Representation Learning for Thousands of Languages
    William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe
    In Proceedings of EMNLP 2024
  3. ASR Interspeech
    EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
    Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  4. SS TASLP
    TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
    Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
  5. ASR TASLP
    End-to-End Speech Recognition: A Survey
    Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, and Shinji Watanabe
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
  6. ASR&SSL Interspeech
    ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
    Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  7. ASR&SLU&MT ICML
    Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
    In Proceedings of the International Conference on Machine Learning (ICML) 2022
  8. SD CSL
    A review of speaker diarization: Recent advances with deep learning
    Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J Han, Shinji Watanabe, and Shrikanth Narayanan
    Computer Speech & Language 2022
  9. SE ICASSP
    Conditional Diffusion Probabilistic Model for Speech Enhancement
    Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, and Yu Tsao
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  10. SLU ICASSP
    ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
    Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  11. ASR&SD&SLU&ER Interspeech
    SUPERB: Speech processing Universal PERformance Benchmark
    Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y., Andy T., Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee
    In Proceedings of Interspeech 2021