WAVLab

Affiliated with the Language Technologies Institute (LTI), Carnegie Mellon University.

This is Watanabe’s Audio and Voice (WAV) Lab at the Language Technologies Institute of Carnegie Mellon University. Our research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing.

The end-of-semester presentation, 05.06.2024

selected publications

  1. SD CSL
    A review of speaker diarization: Recent advances with deep learning
    Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J Han, Shinji Watanabe, and Shrikanth Narayanan
    Computer Speech & Language 2022
  2. ASR&SD&SLU&ER Interspeech
    SUPERB: Speech processing Universal PERformance Benchmark
    Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee
    In Proceedings of Interspeech 2021
  3. TTS ICASSP
    ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit
    Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  4. ST ACL
    ESPnet-ST: All-in-One Speech Translation Toolkit
    Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Yalta, Tomoki Hayashi, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2020
  5. SD Interspeech
    End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors
    Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Kenji Nagamatsu
    In Proceedings of Interspeech 2020
  6. SD ASRU
    End-to-end neural speaker diarization with self-attention
    Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  7. SD Interspeech
    End-to-End Neural Speaker Diarization with Permutation-Free Objectives
    Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, and Shinji Watanabe
    In Proceedings of Interspeech 2019
  8. ASR Interspeech
    Improving Transformer Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
    Shigeki Karita, Nelson Yalta, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani
    In Proceedings of Interspeech 2019
  9. ASR Interspeech
    ESPnet: End-to-End Speech Processing Toolkit
    Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai
    In Proceedings of Interspeech 2018
  10. SE&ASR Interspeech
    The Fifth ‘CHiME’ Speech Separation and Recognition Challenge: Dataset, Task and Baselines
    Jon Barker, Shinji Watanabe, Emmanuel Vincent, and Jan Trmal
    In Proceedings of Interspeech 2018
  11. SD Interspeech
    Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
    Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, and others
    In Proceedings of Interspeech 2018