Publications

2024

  1. SSL EMNLP
    Towards Robust Speech Representation Learning for Thousands of Languages
    William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe
    In Proceedings of EMNLP 2024
  2. ASR&ER&Speaker SLT
    Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
    Chao-Han Huck Yang, Tae Jin Park, Yuan Gong, Yuanchao Li, Yen-Ting Lin, Zhehuai Chen, Yuchen Hu, Chen Chen, Kunal Dhawan, Piotr Żelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, and Andreas Stolcke
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  3. Tokenizer SLT
    Codec-SUPERB @SLT 2024: A lightweight benchmark for neural codec models
    Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Jiawei Du, Kai-Wei Chang, Ke-Han Lu, Alexander Liu, Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, and Hung-yi Lee
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  4. Music SLT
    VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
    Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  5. Tokenizer SLT
    ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
    Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander Liu, Bhiksha Raj, Qin Jin, Ruihua Song, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  6. ASR SLT
    FLORAS 50: A Massively Multilingual Multitask Benchmark for Long-form Conversational Speech
    William Chen, Brian Yan, Chih-Chen Chen, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  7. ASR SLT
    Contextualized Automatic Speech Recognition with Dynamic Vocabulary
    Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  8. ASR SLT
    Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
    Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  9. SE SLT
    Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
    Chenda Li, Samuele Cornell, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  10. ASR SLT
    Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
    Shih-Heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, and Hung-yi Lee
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  11. ASR&TTS SLT
    ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
    Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  12. ASR Interspeech
    Multi-Convformer: Extending Conformer with Multiple Convolution Kernels
    Darshan Prabhu, Yifan Peng, Preethi Jyothi, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  13. ASR Interspeech
    Self-training ASR Guided by Unsupervised ASR Teacher
    Hyung Yong Kim, Byeong-Yeol Kim, Yunkyu Lim, Jihwan Park, Shukjae Choi, Yooncheol Ju, Jinseok Park, Youshin Lim, Seung Woo Yu, Hanbin Lee, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  14. Tokenizer Interspeech
    MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
    Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  15. ASR Interspeech
    ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
    Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  16. ASR Interspeech
    EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
    Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  17. ASR Interspeech
    Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition
    Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  18. ASR Interspeech
    On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
    Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  19. SLU Interspeech
    Towards Unified Evaluation of Continual Learning in Spoken Language Understanding
    Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of Interspeech 2024
  20. ASR&TTS&Music Interspeech
    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
    Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, and Qin Jin
    In Proceedings of Interspeech 2024
  21. Evaluation Interspeech
    SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
    Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, and Hiroshi Saruwatari
    In Proceedings of Interspeech 2024
  22. Speaker Interspeech
    To what extent can ASV systems naturally defend against spoofing attacks?
    Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, and Joon Son Chung
    In Proceedings of Interspeech 2024
  23. Speaker Interspeech
    ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
    Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Alex Gichamba, Barry-John Theobald, Ahmed Hussen Abdelaziz, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  24. SLU Interspeech
    DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
    Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, and Karen Livescu
    In Proceedings of Interspeech 2024
  25. SE Interspeech
    Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
    Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, and Yanmin Qian
    In Proceedings of Interspeech 2024
  26. ASR Interspeech
    Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  27. SE Interspeech
    URGENT Challenge: Universality, Robustness, and Generalizability for speech EnhancemeNT
    Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian
    In Proceedings of Interspeech 2024
  28. Speaker Interspeech
    Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
    Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, and Barry-John Theobald
    In Proceedings of Interspeech 2024
  29. ASR Interspeech
    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
    Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  30. SSL Interspeech
    Self-Supervised Speech Representations are More Phonetic than Semantic
    Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  31. SS Interspeech
    Neural Blind Source Separation and Diarization for Distant Speech Recognition
    Yoshiaki Bando, Tomohiko Nakamura, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  32. SLU Interspeech
    Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
    Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  33. ASR Interspeech
    Decoder-only Architecture for Streaming End-to-end Speech Recognition
    Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  34. ASR Interspeech
    Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
    Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  35. SE Interspeech
    EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
    Julius Richter, Yi-Chiao Wu, Steven Krenn, Alexander Richard, Simon Welker, Bunlong Lay, Shinji Watanabe, and Timo Gerkmann
    In Proceedings of Interspeech 2024
  36. Music Interspeech
    Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
    Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  37. ASR ACL
    Wav2Gloss: Generating Interlinear Glossed Text from Speech
    Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Romney Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R Mortensen, and Lori Levin
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
  38. ASR ACL
    OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
  39. SLU ACL
    On the Evaluation of Speech Foundation Models for Spoken Language Understanding
    Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
  40. SS IJCAI
    Cross-Talk Reduction
    Zhong-Qiu Wang, Anurag Kumar, and Shinji Watanabe
    In Proceedings of IJCAI 2024
  41. SLU NAACL
    UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
    Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2024
  42. TTS TASLP
    Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis
    Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
  43. Music TASLP
    Music ControlNet: Multiple Time-varying Controls for Music Generation
    Shih-Lun Wu, Chris Donahue, Shinji Watanabe, and Nicholas J. Bryan
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
  44. ASR SPL
    MC-Whisper: Improving Distant Speech Recognition by Extending Large Pre-Trained Model to Multi-channel
    Xuankai Chang, Pengcheng Guo, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe
    IEEE Signal Processing Letters 2024
  45. SE ICASSP
    The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
    Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, and Jianqing Gao
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  46. Audio ICASSP
    Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization
    Muqiao Yang, Umberto Cappellazzo, Xiang Li, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  47. ASR ICASSP
    Improving ASR Contextual Biasing with Guided Attention
    Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  48. SLU ICASSP
    AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models
    Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  49. ASR&TTS ICASSP
    VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks
    Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  50. ASR ICASSP
    Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora
    Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  51. ST ICASSP
    Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
    Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  52. ASR ICASSP
    Phisanet: Phonetically Informed Speech Animation Network
    Salvador Medina, Sarah Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, and Iain Matthews
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  53. SD&ASR ICASSP
    One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition
    Samuele Cornell, Jee-weon Jung, Shinji Watanabe, and Stefano Squartini
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  54. ASR ICASSP
    Less Peaky and More Accurate CTC Forced Alignment by Pruned CTC Loss and Label Priors
    Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Shinji Watanabe, Daniel Povey, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  55. SSL ICASSP
    HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model
    Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  56. ASR&ST&SLU ICASSP
    Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
    Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  57. LLM&SLU ICASSP
    Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
    Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chun-Yi Kuan, Chi-Yuan Hsiao, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, and Hung-yi Lee
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  58. ST ICASSP
    Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
    Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  59. ASR ICASSP
    Semi-Autoregressive Streaming ASR with Label Context
    Siddhant Arora, George Saon, Shinji Watanabe, and Brian Kingsbury
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  60. SSL ICASSP
    Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models
    Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, and Karen Livescu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  61. ASR ICASSP
    Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
    Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  62. SSL ICASSP
    Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing
    William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  63. SE ICASSP
    Improving Design of Input Condition Invariant Speech Enhancement
    Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  64. ASR ICASSP
    Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR
    Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  65. SS ICASSP
    Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor
    Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  66. ASR ICASSP
    Visual Speech Recognition for Low-Resource Languages with Automatic Labels from Whisper Model
    Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, and Yong Man Ro
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  67. Caption ICASSP
    Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens
    Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  68. SSL ICASSP
    Understanding Probe Behaviors Through Variational Bounds of Mutual Information
    Kwanghee Choi, Jee-weon Jung, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  69. Caption ICASSP
    Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation
    Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  70. SSL ICASSP
    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
    Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi Luen Feng, and Hung-yi Lee
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

2023

  1. ASR ASRU
    Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus
    Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Lu-Tshiann Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, and Jiatong Shi
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  2. SVC ASRU
    The Singing Voice Conversion Challenge 2023
    Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, and Tomoki Toda
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  3. ASR ASRU
    Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition
    Yusuke Shinohara and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  4. Summarization&ST ASRU
    Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  5. ASR ASRU
    YODAS: Youtube-Oriented Dataset for Audio and Speech
    Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  6. SE&SS ASRU
    A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction
    Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, and Tetsuji Ogawa
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  7. ASR&SSL ASRU
    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
    Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, and Yumeng Tao
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  8. SE ASRU
    Toward Universal Speech Enhancement For Diverse Input Conditions
    Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, and Yanmin Qian
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  9. ASR ASRU
    Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
    Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei Ping Huang, En Pei Hu, Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  10. SSL ASRU
    Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning
    William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  11. ASR ASRU
    Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
    Masao Someki, Nicholas Eng, Yosuke Higuchi, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  12. ASR&ST ASRU
    Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
    Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  13. Summarization ASRU
    ESPNet-SUMM: Introducing a novel large dataset, toolkit, and a cross-corpora evaluation of speech summarization systems
    Roshan Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Atsunori Ogawa, Siddhant Arora, Marc Delcroix, Rita Singh, Shinji Watanabe, and Bhiksha Raj
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  14. ASR ASRU
    LV-CTC: Non-autoregressive ASR with CTC and latent variable models
    Yuya Fujita, Shinji Watanabe, Xuankai Chang, and Takashi Maekaku
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
  15. SS NeurIPS
    UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures
    Zhong-Qiu Wang and Shinji Watanabe
    In Proceedings of the Conference on Neural Information Processing Systems 2023
  16. SS WASPAA
    Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
    Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, and Shinji Watanabe
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
  17. SS CSL
    Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data
    Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur
    Computer Speech & Language 2023
  18. SD TASLP
    Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors
    Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
  19. MT&ASR TASLP
    LegoNN: Building Modular Encoder-Decoder Models
    Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, and Abdelrahman Mohamed
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
  20. ST ACL(demo)
    ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
    Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
  21. ST ACL
    UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
    Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, and Juan Pino
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
  22. ASR ICML
    Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
    Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, and Boris Ginsburg
    In Proceedings of the International Conference on Machine Learning (ICML) 2023
  23. TTS IJCAI
    Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
    Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari
    In Proceedings of IJCAI 2023
  24. TTS Interspeech
    Deep Speech Synthesis from MRI-Based Articulatory Representations
    Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan Black, Louis Goldstein, Shinji Watanabe, and Gopala Krishna Anumanchipalli
    In Proceedings of Interspeech 2023
  25. ASR Interspeech
    A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
    Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, and Brian MacWhinney
    In Proceedings of Interspeech 2023
  26. ASR&SSL Interspeech
    Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
    Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  27. ASR Interspeech
    Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
    Puyuan Peng, Brian Yan, Shinji Watanabe, and David Harwath
    In Proceedings of Interspeech 2023
  28. ASR&SLU Interspeech
    Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding
    Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  29. ASR Interspeech
    Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder–decoder Speech Recognition
    Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  30. ASR Interspeech
    Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
    Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  31. SSL Interspeech
    Exploration on HuBERT with Multiple Resolution
    Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  32. ASR Interspeech
    Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training
    Yui Sudo, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  33. ASR&SSL Interspeech
    ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
    In Proceedings of Interspeech 2023
  34. SLU Interspeech
    Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing
    Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  35. ASR Interspeech
    4D: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
    Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  36. SSL Interspeech
    DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  37. ASR&ST Interspeech
    A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
    Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  38. SSL Interspeech
    Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
    William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, and Shinji Watanabe
    In Proceedings of Interspeech 2023
  39. Summarization Interspeech
    BASS: Block-wise Adaptation for Speech Summarization
    Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, and Bhiksha Raj
    In Proceedings of Interspeech 2023
  40. ST EACL
    CTC Alignments Improve Autoregressive Translation
    Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, and Shinji Watanabe
    In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2023
  41. ASR ICLR
    Continuous Pseudo-Labeling from the Start
    Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, and Tatiana Likhomanenko
    In Proceedings of the International Conference on Learning Representations (ICLR) 2023
  42. ASR ICASSP
    Multi-blank Transducers for Speech Recognition
    Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, and Boris Ginsburg
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  43. SE ICASSP
    PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
    Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  44. SD ICASSP
    In search of strong embedding extractors for speaker diarisation
    Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, and Joon Son Chung
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  45. ASR ICASSP
    Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
    Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J. Han, Ryan McDonald, Kilian Q. Weinberger, and Yoav Artzi
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  46. ASR ICASSP
    BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
    Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  47. ASR ICASSP
    InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
    Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  48. TTS&SSL ICASSP
    A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units
    Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  49. SE ICASSP
    TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
    Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  50. SSL&SLU ICASSP
    Bridging Speech and Text Pre-trained Models with Unsupervised ASR
    Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  51. Music ICASSP
    PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor
    Yuning Wu, Jiatong Shi, Tao Qian, and Qin Jin
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  52. SLU ICASSP
    Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  53. SSL ICASSP
    Context-Aware Fine-Tuning of Self-Supervised Speech Models
    Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  54. S2ST ICASSP
    Enhancing Speech-To-Speech Translation with Multiple TTS Targets
    Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  55. ASR ICASSP
    Streaming Joint Speech Recognition and Disfluency Detection
    Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  56. ASR ICASSP
    Towards Zero-Shot Code-Switched Speech Recognition
    Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  57. ST ICASSP
    Align and Write and Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation
    Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  58. ASR ICASSP
    Improving Massively Multilingual ASR With Auxiliary CTC Objectives
    William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  59. SSL ICASSP
    SpeechLMScore: Evaluating Speech Generation Using Speech Language Model
    Soumi Maiti, Yifan Peng, Takaaki Saeki, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  60. TTS ICASSP
    Speaker-Independent Acoustic-to-Articulatory Speech Inversion
    Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, and Gopala K. Anumanchipalli
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  61. SLU ICASSP
    Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History
    Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  62. SS ICASSP
    TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
    Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  63. ASR&SSL ICASSP
    EURO: ESPnet Unsupervised ASR Open-Source Toolkit
    Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  64. SE ICASSP
    Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling
    Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  65. ASR&SSL ICASSP
    Avoid Overthinking in Self-Supervised Models for Speech Recognition
    Dan Berrebbi, Brian Yan, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  66. TTS ICASSP
    Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization
    Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, and Gopala K. Anumanchipalli
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  67. ASR&SLU&SSL ICASSP
    Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding
    Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  68. SSL ICASSP
    Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model
    Takashi Maekaku, Yuya Fujita, Xuankai Chang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  69. MultiModal ICASSP
    The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition
    Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, and Cong Liu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  70. SSL&ASR ICASSP
    FINDADAPTNET: Find and Insert Adapters by Learned Layer Importance
    Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  71. ASR ICASSP
    I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
    Yifan Peng, Jaesong Lee, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023

2022

  1. TTS AAAI
    A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
    Li-Wei Chen, Alexander Rudnicky, and Shinji Watanabe
    In Proceedings of AAAI 2022
  2. ASR EMNLP
    BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
    Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe
    In Proceedings of Findings of EMNLP 2022
  3. SLU EMNLP
    Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
    Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, and Shinji Watanabe
    In Proceedings of Findings of EMNLP 2022
  4. SD TASLP
    Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors
    Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi
    In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
  5. SE CSL
    A Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data
    Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur
    In Computer Speech & Language 2022
  6. SE TASLP
    End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party
    Wangyou Zhang, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani, Shinji Watanabe, and Yanmin Qian
    In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
  7. SE SPL
    Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction
    Zhong-Qiu Wang and Shinji Watanabe
    In IEEE Signal Processing Letters 2022
  8. SD TASLP
    Encoder-Decoder Based Attractors for End-to-End Neural Diarization
    Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Paola Garcia
    In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
  9. ASR JSTSP
    Self-Supervised Speech Representation Learning: A Review
    Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, and Shinji Watanabe
    In IEEE Journal of Selected Topics in Signal Processing 2022
  10. ST IWSLT
    Findings of the IWSLT 2022 Evaluation Campaign
    Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, and Shinji Watanabe
    In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT) 2022
  11. SD&SS SLT
    EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers
    Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, and Yong Xu
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  12. ASR&SD&SLU&ER SLT
    SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
    Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdel-rahman Mohamed, Shang-Wen Li, and Hung-yi Lee
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  13. ASR SLT
    E-Branchformer: Branchformer with Enhanced merging for speech recognition
    Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  14. ASR&SLU SLT
    A Study on the Integration of Pre-Trained SSL, ASR, LM, and SLU Models for Spoken Language Understanding
    Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  15. ASR&SSL SLT
    On Compressing Sequences for Self-Supervised Speech Models
    Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, and Hao Tang
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  16. ASR&SE&SSL SLT
    End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
    Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, and Nobutaka Ono
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  17. SE SLT
    Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization
    Shota Horiguchi, Yuki Takashima, Shinji Watanabe, and Paola Garcia
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  18. ASR SLT
    End-to-End Multi-speaker ASR with Independent Vector Analysis
    Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
  19. ASR Interspeech
    VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
    Jiatong Shi, George Saon, David Haws, Shinji Watanabe, and Brian Kingsbury
    In Proceedings of Interspeech 2022
  20. ASR Interspeech
    Memory-Efficient Training of RNN-Transducer with Sampled Softmax
    Jaesong Lee, Lukas Lee, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  21. SLU&ST Interspeech
    Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
    Keqi Deng, Shinji Watanabe, Jiatong Shi, and Siddhant Arora
    In Proceedings of Interspeech 2022
  22. Music Interspeech
    SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
    Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, and Qin Jin
    In Proceedings of Interspeech 2022
  23. Music Interspeech
    Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis
    Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, and Qin Jin
    In Proceedings of Interspeech 2022
  24. ASR Interspeech
    Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis
    Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin, and Jia Pan
    In Proceedings of Interspeech 2022
  25. KWS Interspeech
    Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis
    Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, and Jian-Qing Gao
    In Proceedings of Interspeech 2022
  26. ASR Interspeech
    ASR2K: Speech Recognition for Around 2000 Languages without Audio
    Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  27. SE Interspeech
    ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
    Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  28. SLU Interspeech
    Two-Pass Low Latency End-to-End Spoken Language Understanding
    Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W Black, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  29. TTS Interspeech
    Deep Speech Synthesis from Articulatory Representations
    Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, and Gopala Krishna Anumanchipalli
    In Proceedings of Interspeech 2022
  30. ASR Interspeech
    Minimum latency training of sequence transducers for streaming end-to-end speech recognition
    Yusuke Shinohara and Shinji Watanabe
    In Proceedings of Interspeech 2022
  31. ASR Interspeech
    Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection
    Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  32. ASR Interspeech
    Better Intermediates Improve CTC Inference
    Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, and Yusuke Kida
    In Proceedings of Interspeech 2022
  33. ASR Interspeech
    Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models
    Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera, and Yohei Kawaguchi
    In Proceedings of Interspeech 2022
  34. ASR Interspeech
    Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Takashi Maekaku, Yuya Fujita, Yifan Peng, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  35. ASR Interspeech
    Residual Language Model for End-to-end Speech Recognition
    Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  36. TTS Interspeech
    When Is TTS Augmentation Through a Pivot Language Useful?
    Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  37. TTS Interspeech
    TriniTTS: Pitch-controllable End-to-end TTS without External Aligner
    Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  38. ASR Interspeech
    Online Continual Learning of End-to-End Speech Recognition Models
    Muqiao Yang, Ian Lane, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  39. SE Interspeech
    Improving Speech Enhancement through Fine-Grained Speech Characteristics
    Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of Interspeech 2022
  40. ASR&SE&SSL Interspeech
    End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Xuankai Chang, Takashi Maekaku, Yuya Fujita, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  41. ASR&SSL Interspeech
    Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
    Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith, and Shinji Watanabe
    In Proceedings of Interspeech 2022
  42. ASR&SLU&MT ICML
    Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
    In Proceedings of the International Conference on Machine Learning (ICML) 2022
  43. Linguistic ACL
    Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
    Xinjian Li, Florian Metze, David R Mortensen, Shinji Watanabe, and Alan Black
    In Proceedings of Findings of the Annual Meeting of the Association for Computational Linguistics 2022
  44. SE&VC&ST ACL
    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
    Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2022
  45. SE&ASR CSL
    Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition
    Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu
    In Computer Speech & Language 2022
  46. SD CSL
    A review of speaker diarization: Recent advances with deep learning
    Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J Han, Shinji Watanabe, and Shrikanth Narayanan
    In Computer Speech & Language 2022
  47. SE&ASR CSL
    Joint speaker diarization and speech recognition based on region proposal networks
    Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, and Sanjeev Khudanpur
    In Computer Speech & Language 2022
  48. ASR CSL
    Arabic speech recognition by end-to-end, modular systems and human
    Amir Hussein, Shinji Watanabe, and Ahmed Ali
    In Computer Speech & Language 2022
  49. ASR ICASSP
    Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge
    Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  50. Multimodal ICASSP
    The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results
    Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Di-Yuan Liu, Bao-Cai Yin, Jia Pan, Jian-Qing Gao, and Cong Liu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  51. ASR ICASSP
    Non-Autoregressive End-to-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing
    Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Tianzi Wang
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  52. ASR ICASSP
    An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion
    Takashi Maekaku, Xuankai Chang, Yuya Fujita, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  53. SE&SSL ICASSP
    Investigating Self-Supervised Learning for Speech Enhancement and Separation
    Zili Huang, Shinji Watanabe, Shu-wen Yang, Paola Garcia, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  54. SE ICASSP
    Conditional Diffusion Probabilistic Model for Speech Enhancement
    Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, and Yu Tsao
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  55. ASR ICASSP
    Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models
    Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, and Pengyuan Zhang
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  56. ASR ICASSP
    SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
    Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  57. ASR ICASSP
    Integrating multiple ASR systems into NLP backend with attention fusion
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  58. SLU ICASSP
    ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
    Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  59. ASR ICASSP
    Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
    Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  60. ASR ICASSP
    Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR
    Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, and Jonathan Le Roux
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  61. ASR ICASSP
    Sequence Transduction with Graph-based Supervision
    Niko Moritz, Takaaki Hori, Shinji Watanabe, and Jonathan Le Roux
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  62. ASR ICASSP
    Run-and-Back Stitch Search: Novel Block Synchronous Decoding for Streaming Encoder-Decoder ASR
    Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  63. VC&SSL ICASSP
    S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations
    Wen-Chin Huang, Shu-wen Yang, Tomoki Hayashi, Hung-yi Lee, Shinji Watanabe, and Tomoki Toda
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  64. ASR ICASSP
    Joint Speech Recognition and Audio Captioning
    Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  65. SD ICASSP
    Multi-Channel End-to-End Neural Diarization with Distributed Microphones
    Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, and Yohei Kawaguchi
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  66. ASR ICASSP
    TorchAudio: Building Blocks for Audio and Speech Processing
    Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, and Vincent Quenneville-Bélair
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  67. SD ICASSP
    Towards End-to-End Speaker Diarization with Generalized Neural Speaker Clustering
    Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  68. Music ICASSP
    Training Strategies for Automatic Song Writing: A Unified Framework Perspective
    Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, and Qin Jin
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
  69. SE+ASR CSL
    An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer
    Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu
    In Computer Speech & Language 2022
  70. SE ICASSP
    Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge
    Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022

2021

  1. ASR+TTS ASRU
    On Prosody Modeling for ASR+TTS based Voice Conversion
    Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, and Tomoki Toda
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  2. SLU ASRU
    Attention-based Multi-hypothesis Fusion for Speech Summarization
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  3. ST ASRU
    Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
    Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  4. SD ASRU
    Towards Neural Diarization for Unlimited Numbers of Speakers using Global and Local Attractors
    Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, and Yohei Kawaguchi
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  5. ASR ASRU
    A Study of Transducer based End-to-end ASR with ESPNet: Architecture, Auxiliary Loss and Decoding Strategies
    Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  6. ASR ASRU
    A Comparative Study on Non-autoregressive Modelings for Speech-to-text Generation
    Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  7. SE ASRU
    ConferencingSpeech Challenge: Towards Far-field Multi-channel Speech Enhancement for Video Conferencing
    Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, and Shidong Shang
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  8. ASR+TTS ASRU
    Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
    Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan Black
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  9. ASR&SSL ASRU
    An Exploration of Self-supervised Pretrained Representations for End-to-end Speech Recognition
    Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  10. VC APSIPA
    Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks
    Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency
    In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2021
  11. ST IWSLT
    ESPnet-ST IWSLT 2021 Offline Speech Translation System
    Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe
    In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT) 2021
  12. ASR Interspeech
    GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
    Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, and Zhiyong Yan
    In Proceedings of Interspeech 2021
  13. AED Interspeech
    Acoustic Event Detection with Classifier Chains
    Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, and Tomoki Hayashi
    In Proceedings of Interspeech 2021
  14. ASR Interspeech
    Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain
    Pengcheng Guo, Xuankai Chang, Shinji Watanabe, and Lei Xie
    In Proceedings of Interspeech 2021
  15. ASR Interspeech
    Multi-mode Transformer Transducer with Stochastic Future Context
    Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Han, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  16. ASR Interspeech
    Differentiable Allophone Graphs for Language Universal Speech Recognition
    Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  17. SE Interspeech
    Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Matthew Maciejewski, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of Interspeech 2021
  18. ASR Interspeech
    SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
    Patrick O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael Shulman, Boris Ginsburg, Shinji Watanabe, and Georg Kucsko
    In Proceedings of Interspeech 2021
  19. ASR&SD&SLU&ER Interspeech
    SUPERB: Speech processing Universal PERformance Benchmark
    Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee
    In Proceedings of Interspeech 2021
  20. SSA Interspeech
    Leveraging Pre-trained Language Model for Speech Sentiment Analysis
    Suwon Shon, Pablo Brusco, Jing Pan, Kyu Han, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  21. ASR Interspeech
    Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
    Tianzi Wang, Yuya Fujita, Xuankai Chang, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  22. SLU Interspeech
    Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding
    Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, and Alan W. Black
    In Proceedings of Interspeech 2021
  23. ASR & SpeDialog Interspeech
    Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021
    Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky
    In Proceedings of Interspeech 2021
  24. ASR Interspeech
    Layer Pruning on Demand with Intermediate CTC
    Jaesong Lee, Jingu Kang, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  25. ASR Interspeech
    Toward Streaming ASR with Non-autoregressive Insertion-based Model
    Yuya Fujita, Tianzi Wang, Shinji Watanabe, and Motoi Omachi
    In Proceedings of Interspeech 2021
  26. SE&ASR Interspeech
    Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics
    Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, and Jan Honza Černocký
    In Proceedings of Interspeech 2021
  27. ASR Interspeech
    Data Augmentation Methods for End-to-end Speech Recognition on Distant-talk Scenarios
    Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  28. SD Interspeech
    Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker
    Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  29. SE&ASR&ST DSLW
    The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
    Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, and Wangyou Zhang
    In Proceedings of IEEE Data Science and Learning Workshop (DSLW) 2021
  30. SE SLT
    Dual-path RNN for long recording speech separation
    Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Boeddeker, Yanmin Qian, Shinji Watanabe, and Zhuo Chen
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  31. SD SLT
    End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
    Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, and Kenji Nagamatsu
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  32. ASR SLT
    Streaming Transformer ASR with blockwise synchronous beam search
    Emiru Tsunoo, Yosuke Kashiwagi, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  33. SE SLT
    Sequential multi-frame neural beamforming for speech separation and enhancement
    Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, and John R Hershey
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  34. SD SLT
    DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs
    Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, and Sanjeev Khudanpur
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  35. SE&SD&ASR SLT
    Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
    Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, and others
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  36. SD SLT
    Online end-to-end neural diarization with speaker-tracing buffer
    Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, and Kenji Nagamatsu
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  37. ST AmericasNLP
    Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation
    In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
  38. ASR AmericasNLP
    End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec
    Jonathan D Amith, Jiatong Shi, and Rey Castillo García
    In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
  39. ASR NAACL
    End-to-end ASR to jointly predict transcriptions and linguistic annotations
    Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Matthew Wiesner
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
  40. ST NAACL
    Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
    Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
  41. ST NAACL
    Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
    Hirofumi Inaguma, Tatsuya Kawahara, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
  42. ASR EACL
    Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec
    Jiatong Shi, Jonathan D Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe
    In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2021
  43. SD Interspeech
    Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers
    Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola Garcia Perera, and Kenji Nagamatsu
    In Proceedings of Interspeech 2021
  44. SD Interspeech
    Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization
    Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera, and Kenji Nagamatsu
    In Proceedings of Interspeech 2021
  45. SE Interspeech
    Continuous speech separation using speaker inventory for long recording
    Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John Hershey, Nima Mesgarani, and Zhuo Chen
    In Proceedings of Interspeech 2021
  46. SD ICASSP
    End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
    Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, and John R Hershey
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  47. SE ICASSP
    Dual-Path Modeling for Long Recording Speech Separation in Meetings
    Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  48. ASR ICASSP
    Recent developments on ESPnet toolkit boosted by Conformer
    Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, and others
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  49. SE&ASR ICASSP
    End-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend
    Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, and Yanmin Qian
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  50. SD ICASSP
    End-to-end speaker diarization as post-processing
    Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  51. ASR ICASSP
    Improved Mask-CTC for Non-Autoregressive End-to-End ASR
    Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  52. ASR ICASSP
    Intermediate Loss Regularization for CTC-Based Speech Recognition
    Jaesong Lee and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  53. ST ICASSP
    Orthros: Non-autoregressive end-to-end speech translation with dual-decoder
    Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  54. ASR ICASSP
    Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization
    Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  55. ASR&TTS&SSL ICASSP
    EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition
    Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Ramon Fernandez Astudillo, and others
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  56. ASR ICASSP
    Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition
    Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  57. SE&ASR ICASSP
    Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation
    Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  58. SE ICASSP
    Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step
    Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  59. Music ICASSP
    Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss
    Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, and Qin Jin
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  60. SE SLT
    ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
    Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021

2020

  1. TTS ICASSP
    ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit
    Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  2. ST ACL
    ESPnet-ST: All-in-One Speech Translation Toolkit
    Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Yalta, Tomoki Hayashi, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2020
  3. SR&SSL NeurIPS
    Augmentation adversarial training for self-supervised speaker recognition
    Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, and Joon Son Chung
    2020
  4. SED DCASE
    Conformer-based sound event detection with semi-supervised learning and data augmentation
    Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda
    2020
  5. ASR CHiME
    The JHU multi-microphone multi-speaker ASR system for the CHiME-6 challenge
    Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola Garcia, Shinji Watanabe, and Sanjeev Khudanpur
    2020
  6. ASR ICASSP
    End-to-end multi-speaker speech recognition with transformer
    Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  7. TTS ICASSP
    Semi-supervised speaker adaptation for end-to-end speech synthesis with pretrained models
    Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  8. ASR ICASSP
    End-to-end automatic speech recognition integrated with CTC-based voice activity detection
    Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  9. ASR ICASSP
    Attention-based ASR with lightweight and dynamic convolutions
    Yuya Fujita, Aswin Shanmugam Subramanian, Motoi Omachi, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  10. ASR ICASSP
    A practical two-stage training strategy for multi-stream end-to-end speech recognition
    Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, and Hynek Hermansky
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  11. SD ICASSP
    Speaker diarization with region proposal network
    Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola García, Yiwen Shao, Daniel Povey, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  12. SED ICASSP
    Weakly-supervised sound event detection with self-attention
    Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  13. SE ICASSP
    Far-field location guided target speech extraction using end-to-end speech recognition objectives
    Aswin Shanmugam Subramanian, Chao Weng, Meng Yu, Shi-Xiong Zhang, Yong Xu, Shinji Watanabe, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  14. ASR Deep Neural Evolution
    Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms
    Takahiro Shinozaki, Shinji Watanabe, and Kevin Duh
    2020
  15. ASR&TTS VCC
    The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS
    Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, and Tomoki Toda
    2020
  16. SE&ASR NeurIPS
    Sequence to multi-sequence learning via conditional chain mapping for mixture signals
    Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, and Lei Xie
    2020
  17. ASR Interspeech
    End-to-End ASR with Adaptive Span Self-Attention
    Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe, Yuya Fujita, and Motoi Omachi
    In Proceedings of Interspeech 2020
  18. TTS Interspeech
    Learning speaker embedding from text-to-speech
    Jaejin Cho, Piotr Żelasko, Jesús Villalba, Shinji Watanabe, and Najim Dehak
    2020
  19. SE Interspeech
    Speaker-conditional chain model for speech separation and extraction
    Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, and Bo Xu
    2020
  20. ASR Interspeech
    Insertion-based modeling for end-to-end automatic speech recognition
    Yuya Fujita, Shinji Watanabe, Motoi Omachi, and Xuankai Chang
    2020
  21. SD Interspeech
    End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors
    Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Kenji Nagamatsu
    2020
  22. ASR Interspeech
    End-to-end far-field speech recognition with unified dereverberation and beamforming
    Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, and Yanmin Qian
    2020
  23. ASR Interspeech
    Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict
    Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, and Tetsunori Kobayashi
    2020

2019

  1. ASR ASRU
    Transformer ASR with contextual block processing
    Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  2. ASR ASRU
    MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition
    Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  3. ST ASRU
    Multilingual end-to-end speech translation
    Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  4. ASR+SD ASRU
    Simultaneous speech recognition and speaker diarization for monaural dialogue recordings with target-speaker acoustic models
    Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  5. ASR ASRU
    Espresso: A fast end-to-end neural speech recognition toolkit
    Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, and Sanjeev Khudanpur
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  6. ASR ASRU
    A comparative study on Transformer vs RNN in speech applications
    Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, and others
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  7. SD ASRU
    End-to-end neural speaker diarization with self-attention
    Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  8. ASR ASRU
    Multi-stream end-to-end speech recognition
    Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, and Hynek Hermansky
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
  9. SS WASPAA
    Analysis of robustness of deep single-channel speech separation using corpora constructed from multiple domains
    Matthew Maciejewski, Gregory Sell, Yusuke Fujita, Leibny Paola Garcia-Perera, Shinji Watanabe, and Sanjeev Khudanpur
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
  10. ASR WASPAA
    Generalized weighted-prediction-error dereverberation with varying source priors for reverberant speech recognition
    Toru Taniguchi, Aswin Shanmugam Subramanian, Xiaofei Wang, Dung Tran, Yuya Fujita, and Shinji Watanabe
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
  11. ASR WASPAA
    Speech enhancement using end-to-end speech recognition objectives
    Aswin Shanmugam Subramanian, Xiaofei Wang, Murali Karthick Baskar, Shinji Watanabe, Toru Taniguchi, Dung Tran, and Yuya Fujita
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
  12. ASR Interspeech
    End-to-End Multilingual Multi-Speaker Speech Recognition
    In Proceedings of Interspeech 2019
  13. ASR Interspeech
    Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings
    In Proceedings of Interspeech 2019
  14. TTS Interspeech
    Pre-trained Text Embeddings for Enhanced Text-to-Speech Synthesis
    In Proceedings of Interspeech 2019
  15. SD Interspeech
    End-to-End Neural Speaker Diarization with Permutation-Free Objectives
    Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, and Shinji Watanabe
    In Proceedings of Interspeech 2019
  16. ASR Interspeech
    Analysis of Multilingual Sequence-to-Sequence speech recognition systems
    Murali Karthick Baskar, Martin Karafiat, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, and Jan Černocký
    In Proceedings of Interspeech 2019
  17. ASR Interspeech
    End-to-end SpeakerBeam for single channel target speech recognition
    Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, and Tomohiro Nakatani
    In Proceedings of Interspeech 2019
  18. ASR Interspeech
    Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text
    Murali Karthick Baskar, Shinji Watanabe, Ramón Astudillo, Takaaki Hori, Lukas Burget, and Jan Černocký
    In Proceedings of Interspeech 2019
  19. ASR Interspeech
    Study of the performance of automatic speech recognition systems in speakers with Parkinson’s Disease
    Laureano Moro Velazquez, Jaejin Cho, Shinji Watanabe, Mark Hasegawa-Johnson, Odette Scharenborg, Kim Heejin, and Najim Dehak
    In Proceedings of Interspeech 2019
  20. ASR Interspeech
    Vectorized Beam Search for CTC-Attention-based Speech Recognition
    Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, and Jonathan Le Roux
    In Proceedings of Interspeech 2019
  21. ASR Interspeech
    Speaker recognition benchmark using the CHiME-5 corpus
    Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Dan Povey, and Sanjeev Khudanpur
    In Proceedings of Interspeech 2019
  22. ASR Interspeech
    Improving Transformer Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
    Shigeki Karita, Nelson Yalta, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani
    In Proceedings of Interspeech 2019
  23. ASR Interspeech
    Interference Speaker Loss for Target-Speaker Speech Recognition
    Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, and Shinji Watanabe
    In Proceedings of Interspeech 2019
  24. ASR EUSIPCO
    CNN-based multichannel end-to-end speech recognition for everyday home environments
    Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, and Tetsuya Ogata
    In 27th European Signal Processing Conference (EUSIPCO) 2019
  25. OCR ICDAR
    Using ASR methods for OCR
    Ashish Arora, Chun Chieh Chang, Babak Rekabdar, Bagher BabaAli, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, and others
    In International Conference on Document Analysis and Recognition (ICDAR) 2019
  26. Music IJCNN
    Weakly-supervised deep recurrent neural networks for basic dance step generation
    Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, and Tetsuya Ogata
    In International Joint Conference on Neural Networks (IJCNN) 2019
  27. ASR NAACL
    Massively Multilingual Adversarial Speech Recognition
    Oliver Adams, Matthew Wiesner, Shinji Watanabe, and David Yarowsky
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2019
  28. ASR ICASSP
    Promising accurate prefix boosting for sequence-to-sequence ASR
    Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, and Jan Honza Černocký
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  29. ASR ICASSP
    Transfer learning of language-independent end-to-end ASR with language model fusion
    Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  30. ASR ICASSP
    Improving end-to-end speech recognition with pronunciation-assisted sub-word modeling
    Hainan Xu, Shuoyang Ding, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  31. ASR ICASSP
    Language model integration based on memory control for sequence to sequence speech recognition
    Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesus Villalba, and Najim Dehak
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  32. ASR ICASSP
    Stream attention-based multi-array end-to-end speech recognition
    Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, and Hynek Hermansky
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  33. ASR ICASSP
    Acoustic modeling for overlapping speech recognition: JHU CHiME-5 challenge system
    Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  34. ASR ICASSP
    Cycle-consistency training for end-to-end speech recognition
    Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, and Jonathan Le Roux
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  35. AED ICASSP
    Joint acoustic and class inference for weakly supervised sound event detection
    Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, and Mounya Elhilali
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  36. SE ICASSP
    The phasebook: Building complex masks via discrete representations for source separation
    Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, and John R Hershey
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  37. ASR ICASSP
    End-to-end monaural multi-speaker ASR system without pretraining
    Xuankai Chang, Yanmin Qian, Kai Yu, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  38. ASR ICASSP
    Semi-supervised end-to-end speech recognition using text-to-speech and autoencoders
    Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  39. ASR ICASSP
    Acoustic modeling for distant multi-talker speech recognition with single-and multi-channel branches
    Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

2018

  1. ML Physica
    Model parameter learning using Kullback–Leibler divergence
    Chungwei Lin, Tim K Marks, Milutin Pajovic, Shinji Watanabe, and Chih-kuan Tung
    Physica A: Statistical Mechanics and its Applications 2018
  2. ASR SLT
    End-to-end speech recognition with word-based RNN language models
    Takaaki Hori, Jaejin Cho, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
  3. ASR SLT
    Low-resource contextual topic identification on speech
    Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, and Sanjeev Khudanpur
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
  4. ASR SLT
    Back-translation-style data augmentation for end-to-end ASR
    Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, and Kazuya Takeda
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
  5. ASR SLT
    Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling
    Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, and Takaaki Hori
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
  6. ASR Interspeech
    ESPnet: End-to-End Speech Processing Toolkit
    Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai
    Proceedings of Interspeech 2018
  7. ASR Interspeech
    Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
    Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, and Shinji Watanabe
    Proceedings of Interspeech 2018
  8. ASR Interspeech
    Multi-Head Decoder for End-to-End Speech Recognition
    Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda
    Proceedings of Interspeech 2018
  9. ASR Interspeech
    Semi-Supervised End-to-End Speech Recognition
    Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa, and Marc Delcroix
    Proceedings of Interspeech 2018
  10. SE&ASR Interspeech
    The Fifth ’CHiME’ Speech Separation and Recognition Challenge: Dataset, Task and Baselines
    Jon Barker, Shinji Watanabe, Emmanuel Vincent, and Jan Trmal
    Proceedings of Interspeech 2018
  11. SE Interspeech
    Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
    Aswin Shanmugam Subramanian, Szu-Jui Chen, and Shinji Watanabe
    Proceedings of Interspeech 2018
  12. ASR Interspeech
    Multi-Modal Data Augmentation for End-to-end ASR
    Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, and Shinji Watanabe
    Proceedings of Interspeech 2018
  13. LID Interspeech
    Effectiveness of single-channel BLSTM enhancement for language identification
    Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan, and Najim Dehak
    Proceedings of Interspeech 2018
  14. SD Interspeech
    Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
    Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, and others
    Proceedings of Interspeech 2018
  15. ASR Interspeech
    Auxiliary Feature Based Adaptation of End-to-end ASR Systems
    Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita, and Tomohiro Nakatani
    Proceedings of Interspeech 2018
  16. ASR ACL
    A Purely End-to-End System for Multi-speaker Speech Recognition
    Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, and John R Hershey
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2018
  17. ASR ICASSP
    Speaker adaptation for multichannel end-to-end speech recognition
    Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, and John Hershey
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
  18. ASR ICASSP
    An end-to-end language-tracking speech recognizer for mixed-language speech
    Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, and John R Hershey
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
  19. ASR ICASSP
    End-to-end multi-speaker speech recognition
    Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, and John R Hershey
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018

2017