1. ASR+TTS ASRU
    On Prosody Modeling for ASR+TTS based Voice Conversion
    Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, and Tomoki Toda
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  2. SLU ASRU
    Attention-based Multi-hypothesis Fusion for Speech Summarization
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  3. ST ASRU
    Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
    Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  4. SD ASRU
    Towards Neural Diarization for Unlimited Numbers of Speakers using Global and Local Attractors
    Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, and Yohei Kawaguchi
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  5. ASR ASRU
    A Study of Transducer based End-to-end ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
    Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  6. ASR ASRU
    A Comparative Study on Non-autoregressive Modelings for Speech-to-text Generation
    Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  7. SE ASRU
    ConferencingSpeech Challenge: Towards Far-field Multi-channel Speech Enhancement for Video Conferencing
    Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, and Shidong Shang
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  8. ASR+TTS ASRU
    Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
    Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan Black
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  9. ASR&SSL ASRU
    An Exploration of Self-supervised Pretrained Representations for End-to-end Speech Recognition
    Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
  10. VC APSIPA
    Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks
    Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency
    In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2021
  11. ST IWSLT
    ESPnet-ST IWSLT 2021 Offline Speech Translation System
    Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe
    In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT) 2021
  12. ASR Interspeech
    GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
    Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, and Zhiyong Yan
    In Proceedings of Interspeech 2021
  13. AED Interspeech
    Acoustic Event Detection with Classifier Chains
    Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, and Tomoki Hayashi
    In Proceedings of Interspeech 2021
  14. ASR Interspeech
    Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain
    Pengcheng Guo, Xuankai Chang, Shinji Watanabe, and Lei Xie
    In Proceedings of Interspeech 2021
  15. ASR Interspeech
    Multi-mode Transformer Transducer with Stochastic Future Context
    Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Han, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  16. ASR Interspeech
    Differentiable Allophone Graphs for Language Universal Speech Recognition
    Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  17. SE Interspeech
    Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Matthew Maciejewski, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of Interspeech 2021
  18. ASR Interspeech
    SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
    Patrick O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael Shulman, Boris Ginsburg, Shinji Watanabe, and Georg Kucsko
    In Proceedings of Interspeech 2021
  19. ASR&SD&SLU&ER Interspeech
    SUPERB: Speech processing Universal PERformance Benchmark
    Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee
    In Proceedings of Interspeech 2021
  20. SSA Interspeech
    Leveraging Pre-trained Language Model for Speech Sentiment Analysis
    Suwon Shon, Pablo Brusco, Jing Pan, Kyu Han, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  21. ASR Interspeech
    Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
    Tianzi Wang, Yuya Fujita, Xuankai Chang, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  22. SLU Interspeech
    Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding
    Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, and Alan W. Black
    In Proceedings of Interspeech 2021
  23. ASR & SpeDialog Interspeech
    Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021
    Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky
    In Proceedings of Interspeech 2021
  24. ASR Interspeech
    Layer Pruning on Demand with Intermediate CTC
    Jaesong Lee, Jingu Kang, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  25. ASR Interspeech
    Toward Streaming ASR with Non-autoregressive Insertion-based Model
    Yuya Fujita, Tianzi Wang, Shinji Watanabe, and Motoi Omachi
    In Proceedings of Interspeech 2021
  26. SE&ASR Interspeech
    Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics
    Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, and Jan Honza Černocký
    In Proceedings of Interspeech 2021
  27. ASR Interspeech
    Data Augmentation Methods for End-to-end Speech Recognition on Distant-talk Scenarios
    Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  28. SD Interspeech
    Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker
    Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, and Shinji Watanabe
    In Proceedings of Interspeech 2021
  29. SE&ASR&ST DSLW
    The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
    Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, and Wangyou Zhang
    In Proceedings of IEEE Data Science and Learning Workshop (DSLW) 2021
  30. SE SLT
    Dual-path RNN for long recording speech separation
    Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Boeddeker, Yanmin Qian, Shinji Watanabe, and Zhuo Chen
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  31. SD SLT
    End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
    Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, and Kenji Nagamatsu
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  32. ASR SLT
    Streaming Transformer ASR with blockwise synchronous beam search
    Emiru Tsunoo, Yosuke Kashiwagi, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  33. SE SLT
    Sequential multi-frame neural beamforming for speech separation and enhancement
    Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, and John R Hershey
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  34. SD SLT
    DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs
    Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, and Sanjeev Khudanpur
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  35. SE&SD&ASR SLT
    Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
    Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, and others
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  36. SD SLT
    Online end-to-end neural diarization with speaker-tracing buffer
    Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, and Kenji Nagamatsu
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
  37. ST AmericasNLP
    Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation
    In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
  38. ASR AmericasNLP
    End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec
    Jonathan D Amith, Jiatong Shi, and Rey Castillo García
    In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
  39. ASR NAACL
    End-to-end ASR to jointly predict transcriptions and linguistic annotations
    Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Matthew Wiesner
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
  40. ST NAACL
    Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
    Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
  41. ST NAACL
    Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
    Hirofumi Inaguma, Tatsuya Kawahara, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
  42. ASR EACL
    Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec
    Jiatong Shi, Jonathan D Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe
    In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2021
  43. SD Interspeech
    Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers
    Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola Garcia Perera, and Kenji Nagamatsu
    In Proceedings of Interspeech 2021
  44. SD Interspeech
    Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization
    Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera, and Kenji Nagamatsu
    In Proceedings of Interspeech 2021
  45. SE Interspeech
    Continuous speech separation using speaker inventory for long recording
    Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John Hershey, Nima Mesgarani, and Zhuo Chen
    In Proceedings of Interspeech 2021
  46. SD ICASSP
    End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
    Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, and John R Hershey
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  47. SE ICASSP
    Dual-Path Modeling for Long Recording Speech Separation in Meetings
    Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  48. ASR ICASSP
    Recent developments on ESPnet toolkit boosted by Conformer
    Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, and others
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  49. SE&ASR ICASSP
    End-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend
    Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, and Yanmin Qian
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  50. SD ICASSP
    End-to-end speaker diarization as post-processing
    Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  51. ASR ICASSP
    Improved Mask-CTC for Non-Autoregressive End-to-End ASR
    Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  52. ASR ICASSP
    Intermediate Loss Regularization for CTC-Based Speech Recognition
    Jaesong Lee and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  53. ST ICASSP
    Orthros: Non-autoregressive end-to-end speech translation with dual-decoder
    Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  54. ASR ICASSP
    Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization
    Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  55. ASR&TTS&SSL ICASSP
    EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition
    Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Ramon Fernandez Astudillo, and others
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  56. ASR ICASSP
    Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition
    Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  57. SE&ASR ICASSP
    Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation
    Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  58. SE ICASSP
    Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step
    Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  59. Music ICASSP
    Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss
    Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, and Qin Jin
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
  60. SE SLT
    ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
    Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021