WAVLab | 2021 Papers

ASR+TTS ASRU

On Prosody Modeling for ASR+TTS based Voice Conversion

Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, and Tomoki Toda

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
SLU ASRU

Attention-based Multi-hypothesis Fusion for Speech Summarization

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ST ASRU

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
SD ASRU

Towards Neural Diarization for Unlimited Numbers of Speakers using Global and Local Attractors

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, and Yohei Kawaguchi

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR ASRU

A Study of Transducer based End-to-end ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR ASRU

A Comparative Study on Non-autoregressive Modelings for Speech-to-text Generation

Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
SE ASRU

ConferencingSpeech Challenge: Towards Far-field Multi-channel Speech Enhancement for Video Conferencing

Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, and Shidong Shang

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR+TTS ASRU

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan Black

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR&SSL ASRU

An Exploration of Self-supervised Pretrained Representations for End-to-end Speech Recognition

Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
VC APSIPA

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks

Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2021
ST IWSLT

ESPnet-ST IWSLT 2021 Offline Speech Translation System

Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe

In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT) 2021
ASR Interspeech

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, and Zhiyong Yan

In Proceedings of Interspeech 2021
AED Interspeech

Acoustic Event Detection with Classifier Chains

Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, and Tomoki Hayashi

In Proceedings of Interspeech 2021
ASR Interspeech

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain

Pengcheng Guo, Xuankai Chang, Shinji Watanabe, and Lei Xie

In Proceedings of Interspeech 2021
ASR Interspeech

Multi-mode Transformer Transducer with Stochastic Future Context

Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Han, and Shinji Watanabe

In Proceedings of Interspeech 2021
ASR Interspeech

Differentiable Allophone Graphs for Language Universal Speech Recognition

Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, and Shinji Watanabe

In Proceedings of Interspeech 2021
SE Interspeech

Speaker Verification-Based Evaluation of Single-Channel Speech Separation

Matthew Maciejewski, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of Interspeech 2021
ASR Interspeech

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

Patrick O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael Shulman, Boris Ginsburg, Shinji Watanabe, and Georg Kucsko

In Proceedings of Interspeech 2021
ASR&SD&SLU&ER Interspeech

SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y., Andy T., Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee

In Proceedings of Interspeech 2021

SSA Interspeech

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

Suwon Shon, Pablo Brusco, Jing Pan, Kyu Han, and Shinji Watanabe

In Proceedings of Interspeech 2021
ASR Interspeech

Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models

Tianzi Wang, Yuya Fujita, Xuankai Chang, and Shinji Watanabe

In Proceedings of Interspeech 2021
SLU Interspeech

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, and Alan W. Black

In Proceedings of Interspeech 2021
ASR & SpeDialog Interspeech

Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021

Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky

In Proceedings of Interspeech 2021
ASR Interspeech

Layer Pruning on Demand with Intermediate CTC

Jaesong Lee, Jingu Kang, and Shinji Watanabe

In Proceedings of Interspeech 2021
ASR Interspeech

Toward Streaming ASR with Non-autoregressive Insertion-based Model

Yuya Fujita, Tianzi Wang, Shinji Watanabe, and Motoi Omachi

In Proceedings of Interspeech 2021
SE&ASR Interspeech

Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics

Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, and Jan Honza Černocký

In Proceedings of Interspeech 2021
ASR Interspeech

Data Augmentation Methods for End-to-end Speech Recognition on Distant-talk Scenarios

Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, and Shinji Watanabe

In Proceedings of Interspeech 2021
SD Interspeech

Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, and Shinji Watanabe

In Proceedings of Interspeech 2021
SE&ASR&ST DSLW

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, and Wangyou Zhang

In Proceedings of IEEE Data Science and Learning Workshop (DSLW) 2021
SE SLT

Dual-path RNN for long recording speech separation

Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Boeddeker, Yanmin Qian, Shinji Watanabe, and Zhuo Chen

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SD SLT

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, and Kenji Nagamatsu

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
ASR SLT

Streaming Transformer ASR with blockwise synchronous beam search

Emiru Tsunoo, Yosuke Kashiwagi, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SE SLT

Sequential multi-frame neural beamforming for speech separation and enhancement

Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, and John R Hershey

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SD SLT

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, and Sanjeev Khudanpur

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SE&SD&ASR SLT

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, and others

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SD SLT

Online end-to-end neural diarization with speaker-tracing buffer

Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, and Kenji Nagamatsu

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
ST AmericasNLP

Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation

Jiatong Shi, Jonathan D Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe

In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
ASR AmericasNLP

End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec

Jonathan D Amith, Jiatong Shi, and Rey Castillo García

In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
ASR NAACL

End-to-end ASR to jointly predict transcriptions and linguistic annotations

Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Matthew Wiesner

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
ST NAACL

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
ST NAACL

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

Hirofumi Inaguma, Tatsuya Kawahara, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
ASR EACL

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

Jiatong Shi, Jonathan D Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2021
SD Interspeech

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola Garcia Perera, and Kenji Nagamatsu

In Proceedings of Interspeech 2021
SD Interspeech

Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola, and Kenji Nagamatsu

In Proceedings of Interspeech 2021
SE Interspeech

Continuous speech separation using speaker inventory for long recording

Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John Hershey, Nima Mesgarani, and Zhuo Chen

In Proceedings of Interspeech 2021
SD ICASSP

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, and John R Hershey

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE ICASSP

Dual-Path Modeling for Long Recording Speech Separation in Meetings

Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, and Yanmin Qian

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Recent Developments on ESPnet Toolkit Boosted by Conformer

Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, and others

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE&ASR ICASSP

End-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend

Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, and Yanmin Qian

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SD ICASSP

End-to-end speaker diarization as post-processing

Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Intermediate Loss Regularization for CTC-Based Speech Recognition

Jaesong Lee and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ST ICASSP

Orthros: Non-autoregressive end-to-end speech translation with dual-decoder

Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization

Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR&TTS&SSL ICASSP

EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition

Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Ramon Fernandez Astudillo, and others

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition

Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE&ASR ICASSP

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation

Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE ICASSP

Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step

Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
Music ICASSP

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss

Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, and Qin Jin

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE SLT

ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
