WAVLab | Publications

2024

SSL EMNLP

Towards Robust Speech Representation Learning for Thousands of Languages

William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe

In Proceedings of EMNLP 2024
ASR&ER&Speaker SLT

Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Chao-Han Huck Yang, Tae Jin Park, Yuan Gong, Yuanchao Li, Yen-Ting Lin, Zhehuai Chen, Yuchen Hu, Chen Chen, Kunal Dhawan, Piotr Żelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, and Andreas Stolcke

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
Tokenizer SLT

Codec-SUPERB @SLT 2024: A lightweight benchmark for neural codec models

Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Jiawei Du, Kai-Wei Chang, Ke-Han Lu, Alexander Liu, Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, and Hung-yi Lee

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
Music SLT

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
Tokenizer SLT

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander Liu, Bhiksha Raj, Qin Jin, Ruihua Song, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
ASR SLT

FLORAS 50: A Massively Multilingual Multitask Benchmark for Long-form Conversational Speech

William Chen, Brian Yan, Chih-Chen Chen, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
ASR SLT

Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
ASR SLT

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
SE SLT

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

Chenda Li, Samuele Cornell, Shinji Watanabe, and Yanmin Qian

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
ASR SLT

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition

Shih-Heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, and Hung-yi Lee

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
ASR&TTS SLT

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
ASR Interspeech

Multi-Convformer: Extending Conformer with Multiple Convolution Kernels

Darshan Prabhu, Yifan Peng, Preethi Jyothi, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

Self-training ASR Guided by Unsupervised ASR Teacher

Hyung Yong Kim, Byeong-Yeol Kim, Yunkyu Lim, Jihwan Park, Shukjae Choi, Yooncheol Ju, Jinseok Park, Youshin Lim, Seung Woo Yu, Hanbin Lee, and Shinji Watanabe

In Proceedings of Interspeech 2024
Tokenizer Interspeech

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios

Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition

Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, and Shinji Watanabe

In Proceedings of Interspeech 2024
SLU Interspeech

Towards Unified Evaluation of Continual Learning in Spoken Language Understanding

Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, and Bhiksha Raj

In Proceedings of Interspeech 2024
ASR&TTS&Music Interspeech

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, and Qin Jin

In Proceedings of Interspeech 2024
Evaluation Interspeech

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, and Hiroshi Saruwatari

In Proceedings of Interspeech 2024
Speaker Interspeech

To what extent can ASV systems naturally defend against spoofing attacks?

Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, and Joon Son Chung

In Proceedings of Interspeech 2024
Speaker Interspeech

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Alex Gichamba, Barry-John Theobald, Ahmed Hussen Abdelaziz, and Shinji Watanabe

In Proceedings of Interspeech 2024
SLU Interspeech

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, and Karen Livescu

In Proceedings of Interspeech 2024
SE Interspeech

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, and Yanmin Qian

In Proceedings of Interspeech 2024
ASR Interspeech

Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss

Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe

In Proceedings of Interspeech 2024
SE Interspeech

URGENT Challenge: Universality, Robustness, and Generalizability for speech EnhancemeNT

Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian

In Proceedings of Interspeech 2024
Speaker Interspeech

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, and Barry-John Theobald

In Proceedings of Interspeech 2024
ASR Interspeech

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, and Shinji Watanabe

In Proceedings of Interspeech 2024
SSL Interspeech

Self-Supervised Speech Representations are More Phonetic than Semantic

Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe

In Proceedings of Interspeech 2024
SS Interspeech

Neural Blind Source Separation and Diarization for Distant Speech Recognition

Yoshiaki Bando, Tomohiko Nakamura, and Shinji Watanabe

In Proceedings of Interspeech 2024
SLU Interspeech

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR Interspeech

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, and Shinji Watanabe

In Proceedings of Interspeech 2024
SE Interspeech

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Julius Richter, Yi-Chiao Wu, Steven Krenn, Alexander Richard, Simon Welker, Bunlong Lay, Shinji Watanabe, and Timo Gerkmann

In Proceedings of Interspeech 2024
Music Interspeech

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, and Shinji Watanabe

In Proceedings of Interspeech 2024
ASR ACL

Wav2Gloss: Generating Interlinear Glossed Text from Speech

Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Romney Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R Mortensen, and Lori Levin

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
ASR ACL

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
SLU ACL

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, and Shinji Watanabe

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
SS IJCAI

Cross-Talk Reduction

Zhong-Qiu Wang, Anurag Kumar, and Shinji Watanabe

In Proceedings of IJCAI 2024
SLU NAACL

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2024
TTS TASLP

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis

Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
Music TASLP

Music ControlNet: Multiple Time-varying Controls for Music Generation

Shih-Lun Wu, Chris Donahue, Shinji Watanabe, and Nicholas J. Bryan

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
ASR SPL

MC-Whisper: Improving Distant Speech Recognition by Extending Large Pre-Trained Model to Multi-channel

Xuankai Chang, Pengcheng Guo, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe

IEEE Signal Processing Letters 2024
SE ICASSP

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, and Jianqing Gao

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Audio ICASSP

Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization

Muqiao Yang, Umberto Cappellazzo, Xiang Li, Shinji Watanabe, and Bhiksha Raj

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Improving ASR Contextual Biasing with Guided Attention

Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SLU ICASSP

AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models

Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR&TTS ICASSP

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks

Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora

Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ST ICASSP

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

PhISANet: Phonetically Informed Speech Animation Network

Salvador Medina, Sarah Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, and Iain Matthews

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SD&ASR ICASSP

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition

Samuele Cornell, Jee-weon Jung, Shinji Watanabe, and Stefano Squartini

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Less Peaky and More Accurate CTC Forced Alignment by Pruned CTC Loss and Label Priors

Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Shinji Watanabe, Daniel Povey, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SSL ICASSP

HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model

Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR&ST&SLU ICASSP

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
LLM&SLU ICASSP

Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chun-Yi Kuan, Chi-Yuan Hsiao, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, and Hung-yi Lee

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ST ICASSP

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Semi-Autoregressive Streaming ASR with Label Context

Siddhant Arora, George Saon, Shinji Watanabe, and Brian Kingsbury

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SSL ICASSP

Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models

Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, and Karen Livescu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SSL ICASSP

Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing

William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SE ICASSP

Improving Design of Input Condition Invariant Speech Enhancement

Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, and Yanmin Qian

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR

Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SS ICASSP

Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor

Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
ASR ICASSP

Visual Speech Recognition for Low-Resource Languages with Automatic Labels from Whisper Model

Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, and Yong Man Ro

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Caption ICASSP

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens

Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SSL ICASSP

Understanding Probe Behaviors Through Variational Bounds of Mutual Information

Kwanghee Choi, Jee-weon Jung, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Caption ICASSP

Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation

Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
SSL ICASSP

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi Luen Feng, and Hung-yi Lee

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

2023

ASR ASRU

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Lu-Tshiann Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, and Jiatong Shi

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SVC ASRU

The Singing Voice Conversion Challenge 2023

Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, and Tomoki Toda

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition

Yusuke Shinohara and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
Summarization&ST ASRU

Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

YODAS: Youtube-Oriented Dataset for Audio and Speech

Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SE&SS ASRU

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, and Tetsuji Ogawa

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR&SSL ASRU

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, and Yumeng Tao

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SE ASRU

Toward Universal Speech Enhancement For Diverse Input Conditions

Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei Ping Huang, En Pei Hu, Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SSL ASRU

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning

William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Masao Someki, Nicholas Eng, Yosuke Higuchi, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR&ST ASRU

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
Summarization ASRU

ESPnet-SUMM: Introducing a novel large dataset, toolkit, and a cross-corpora evaluation of speech summarization systems

Roshan Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Atsunori Ogawa, Siddhant Arora, Marc Delcroix, Rita Singh, Shinji Watanabe, and Bhiksha Raj

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Yuya Fujita, Shinji Watanabe, Xuankai Chang, and Takashi Maekaku

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SS NeurIPS

UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures

Zhong-Qiu Wang and Shinji Watanabe

In Proceedings of the Conference on Neural Information Processing Systems 2023
SS WASPAA

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, and Shinji Watanabe

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
SS CSL

Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data

Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur

Computer Speech & Language 2023
SD TASLP

Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
MT&ASR TASLP

LegoNN: Building Modular Encoder-Decoder Models

Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, and Abdelrahman Mohamed

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
ST ACL(demo)

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
ST ACL

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, and Juan Pino

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
ASR ICML

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, and Boris Ginsburg

In Proceedings of the International Conference on Machine Learning (ICML) 2023
TTS IJCAI

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari

In Proceedings of IJCAI 2023
TTS Interspeech

Deep Speech Synthesis from MRI-Based Articulatory Representations

Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan Black, Louis Goldstein, Shinji Watanabe, and Gopala Krishna Anumanchipalli

In Proceedings of Interspeech 2023
ASR Interspeech

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, and Brian MacWhinney

In Proceedings of Interspeech 2023
ASR&SSL Interspeech

Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

Puyuan Peng, Brian Yan, Shinji Watanabe, and David Harwath

In Proceedings of Interspeech 2023
ASR&SLU Interspeech

Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding

Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder–decoder Speech Recognition

Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, and Shinji Watanabe

In Proceedings of Interspeech 2023
SSL Interspeech

Exploration on HuBERT with Multiple Resolution

Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training

Yui Sudo, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR&SSL Interspeech

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

In Proceedings of Interspeech 2023
SLU Interspeech

Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing

Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

4D: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, and Shinji Watanabe

In Proceedings of Interspeech 2023
SSL Interspeech

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR&ST Interspeech

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, and Shinji Watanabe

In Proceedings of Interspeech 2023
SSL Interspeech

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, and Shinji Watanabe

In Proceedings of Interspeech 2023
Summarization Interspeech

BASS: Block-wise Adaptation for Speech Summarization

Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, and Bhiksha Raj

In Proceedings of Interspeech 2023
ST EACL

CTC Alignments Improve Autoregressive Translation

Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, and Shinji Watanabe

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2023
ASR ICLR

Continuous Pseudo-Labeling from the Start

Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, and Tatiana Likhomanenko

In Proceedings of the International Conference on Learning Representations (ICLR) 2023
ASR ICASSP

Multi-blank Transducers for Speech Recognition

Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, and Boris Ginsburg

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SE ICASSP

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SD ICASSP

In search of strong embedding extractors for speaker diarisation

Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, and Joon Son Chung

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J. Han, Ryan McDonald, Kilian Q. Weinberger, and Yoav Artzi

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
TTS&SSL ICASSP

A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units

Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SE ICASSP

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL&SLU ICASSP

Bridging Speech and Text Pre-trained Models with Unsupervised ASR

Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
Music ICASSP

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Yuning Wu, Jiatong Shi, Tao Qian, and Qin Jin

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SLU ICASSP

Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL ICASSP

Context-Aware Fine-Tuning of Self-Supervised Speech Models

Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
S2ST ICASSP

Enhancing Speech-To-Speech Translation with Multiple TTS Targets

Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Streaming Joint Speech Recognition and Disfluency Detection

Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Towards Zero-Shot Code-Switched Speech Recognition

Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ST ICASSP

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL ICASSP

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model

Soumi Maiti, Yifan Peng, Takaaki Saeki, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
TTS ICASSP

Speaker-Independent Acoustic-to-Articulatory Speech Inversion

Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, and Gopala K. Anumanchipalli

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SLU ICASSP

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SS ICASSP

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR&SSL ICASSP

EURO: ESPnet Unsupervised ASR Open-Source Toolkit

Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SE ICASSP

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR&SSL ICASSP

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Dan Berrebbi, Brian Yan, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
TTS ICASSP

Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, and Gopala K. Anumanchipalli

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR&SLU&SSL ICASSP

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL ICASSP

Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model

Takashi Maekaku, Yuya Fujita, Xuankai Chang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
MultiModal ICASSP

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, and Cong Liu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL&ASR ICASSP

FINDADAPTNET: Find and Insert Adapters by Learned Layer Importance

Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Yifan Peng, Jaesong Lee, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023

2022

TTS AAAI

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech

Li-Wei Chen, Alexander Rudnicky, and Shinji Watanabe

In Proceedings of AAAI 2022
ASR EMNLP

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe

In Proceedings of Findings of EMNLP 2022
SLU EMNLP

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, and Shinji Watanabe

In Proceedings of Findings of EMNLP 2022
SD TASLP

Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi

In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
SE CSL

A Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data

Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur

In Computer Speech & Language 2022
SE TASLP

End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party

Wangyou Zhang, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani, Shinji Watanabe, and Yanmin Qian

In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
SE SPL

Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Zhong-Qiu Wang and Shinji Watanabe

In IEEE Signal Processing Letters 2022
SD TASLP

Encoder-Decoder Based Attractors for End-to-End Neural Diarization

Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Paola Garcia

In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
ASR JSTSP

Self-Supervised Speech Representation Learning: A Review

Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, and Shinji Watanabe

In IEEE Journal of Selected Topics in Signal Processing 2022
ST IWSLT

Findings of the IWSLT 2022 Evaluation Campaign

Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, and Shinji Watanabe

In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT) 2022
SD&SS SLT

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, and Yong Xu

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR&SD&SLU&ER SLT

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdel-rahman Mohamed, Shang-Wen Li, and Hung-yi Lee

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR SLT

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR&SLU SLT

A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR&SSL SLT

On Compressing Sequences for Self-Supervised Speech Models

Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, and Hao Tang

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR&SE&SSL SLT

End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation

Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, and Nobutaka Ono

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
SE SLT

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

Shota Horiguchi, Yuki Takashima, Shinji Watanabe, and Paola Garcia

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR SLT

End-to-End Multi-speaker ASR with Independent Vector Analysis

Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, and Yanmin Qian

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
ASR Interspeech

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Jiatong Shi, George Saon, David Haws, Shinji Watanabe, and Brian Kingsbury

In Proceedings of Interspeech 2022
ASR Interspeech

Memory-Efficient Training of RNN-Transducer with Sampled Softmax

Jaesong Lee, Lukas Lee, and Shinji Watanabe

In Proceedings of Interspeech 2022
SLU&ST Interspeech

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

Keqi Deng, Shinji Watanabe, Jiatong Shi, and Siddhant Arora

In Proceedings of Interspeech 2022
Music Interspeech

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, and Qin Jin

In Proceedings of Interspeech 2022
Music Interspeech

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis

Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, and Qin Jin

In Proceedings of Interspeech 2022
ASR Interspeech

Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis

Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin, and Jia Pan

In Proceedings of Interspeech 2022
KWS Interspeech

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis

Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, and Jian-Qing Gao

In Proceedings of Interspeech 2022
ASR Interspeech

ASR2K: Speech Recognition for Around 2000 Languages without Audio

Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, and Shinji Watanabe

In Proceedings of Interspeech 2022
SE Interspeech

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, and Shinji Watanabe

In Proceedings of Interspeech 2022
SLU Interspeech

Two-Pass Low Latency End-to-End Spoken Language Understanding

Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W Black, and Shinji Watanabe

In Proceedings of Interspeech 2022
TTS Interspeech

Deep Speech Synthesis from Articulatory Representations

Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, and Gopala Krishna Anumanchipalli

In Proceedings of Interspeech 2022
ASR Interspeech

Minimum latency training of sequence transducers for streaming end-to-end speech recognition

Yusuke Shinohara and Shinji Watanabe

In Proceedings of Interspeech 2022
ASR Interspeech

Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection

Yui Sudo, Shakeel Muhammad, Kazuhiro Nakadai, Jiatong Shi, and Shinji Watanabe

In Proceedings of Interspeech 2022
ASR Interspeech

Better Intermediates Improve CTC Inference

Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, and Yusuke Kida

In Proceedings of Interspeech 2022
ASR Interspeech

Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models

Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera, and Yohei Kawaguchi

In Proceedings of Interspeech 2022
ASR Interspeech

Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR

Takashi Maekaku, Yuya Fujita, Yifan Peng, and Shinji Watanabe

In Proceedings of Interspeech 2022
ASR Interspeech

Residual Language Model for End-to-end Speech Recognition

Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty, and Shinji Watanabe

In Proceedings of Interspeech 2022
TTS Interspeech

When Is TTS Augmentation Through a Pivot Language Useful?

Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, and Shinji Watanabe

In Proceedings of Interspeech 2022
TTS Interspeech

TriniTTS: Pitch-controllable End-to-end TTS without External Aligner

Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, and Shinji Watanabe

In Proceedings of Interspeech 2022
ASR Interspeech

Online Continual Learning of End-to-End Speech Recognition Models

Muqiao Yang, Ian Lane, and Shinji Watanabe

In Proceedings of Interspeech 2022
SE Interspeech

Improving Speech Enhancement through Fine-Grained Speech Characteristics

Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj

In Proceedings of Interspeech 2022
ASR&SE&SSL Interspeech

End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation

Xuankai Chang, Takashi Maekaku, Yuya Fujita, and Shinji Watanabe

In Proceedings of Interspeech 2022
ASR&SSL Interspeech

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith, and Shinji Watanabe

In Proceedings of Interspeech 2022
ASR&SLU&MT ICML

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Yifan Peng, Siddharth Dalmia, Ian Lane, and Shinji Watanabe

In Proceedings of the International Conference on Machine Learning (ICML) 2022
Linguistic ACL

Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble

Xinjian Li, Florian Metze, David R Mortensen, Shinji Watanabe, and Alan Black

In Proceedings of Findings of the Annual Meeting of the Association for Computational Linguistics 2022
SE&VC&ST ACL

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2022
SE&ASR CSL

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu

In Computer Speech & Language 2022
SD CSL

A review of speaker diarization: Recent advances with deep learning

Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J Han, Shinji Watanabe, and Shrikanth Narayanan

In Computer Speech & Language 2022
SE&ASR CSL

Joint speaker diarization and speech recognition based on region proposal networks

Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, and Sanjeev Khudanpur

In Computer Speech & Language 2022
ASR CSL

Arabic speech recognition by end-to-end, modular systems and human

Amir Hussein, Shinji Watanabe, and Ahmed Ali

In Computer Speech & Language 2022
ASR ICASSP

Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge

Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
Multimodal ICASSP

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results

Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Di-Yuan Liu, Bao-Cai Yin, Jia Pan, Jian-Qing Gao, and Cong Liu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Non-Autoregressive End-to-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing

Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Tianzi Wang

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion

Takashi Maekaku, Xuankai Chang, Yuya Fujita, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SE&SSL ICASSP

Investigating Self-Supervised Learning for Speech Enhancement and Separation

Zili Huang, Shinji Watanabe, Shu-wen Yang, Paola Garcia, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SE ICASSP

Conditional Diffusion Probabilistic Model for Speech Enhancement

Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, and Yu Tsao

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Improving Non-Autoregressive End-to-End Speech Recognition with Pre-trained Acoustic and Language Models

Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, and Pengyuan Zhang

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition

Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Integrating multiple ASR systems into NLP backend with attention fusion

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SLU ICASSP

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, and Jonathan Le Roux

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Sequence Transduction with Graph-based Supervision

Niko Moritz, Takaaki Hori, Shinji Watanabe, and Jonathan Le Roux

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Run-and-Back Stitch Search: Novel Block Synchronous Decoding for Streaming Encoder-Decoder ASR

Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
VC&SSL ICASSP

S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations

Wen-Chin Huang, Shu-wen Yang, Tomoki Hayashi, Hung-yi Lee, Shinji Watanabe, and Tomoki Toda

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

Joint Speech Recognition and Audio Captioning

Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SD ICASSP

Multi-Channel End-to-End Neural Diarization with Distributed Microphones

Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, and Yohei Kawaguchi

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
ASR ICASSP

TorchAudio: Building Blocks for Audio and Speech Processing

Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, and Vincent Quenneville-Bélair

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SD ICASSP

Towards End-to-End Speaker Diarization with Generalized Neural Speaker Clustering

Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
Music ICASSP

Training Strategies for Automatic Song Writing: A Unified Framework Perspective

Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, and Qin Jin

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
SE+ASR CSL

An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer

Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu

In Computer Speech & Language 2022

Target-speaker speech recognition aims to recognize the speech of an enrolled speaker in an environment with background noise and interfering speakers. This study presents a joint framework that combines time-domain target speaker extraction with a recurrent neural network transducer (RNN-T) for speech recognition. To alleviate the adverse effects on the speech recognition back-end of residual noise and artifacts introduced by the target speaker extraction module, we explore training the target speaker extraction module and the RNN-T jointly. We find that a multi-stage training strategy, which pre-trains and fine-tunes each module before joint training, is crucial for stabilizing the training process. In addition, we propose a novel neural uncertainty estimation that leverages useful information from the target speaker extraction module (i.e., speaker identity uncertainty and speech enhancement uncertainty) to further improve the back-end speech recognizer. Compared to a recognizer with a target speech extraction front-end, our experiments show that joint training and the neural uncertainty module reduce the character error rate (CER) by 7% and 17% relative, respectively, on multi-talker simulation data. The multi-condition experiments indicate that our method can reduce CER by 9% relative in the noisy condition without losing performance in the clean condition. We also observe consistent improvements in a further evaluation on real-world data based on vehicular speech.

2021

ASR+TTS ASRU

On Prosody Modeling for ASR+TTS based Voice Conversion

Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, and Tomoki Toda

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
SLU ASRU

Attention-based Multi-hypothesis Fusion for Speech Summarization

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ST ASRU

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
SD ASRU

Towards Neural Diarization for Unlimited Numbers of Speakers using Global and Local Attractors

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, and Yohei Kawaguchi

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR ASRU

A Study of Transducer based End-to-end ASR with ESPNet: Architecture, Auxiliary Loss and Decoding Strategies

Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR ASRU

A Comparative Study on Non-autoregressive Modelings for Speech-to-text Generation

Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
SE ASRU

ConferencingSpeech Challenge: Towards Far-field Multi-channel Speech Enhancement for Video Conferencing

Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, and Shidong Shang

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR+TTS ASRU

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan Black

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
ASR&SSL ASRU

An Exploration of Self-supervised Pretrained Representations for End-to-end Speech Recognition

Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
VC APSIPA

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks

Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2021
ST IWSLT

ESPnet-ST IWSLT 2021 Offline Speech Translation System

Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe

In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT) 2021
ASR Interspeech

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, and Zhiyong Yan

In Proceedings of Interspeech 2021
AED Interspeech

Acoustic Event Detection with Classifier Chains

Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, and Tomoki Hayashi

In Proceedings of Interspeech 2021
ASR Interspeech

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain

Pengcheng Guo, Xuankai Chang, Shinji Watanabe, and Lei Xie

In Proceedings of Interspeech 2021
ASR Interspeech

Multi-mode Transformer Transducer with Stochastic Future Context

Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Han, and Shinji Watanabe

In Proceedings of Interspeech 2021
ASR Interspeech

Differentiable Allophone Graphs for Language Universal Speech Recognition

Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, and Shinji Watanabe

In Proceedings of Interspeech 2021
SE Interspeech

Speaker Verification-Based Evaluation of Single-Channel Speech Separation

Matthew Maciejewski, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of Interspeech 2021
ASR Interspeech

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

Patrick O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael Shulman, Boris Ginsburg, Shinji Watanabe, and Georg Kucsko

In Proceedings of Interspeech 2021
ASR&SD&SLU&ER Interspeech

SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee

In Proceedings of Interspeech 2021
SSA Interspeech

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

Suwon Shon, Pablo Brusco, Jing Pan, Kyu Han, and Shinji Watanabe

In Proceedings of Interspeech 2021
ASR Interspeech

Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models

Tianzi Wang, Yuya Fujita, Xuankai Chang, and Shinji Watanabe

In Proceedings of Interspeech 2021
SLU Interspeech

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, and Alan W. Black

In Proceedings of Interspeech 2021
ASR & SpeDialog Interspeech

Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021

Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky

In Proceedings of Interspeech 2021
ASR Interspeech

Layer Pruning on Demand with Intermediate CTC

Jaesong Lee, Jingu Kang, and Shinji Watanabe

In Proceedings of Interspeech 2021
ASR Interspeech

Toward Streaming ASR with Non-autoregressive Insertion-based Model

Yuya Fujita, Tianzi Wang, Shinji Watanabe, and Motoi Omachi

In Proceedings of Interspeech 2021
SE&ASR Interspeech

Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics

Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, and Jan Honza Černocký

In Proceedings of Interspeech 2021
ASR Interspeech

Data Augmentation Methods for End-to-end Speech Recognition on Distant-talk Scenarios

Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, and Shinji Watanabe

In Proceedings of Interspeech 2021
SD Interspeech

Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, and Shinji Watanabe

In Proceedings of Interspeech 2021
SE&ASR&ST DSLW

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, and Wangyou Zhang

In Proceedings of IEEE Data Science and Learning Workshop (DSLW) 2021
SE SLT

Dual-path RNN for long recording speech separation

Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Boeddeker, Yanmin Qian, Shinji Watanabe, and Zhuo Chen

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SD SLT

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, and Kenji Nagamatsu

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
ASR SLT

Streaming Transformer ASR with blockwise synchronous beam search

Emiru Tsunoo, Yosuke Kashiwagi, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SE SLT

Sequential multi-frame neural beamforming for speech separation and enhancement

Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, and John R Hershey

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SD SLT

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, and Sanjeev Khudanpur

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SE&SD&ASR SLT

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, and others

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
SD SLT

Online end-to-end neural diarization with speaker-tracing buffer

Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, and Kenji Nagamatsu

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
ST AmericasNLP

Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation

Jiatong Shi, Jonathan D Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe

In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
ASR AmericasNLP

End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec

Jonathan D Amith, Jiatong Shi, and Rey Castillo García

In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
ASR NAACL

End-to-end ASR to jointly predict transcriptions and linguistic annotations

Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Matthew Wiesner

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
ST NAACL

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
ST NAACL

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

Hirofumi Inaguma, Tatsuya Kawahara, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
ASR EACL

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec

Jiatong Shi, Jonathan D Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2021
SD Interspeech

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola Garcia Perera, and Kenji Nagamatsu

In Proceedings of Interspeech 2021
SD Interspeech

Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia-Perera, and Kenji Nagamatsu

In Proceedings of Interspeech 2021
SE Interspeech

Continuous speech separation using speaker inventory for long recording

Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John Hershey, Nima Mesgarani, and Zhuo Chen

In Proceedings of Interspeech 2021
SD ICASSP

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, and John R Hershey

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE ICASSP

Dual-Path Modeling for Long Recording Speech Separation in Meetings

Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, and Yanmin Qian

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Recent developments on ESPnet toolkit boosted by Conformer

Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, and others

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE&ASR ICASSP

End-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend

Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, and Yanmin Qian

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SD ICASSP

End-to-end speaker diarization as post-processing

Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Intermediate Loss Regularization for CTC-Based Speech Recognition

Jaesong Lee, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ST ICASSP

Orthros: Non-autoregressive end-to-end speech translation with dual-decoder

Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization

Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR&TTS&SSL ICASSP

Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition

Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Ramon Fernandez Astudillo, and others

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
ASR ICASSP

Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition

Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE&ASR ICASSP

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation

Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE ICASSP

Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step

Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
Music ICASSP

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss

Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, and Qin Jin

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
SE SLT

ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021

2020

TTS ICASSP

ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
ST ACL

ESPnet-ST: All-in-One Speech Translation Toolkit

Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Yalta, Tomoki Hayashi, and Shinji Watanabe

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2020
SR&SSL NeurIPS

Augmentation adversarial training for self-supervised speaker recognition

Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, and Joon Son Chung

2020
SED DCASE

Conformer-based sound event detection with semi-supervised learning and data augmentation

Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda

2020
ASR CHiME

The JHU multi-microphone multi-speaker ASR system for the CHiME-6 challenge

Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola Garcia, Shinji Watanabe, and Sanjeev Khudanpur

2020
ASR ICASSP

End-to-end multi-speaker speech recognition with transformer

Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
TTS ICASSP

Semi-supervised speaker adaptation for end-to-end speech synthesis with pretrained models

Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
ASR ICASSP

End-to-end automatic speech recognition integrated with CTC-based voice activity detection

Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
ASR ICASSP

Attention-based ASR with lightweight and dynamic convolutions

Yuya Fujita, Aswin Shanmugam Subramanian, Motoi Omachi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
ASR ICASSP

A practical two-stage training strategy for multi-stream end-to-end speech recognition

Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, and Hynek Hermansky

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
SD ICASSP

Speaker diarization with region proposal network

Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola García, Yiwen Shao, Daniel Povey, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
SED ICASSP

Weakly-supervised sound event detection with self-attention

Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
SE ICASSP

Far-field location guided target speech extraction using end-to-end speech recognition objectives

Aswin Shanmugam Subramanian, Chao Weng, Meng Yu, Shi-Xiong Zhang, Yong Xu, Shinji Watanabe, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
ASR Deep Neural Evolution

Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms

Takahiro Shinozaki, Shinji Watanabe, and Kevin Duh

2020
ASR&TTS VCC

The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS

Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, and Tomoki Toda

2020
SE&ASR NeurIPS

Sequence to multi-sequence learning via conditional chain mapping for mixture signals

Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, and Lei Xie

2020
ASR Interspeech

End-to-End ASR with Adaptive Span Self-Attention

Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe, Yuya Fujita, and Motoi Omachi

In Proceedings of Interspeech 2020
TTS Interspeech

Learning speaker embedding from text-to-speech

Jaejin Cho, Piotr Zelasko, Jesús Villalba, Shinji Watanabe, and Najim Dehak

2020
SE Interspeech

Speaker-conditional chain model for speech separation and extraction

Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, and Bo Xu

2020
ASR Interspeech

Insertion-based modeling for end-to-end automatic speech recognition

Yuya Fujita, Shinji Watanabe, Motoi Omachi, and Xuankai Chang

2020
SD Interspeech

End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors

Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Kenji Nagamatsu

2020
ASR Interspeech

End-to-end far-field speech recognition with unified dereverberation and beamforming

Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, and Yanmin Qian

2020
ASR Interspeech

Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, and Tetsunori Kobayashi

2020

2019

ASR ASRU

Transformer ASR with contextual block processing

Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
ASR ASRU

MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition

Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
ST ASRU

Multilingual end-to-end speech translation

Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
ASR+SD ASRU

Simultaneous speech recognition and speaker diarization for monaural dialogue recordings with target-speaker acoustic models

Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
ASR ASRU

Espresso: A fast end-to-end neural speech recognition toolkit

Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, and Sanjeev Khudanpur

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
ASR ASRU

A comparative study on Transformer vs RNN in speech applications

Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, and others

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
SD ASRU

End-to-end neural speaker diarization with self-attention

Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
ASR ASRU

Multi-stream end-to-end speech recognition

Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, and Hynek Hermansky

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
SS WASPAA

Analysis of robustness of deep single-channel speech separation using corpora constructed from multiple domains

Matthew Maciejewski, Gregory Sell, Yusuke Fujita, Leibny Paola Garcia-Perera, Shinji Watanabe, and Sanjeev Khudanpur

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
ASR WASPAA

Generalized weighted-prediction-error dereverberation with varying source priors for reverberant speech recognition

Toru Taniguchi, Aswin Shanmugam Subramanian, Xiaofei Wang, Dung Tran, Yuya Fujita, and Shinji Watanabe

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
ASR WASPAA

Speech enhancement using end-to-end speech recognition objectives

Aswin Shanmugam Subramanian, Xiaofei Wang, Murali Karthick Baskar, Shinji Watanabe, Toru Taniguchi, Dung Tran, and Yuya Fujita

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
ASR Interspeech

End-to-End Multilingual Multi-Speaker Speech Recognition

In Proceedings of Interspeech 2019
ASR Interspeech

Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings

In Proceedings of Interspeech 2019
TTS Interspeech

Pre-trained Text Embeddings for Enhanced Text-to-Speech Synthesis

In Proceedings of Interspeech 2019
SD Interspeech

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, and Shinji Watanabe

In Proceedings of Interspeech 2019
ASR Interspeech

Analysis of Multilingual Sequence-to-Sequence speech recognition systems

Murali Karthick Baskar, Martin Karafiat, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, and Jan Černocký

In Proceedings of Interspeech 2019
ASR Interspeech

End-to-end SpeakerBeam for single channel target speech recognition

Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, and Tomohiro Nakatani

In Proceedings of Interspeech 2019
ASR Interspeech

Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text

Murali Karthick Baskar, Shinji Watanabe, Ramón Astudillo, Takaaki Hori, Lukas Burget, and Jan Černocký

In Proceedings of Interspeech 2019
ASR Interspeech

Study of the performance of automatic speech recognition systems in speakers with Parkinson’s Disease

Laureano Moro Velazquez, Jaejin Cho, Shinji Watanabe, Mark Hasegawa-Johnson, Odette Scharenborg, Kim Heejin, and Najim Dehak

In Proceedings of Interspeech 2019
ASR Interspeech

Vectorized Beam Search for CTC-Attention-based Speech Recognition

Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, and Jonathan Le Roux

In Proceedings of Interspeech 2019
ASR Interspeech

Speaker recognition benchmark using the CHiME-5 corpus

Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Dan Povey, and Sanjeev Khudanpur

In Proceedings of Interspeech 2019
ASR Interspeech

Improving Transformer Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration

Shigeki Karita, Nelson Yalta, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani

In Proceedings of Interspeech 2019
ASR Interspeech

Interference Speaker Loss for Target-Speaker Speech Recognition

Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, and Shinji Watanabe

In Proceedings of Interspeech 2019
ASR EUSIPCO

CNN-based multichannel end-to-end speech recognition for everyday home environments

Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, and Tetsuya Ogata

In 27th European Signal Processing Conference (EUSIPCO) 2019
OCR ICDAR

Using ASR methods for OCR

Ashish Arora, Chun Chieh Chang, Babak Rekabdar, Bagher BabaAli, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, and others

In International Conference on Document Analysis and Recognition (ICDAR) 2019
Music IJCNN

Weakly-supervised deep recurrent neural networks for basic dance step generation

Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, and Tetsuya Ogata

In International Joint Conference on Neural Networks (IJCNN) 2019
ASR NAACL

Massively Multilingual Adversarial Speech Recognition

Oliver Adams, Matthew Wiesner, Shinji Watanabe, and David Yarowsky

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2019
ASR ICASSP

Promising accurate prefix boosting for sequence-to-sequence ASR

Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, and Jan Honza Černocký

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Transfer learning of language-independent end-to-end ASR with language model fusion

Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Improving end-to-end speech recognition with pronunciation-assisted sub-word modeling

Hainan Xu, Shuoyang Ding, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Language model integration based on memory control for sequence to sequence speech recognition

Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesus Villalba, and Najim Dehak

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Stream attention-based multi-array end-to-end speech recognition

Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, and Hynek Hermansky

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Acoustic modeling for overlapping speech recognition: JHU CHiME-5 challenge system

Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Cycle-consistency training for end-to-end speech recognition

Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, and Jonathan Le Roux

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
AED ICASSP

Joint acoustic and class inference for weakly supervised sound event detection

Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, and Mounya Elhilali

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
SE ICASSP

The phasebook: Building complex masks via discrete representations for source separation

Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, and John R Hershey

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

End-to-end monaural multi-speaker ASR system without pretraining

Xuankai Chang, Yanmin Qian, Kai Yu, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Semi-supervised end-to-end speech recognition using text-to-speech and autoencoders

Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ASR ICASSP

Acoustic modeling for distant multi-talker speech recognition with single-and multi-channel branches

Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

2018

ML Physica

Model parameter learning using Kullback–Leibler divergence

Chungwei Lin, Tim K Marks, Milutin Pajovic, Shinji Watanabe, and Chih-kuan Tung

Physica A: Statistical Mechanics and its Applications 2018
ASR SLT

End-to-end speech recognition with word-based RNN language models

Takaaki Hori, Jaejin Cho, and Shinji Watanabe

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
ASR SLT

Low-resource contextual topic identification on speech

Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, and Sanjeev Khudanpur

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
ASR SLT

Back-translation-style data augmentation for end-to-end ASR

Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, and Kazuya Takeda

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
ASR SLT

Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, and Takaaki Hori

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
ASR Interspeech

ESPnet: End-to-End Speech Processing Toolkit

Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai

Proceedings of Interspeech 2018

This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains a major architecture of this software platform, several important functionalities, which differentiate ESPnet from other open source ASR toolkits, and experimental results with major ASR benchmarks.
ASR Interspeech

Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline

Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, and Shinji Watanabe

Proceedings of Interspeech 2018
ASR Interspeech

Multi-Head Decoder for End-to-End Speech Recognition

Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda

Proceedings of Interspeech 2018
ASR Interspeech

Semi-Supervised End-to-End Speech Recognition

Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa, and Marc Delcroix

Proceedings of Interspeech 2018
SE&ASR Interspeech

The Fifth ‘CHiME’ Speech Separation and Recognition Challenge: Dataset, Task and Baselines

Jon Barker, Shinji Watanabe, Emmanuel Vincent, and Jan Trmal

Proceedings of Interspeech 2018
SE Interspeech

Student-Teacher Learning for BLSTM Mask-based Speech Enhancement

Aswin Shanmugam Subramanian, Szu-Jui Chen, and Shinji Watanabe

Proceedings of Interspeech 2018
ASR Interspeech

Multi-Modal Data Augmentation for End-to-end ASR

Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, and Shinji Watanabe

Proceedings of Interspeech 2018
LID Interspeech

Effectiveness of single-channel BLSTM enhancement for language identification

Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan, and Najim Dehak

In Interspeech 2018
SD Interspeech

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge

Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, and others

In Interspeech 2018

We describe in this paper the experiences of the Johns Hopkins University team during the inaugural DIHARD diarization evaluation. This new task provided microphone recordings in a variety of difficult conditions and challenged researchers to fully consider all speaker activity, without the currently typical practices of unscored collars or ignored overlapping speaker segments. This paper explores several key aspects of currently state-of-the-art diarization methods, such as training data selection, signal bandwidth for feature extraction, representations of speech segments (i-vector versus x-vector) and domain-adaptive processing. In the end, our best system clustered x-vector embeddings trained on wideband microphone data followed by Variational-Bayesian refinement and a speech activity detector specifically trained for this task with in-domain data was found to be the best performing. After presenting these decisions and their final result, we discuss lessons learned and remaining challenges within the lens of this new approach to diarization performance measurement.
ASR Interspeech

Auxiliary Feature Based Adaptation of End-to-end ASR Systems

Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita, and Tomohiro Nakatani

Proceedings of Interspeech 2018
ASR ACL

A Purely End-to-End System for Multi-speaker Speech Recognition

Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, and John R Hershey

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2018
ASR ICASSP

Speaker adaptation for multichannel end-to-end speech recognition

Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, and John Hershey

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
ASR ICASSP

An end-to-end language-tracking speech recognizer for mixed-language speech

Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, and John R Hershey

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
ASR ICASSP

End-to-end multi-speaker speech recognition

Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, and John R Hershey

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
