WAVLab | 2023 Papers

ASR ASRU

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Lu-Tshiann Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, and Jiatong Shi

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SVC ASRU

The Singing Voice Conversion Challenge 2023

Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, and Tomoki Toda

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition

Yusuke Shinohara and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
Summarization&ST ASRU

Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

YODAS: Youtube-Oriented Dataset for Audio and Speech

Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SE&SS ASRU

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, and Tetsuji Ogawa

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR&SSL ASRU

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, and Yumeng Tao

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SE ASRU

Toward Universal Speech Enhancement For Diverse Input Conditions

Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei Ping Huang, En Pei Hu, Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SSL ASRU

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning

William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Masao Someki, Nicholas Eng, Yosuke Higuchi, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR&ST ASRU

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
Summarization ASRU

ESPNet-SUMM: Introducing a novel large dataset, toolkit, and a cross-corpora evaluation of speech summarization systems

Roshan Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Atsunori Ogawa, Siddhant Arora, Marc Delcroix, Rita Singh, Shinji Watanabe, and Bhiksha Raj

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
ASR ASRU

LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Yuya Fujita, Shinji Watanabe, Xuankai Chang, and Takashi Maekaku

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
SS NeurIPS

UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures

Zhong-Qiu Wang and Shinji Watanabe

In Proceedings of the Conference on Neural Information Processing Systems 2023
SS WASPAA

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, and Shinji Watanabe

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
SS CSL

Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data

Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur

Computer Speech & Language 2023
SD TASLP

Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
MT&ASR TASLP

LegoNN: Building Modular Encoder-Decoder Models

Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, and Abdelrahman Mohamed

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
ST ACL(demo)

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
ST ACL

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, and Juan Pino

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
ASR ICML

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, and Boris Ginsburg

In Proceedings of the International Conference on Machine Learning (ICML) 2023
TTS IJCAI

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari

In IJCAI 2023
TTS Interspeech

Deep Speech Synthesis from MRI-Based Articulatory Representations

Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan Black, Louis Goldstein, Shinji Watanabe, and Gopala Krishna Anumanchipalli

In Proceedings of Interspeech 2023
ASR Interspeech

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, and Brian MacWhinney

In Proceedings of Interspeech 2023
ASR&SSL Interspeech

Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

Puyuan Peng, Brian Yan, Shinji Watanabe, and David Harwath

In Proceedings of Interspeech 2023
ASR&SLU Interspeech

Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding

Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder–decoder Speech Recognition

Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng Yu, and Shinji Watanabe

In Proceedings of Interspeech 2023
SSL Interspeech

Exploration on HuBERT with Multiple Resolution

Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training

Yui Sudo, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR&SSL Interspeech

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

In Proceedings of Interspeech 2023
SLU Interspeech

Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing

Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR Interspeech

4D: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, and Shinji Watanabe

In Proceedings of Interspeech 2023
SSL Interspeech

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe

In Proceedings of Interspeech 2023
ASR&ST Interspeech

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, and Shinji Watanabe

In Proceedings of Interspeech 2023
SSL Interspeech

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, and Shinji Watanabe

In Proceedings of Interspeech 2023
Summarization Interspeech

BASS: Block-wise Adaptation for Speech Summarization

Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, and Bhiksha Raj

In Proceedings of Interspeech 2023
ST EACL

CTC Alignments Improve Autoregressive Translation

Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, and Shinji Watanabe

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2023
ASR ICLR

Continuous Pseudo-Labeling from the Start

Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, and Tatiana Likhomanenko

In Proceedings of the International Conference on Learning Representations (ICLR) 2023
ASR ICASSP

Multi-blank Transducers for Speech Recognition

Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, and Boris Ginsburg

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SE ICASSP

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SD ICASSP

In search of strong embedding extractors for speaker diarisation

Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, and Joon Son Chung

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J. Han, Ryan McDonald, Kilian Q. Weinberger, and Yoav Artzi

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
TTS&SSL ICASSP

A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units

Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SE ICASSP

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL&SLU ICASSP

Bridging Speech and Text Pre-trained Models with Unsupervised ASR

Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
Music ICASSP

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Yuning Wu, Jiatong Shi, Tao Qian, and Qin Jin

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SLU ICASSP

Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL ICASSP

Context-Aware Fine-Tuning of Self-Supervised Speech Models

Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
S2ST ICASSP

Enhancing Speech-To-Speech Translation with Multiple TTS Targets

Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Streaming Joint Speech Recognition and Disfluency Detection

Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Towards Zero-Shot Code-Switched Speech Recognition

Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ST ICASSP

Align and Write and Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL ICASSP

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model

Soumi Maiti, Yifan Peng, Takaaki Saeki, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
TTS ICASSP

Speaker-Independent Acoustic-to-Articulatory Speech Inversion

Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, and Gopala K. Anumanchipalli

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SLU ICASSP

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SS ICASSP

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR&SSL ICASSP

EURO: ESPnet Unsupervised ASR Open-Source Toolkit

Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SE ICASSP

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR&SSL ICASSP

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Dan Berrebbi, Brian Yan, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
TTS ICASSP

Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, and Gopala K. Anumanchipalli

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR&SLU&SSL ICASSP

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL ICASSP

Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model

Takashi Maekaku, Yuya Fujita, Xuankai Chang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
MultiModal ICASSP

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, and Cong Liu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
SSL&ASR ICASSP

FINDADAPTNET: Find and Insert Adapters by Learned Layer Importance

Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
ASR ICASSP

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Yifan Peng, Jaesong Lee, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023