  1. SSL EMNLP
    Towards Robust Speech Representation Learning for Thousands of Languages
    William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe
    In Proceedings of EMNLP 2024
  2. ASR&ER&Speaker SLT
    Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
    Chao-Han Huck Yang, Tae Jin Park, Yuan Gong, Yuanchao Li, Yen-Ting Lin, Zhehuai Chen, Yuchen Hu, Chen Chen, Kunal Dhawan, Piotr Żelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, and Andreas Stolcke
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  3. Tokenizer SLT
    Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural codec models
    Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Jiawei Du, Kai-Wei Chang, Ke-Han Lu, Alexander Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, and Hung-yi Lee
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  4. Music SLT
    VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
    Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  5. Tokenizer SLT
    ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
    Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander Liu, Bhiksha Raj, Qin Jin, Ruihua Song, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  6. ASR SLT
    FLORAS 50: A Massively Multilingual Multitask Benchmark for Long-form Conversational Speech
    William Chen, Brian Yan, Chih-Chen Chen, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  7. ASR SLT
    Contextualized Automatic Speech Recognition with Dynamic Vocabulary
    Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  8. ASR SLT
    Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
    Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  9. SE SLT
    Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
    Chenda Li, Samuele Cornell, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  10. ASR SLT
    Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
    Shih-Heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, and Hung-yi Lee
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  11. ASR&TTS SLT
    ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
    Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, and Shinji Watanabe
    In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
  12. ASR Interspeech
    Multi-Convformer: Extending Conformer with Multiple Convolution Kernels
    Darshan Prabhu, Yifan Peng, Preethi Jyothi, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  13. ASR Interspeech
    Self-training ASR Guided by Unsupervised ASR Teacher
    Hyung Yong Kim, Byeong-Yeol Kim, Yunkyu Lim, Jihwan Park, Shukjae Choi, Yooncheol Ju, Jinseok Park, Youshin Lim, Seung Woo Yu, Hanbin Lee, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  14. Tokenizer Interspeech
    MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
    Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  15. ASR Interspeech
    ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
    Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  16. ASR Interspeech
    EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
    Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  17. ASR Interspeech
    Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition
    Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  18. ASR Interspeech
    On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
    Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  19. SLU Interspeech
    Towards Unified Evaluation of Continual Learning in Spoken Language Understanding
    Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of Interspeech 2024
  20. ASR&TTS&Music Interspeech
    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
    Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, and Qin Jin
    In Proceedings of Interspeech 2024
  21. Evaluation Interspeech
    SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
    Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, and Hiroshi Saruwatari
    In Proceedings of Interspeech 2024
  22. Speaker Interspeech
    To what extent can ASV systems naturally defend against spoofing attacks?
    Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, and Joon Son Chung
    In Proceedings of Interspeech 2024
  23. Speaker Interspeech
    ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
    Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Alex Gichamba, Barry-John Theobald, Ahmed Hussen Abdelaziz, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  24. SLU Interspeech
    DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
    Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, and Karen Livescu
    In Proceedings of Interspeech 2024
  25. SE Interspeech
    Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
    Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, and Yanmin Qian
    In Proceedings of Interspeech 2024
  26. ASR Interspeech
    Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  27. SE Interspeech
    URGENT Challenge: Universality, Robustness, and Generalizability for speech EnhancemeNT
    Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian
    In Proceedings of Interspeech 2024
  28. Speaker Interspeech
    Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
    Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, and Barry-John Theobald
    In Proceedings of Interspeech 2024
  29. ASR Interspeech
    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
    Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  30. SSL Interspeech
    Self-Supervised Speech Representations are More Phonetic than Semantic
    Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  31. SS Interspeech
    Neural Blind Source Separation and Diarization for Distant Speech Recognition
    Yoshiaki Bando, Tomohiko Nakamura, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  32. SLU Interspeech
    Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
    Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  33. ASR Interspeech
    Decoder-only Architecture for Streaming End-to-end Speech Recognition
    Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  34. ASR Interspeech
    Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
    Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  35. SE Interspeech
    EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
    Julius Richter, Yi-Chiao Wu, Steven Krenn, Alexander Richard, Simon Welker, Bunlong Lay, Shinji Watanabe, and Timo Gerkmann
    In Proceedings of Interspeech 2024
  36. Music Interspeech
    Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
    Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, and Shinji Watanabe
    In Proceedings of Interspeech 2024
  37. ASR ACL
    Wav2Gloss: Generating Interlinear Glossed Text from Speech
    Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Romney Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R Mortensen, and Lori Levin
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
  38. ASR ACL
    OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
  39. SLU ACL
    On the Evaluation of Speech Foundation Models for Spoken Language Understanding
    Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
  40. SS IJCAI
    Cross-Talk Reduction
    Zhong-Qiu Wang, Anurag Kumar, and Shinji Watanabe
    In Proceedings of IJCAI 2024
  41. SLU NAACL
    UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
    Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2024
  42. TTS TASLP
    Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis
    Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
  43. Music TASLP
    Music ControlNet: Multiple Time-varying Controls for Music Generation
    Shih-Lun Wu, Chris Donahue, Shinji Watanabe, and Nicholas J. Bryan
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
  44. ASR SPL
    MC-Whisper: Improving Distant Speech Recognition by Extending Large Pre-Trained Model to Multi-channel
    Xuankai Chang, Pengcheng Guo, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe
    IEEE Signal Processing Letters 2024
  45. SE ICASSP
    The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
    Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, and Jianqing Gao
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  46. Audio ICASSP
    Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization
    Muqiao Yang, Umberto Cappellazzo, Xiang Li, Shinji Watanabe, and Bhiksha Raj
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  47. ASR ICASSP
    Improving ASR Contextual Biasing with Guided Attention
    Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  48. SLU ICASSP
    AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models
    Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  49. ASR&TTS ICASSP
    VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks
    Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  50. ASR ICASSP
    Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora
    Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  51. ST ICASSP
    Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
    Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  52. ASR ICASSP
    PhISANet: Phonetically Informed Speech Animation Network
    Salvador Medina, Sarah Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, and Iain Matthews
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  53. SD&ASR ICASSP
    One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition
    Samuele Cornell, Jee-weon Jung, Shinji Watanabe, and Stefano Squartini
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  54. ASR ICASSP
    Less Peaky and More Accurate CTC Forced Alignment by Pruned CTC Loss and Label Priors
    Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Shinji Watanabe, Daniel Povey, and Sanjeev Khudanpur
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  55. SSL ICASSP
    HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model
    Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  56. ASR&ST&SLU ICASSP
    Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
    Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  57. LLM&SLU ICASSP
    Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
    Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chun-Yi Kuan, Chi-Yuan Hsiao, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, and Hung-yi Lee
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  58. ST ICASSP
    Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
    Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  59. ASR ICASSP
    Semi-Autoregressive Streaming ASR with Label Context
    Siddhant Arora, George Saon, Shinji Watanabe, and Brian Kingsbury
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  60. SSL ICASSP
    Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models
    Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, and Karen Livescu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  61. ASR ICASSP
    Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
    Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  62. SSL ICASSP
    Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing
    William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  63. SE ICASSP
    Improving Design of Input Condition Invariant Speech Enhancement
    Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, and Yanmin Qian
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  64. ASR ICASSP
    Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR
    Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  65. SS ICASSP
    Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor
    Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  66. ASR ICASSP
    Visual Speech Recognition for Low-Resource Languages with Automatic Labels from Whisper Model
    Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, and Yong Man Ro
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  67. Caption ICASSP
    Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens
    Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  68. SSL ICASSP
    Understanding Probe Behaviors Through Variational Bounds of Mutual Information
    Kwanghee Choi, Jee-weon Jung, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  69. Caption ICASSP
    Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation
    Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  70. SSL ICASSP
    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
    Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi Luen Feng, and Hung-yi Lee
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024