2024 Papers
- [SSL] [EMNLP] Towards Robust Speech Representation Learning for Thousands of Languages. In Proceedings of EMNLP 2024
- [ASR&ER&Speaker] [SLT] Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [Tokenizer] [SLT] Codec-SUPERB @SLT 2024: A lightweight benchmark for neural codec models. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [Music] [SLT] VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [Tokenizer] [SLT] ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [ASR] [SLT] FLORAS 50: A Massively Multilingual Multitask Benchmark for Long-form Conversational Speech. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [ASR] [SLT] Contextualized Automatic Speech Recognition with Dynamic Vocabulary. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [ASR] [SLT] Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [SE] [SLT] Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [ASR] [SLT] Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [ASR&TTS] [SLT] ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- [ASR] [Interspeech] Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] Self-training ASR Guided by Unsupervised ASR Teacher. In Proceedings of Interspeech 2024
- [Tokenizer] [Interspeech] MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. In Proceedings of Interspeech 2024
- [SLU] [Interspeech] Towards Unified Evaluation of Continual Learning in Spoken Language Understanding. In Proceedings of Interspeech 2024
- [ASR&TTS&Music] [Interspeech] The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. In Proceedings of Interspeech 2024
- [Evaluation] [Interspeech] SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. In Proceedings of Interspeech 2024
- [Speaker] [Interspeech] To what extent can ASV systems naturally defend against spoofing attacks? In Proceedings of Interspeech 2024
- [Speaker] [Interspeech] ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. In Proceedings of Interspeech 2024
- [SLU] [Interspeech] DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. In Proceedings of Interspeech 2024
- [SE] [Interspeech] Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss. In Proceedings of Interspeech 2024
- [SE] [Interspeech] URGENT Challenge: Universality, Robustness, and Generalizability for speech EnhancemeNT. In Proceedings of Interspeech 2024
- [Speaker] [Interspeech] Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? In Proceedings of Interspeech 2024
- [ASR] [Interspeech] OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. In Proceedings of Interspeech 2024
- [SSL] [Interspeech] Self-Supervised Speech Representations are More Phonetic than Semantic. In Proceedings of Interspeech 2024
- [SS] [Interspeech] Neural Blind Source Separation and Diarization for Distant Speech Recognition. In Proceedings of Interspeech 2024
- [SLU] [Interspeech] Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] Decoder-only Architecture for Streaming End-to-end Speech Recognition. In Proceedings of Interspeech 2024
- [ASR] [Interspeech] Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. In Proceedings of Interspeech 2024
- [SE] [Interspeech] EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. In Proceedings of Interspeech 2024
- [Music] [Interspeech] Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing. In Proceedings of Interspeech 2024
- [ASR] [ACL] Wav2Gloss: Generating Interlinear Glossed Text from Speech. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
- [ASR] [ACL] OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
- [SLU] [ACL] On the Evaluation of Speech Foundation Models for Spoken Language Understanding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
- [SS] [IJCAI] Cross-Talk Reduction. In Proceedings of IJCAI 2024
- [SLU] [NAACL] UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2024
- [TTS] [TASLP] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
- [Music] [TASLP] Music ControlNet: Multiple Time-varying Controls for Music Generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
- [ASR] [SPL] MC-Whisper: Improving Distant Speech Recognition by Extending Large Pre-Trained Model to Multi-channel. IEEE Signal Processing Letters 2024
- [SE] [ICASSP] The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [Audio] [ICASSP] Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Improving ASR Contextual Biasing with Guided Attention. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SLU] [ICASSP] AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR&TTS] [ICASSP] Voxtlm: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ST] [ICASSP] Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Phisanet: Phonetically Informed Speech Animation Network. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SD&ASR] [ICASSP] One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Less Peaky and More Accurate CTC Forced Alignment by Pruned CTC Loss and Label Priors. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SSL] [ICASSP] HuberTopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR&ST&SLU] [ICASSP] Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [LLM&SLU] [ICASSP] Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ST] [ICASSP] Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Semi-Autoregressive Streaming ASR with Label Context. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SSL] [ICASSP] Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SSL] [ICASSP] Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SE] [ICASSP] Improving Design of Input Condition Invariant Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SS] [ICASSP] Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [ASR] [ICASSP] Visual Speech Recognition for Low-Resource Languages with Automatic Labels from Whisper Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [Caption] [ICASSP] Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SSL] [ICASSP] Understanding Probe Behaviors Through Variational Bounds of Mutual Information. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [Caption] [ICASSP] Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- [SSL] [ICASSP] AV-Superb: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024