2024 Papers
- SE | ICASSP | The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Audio | ICASSP | Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Improving ASR Contextual Biasing with Guided Attention. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SLU | ICASSP | AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR&TTS | ICASSP | VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ST | ICASSP | Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | PhISANet: Phonetically Informed Speech Animation Network. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SD&ASR | ICASSP | One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Less Peaky and More Accurate CTC Forced Alignment by Pruned CTC Loss and Label Priors. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL | ICASSP | HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR&ST&SLU | ICASSP | Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- LLM&SLU | ICASSP | Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ST | ICASSP | Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Semi-Autoregressive Streaming ASR with Label Context. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL | ICASSP | Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL | ICASSP | Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SE | ICASSP | Improving Design of Input Condition Invariant Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SS | ICASSP | Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR | ICASSP | Visual Speech Recognition for Low-Resource Languages with Automatic Labels from Whisper Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Caption | ICASSP | Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL | ICASSP | Understanding Probe Behaviors Through Variational Bounds of Mutual Information. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Caption | ICASSP | Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL | ICASSP | AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024