2022 Papers
- TTS AAAI A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. In Proceedings of AAAI 2022
- ASR EMNLP BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. In Proceedings of Findings of EMNLP 2022
- SLU EMNLP Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. In Proceedings of Findings of EMNLP 2022
- SD TASLP Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
- SE CSL A Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data. In Computer Speech & Language 2022
- SE TASLP End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party. In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
- SE SPL Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction. In IEEE Signal Processing Letters 2022
- SD TASLP Encoder-Decoder Based Attractors for End-to-End Neural Diarization. In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
- ASR JSTSP Self-Supervised Speech Representation Learning: A Review. In IEEE Journal of Selected Topics in Signal Processing 2022
- ST IWSLT Findings of the IWSLT 2022 Evaluation Campaign. In IWSLT 2022
- SD&SS SLT EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SD&SLU&ER SLT SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR SLT E-Branchformer: Branchformer with Enhanced merging for speech recognition. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SLU SLT A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SSL SLT On Compressing Sequences for Self-Supervised Speech Models. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SE&SSL SLT End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- SD SLT Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR SLT End-to-End Multi-speaker ASR with Independent Vector Analysis. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR Interspeech VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. In Proceedings of Interspeech 2022
- ASR Interspeech Memory-Efficient Training of RNN-Transducer with Sampled Softmax. In Proceedings of Interspeech 2022
- SLU&ST Interspeech Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. In Proceedings of Interspeech 2022
- Music Interspeech SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. In Proceedings of Interspeech 2022
- Music Interspeech Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. In Proceedings of Interspeech 2022
- ASR Interspeech Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis. In Proceedings of Interspeech 2022
- KWS Interspeech Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. In Proceedings of Interspeech 2022
- ASR Interspeech ASR2K: Speech Recognition for Around 2000 Languages without Audio. In Proceedings of Interspeech 2022
- SE Interspeech ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. In Proceedings of Interspeech 2022
- SLU Interspeech Two-Pass Low Latency End-to-End Spoken Language Understanding. In Proceedings of Interspeech 2022
- TTS Interspeech Deep Speech Synthesis from Articulatory Representations. In Proceedings of Interspeech 2022
- ASR Interspeech Minimum latency training of sequence transducers for streaming end-to-end speech recognition. In Proceedings of Interspeech 2022
- ASR Interspeech Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection. In Proceedings of Interspeech 2022
- ASR Interspeech Better Intermediates Improve CTC Inference. In Proceedings of Interspeech 2022
- ASR Interspeech Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models. In Proceedings of Interspeech 2022
- ASR Interspeech Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. In Proceedings of Interspeech 2022
- ASR Interspeech Residual Language Model for End-to-end Speech Recognition. In Proceedings of Interspeech 2022
- TTS Interspeech When Is TTS Augmentation Through a Pivot Language Useful? In Proceedings of Interspeech 2022
- TTS Interspeech TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. In Proceedings of Interspeech 2022
- ASR Interspeech Online Continual Learning of End-to-End Speech Recognition Models. In Proceedings of Interspeech 2022
- SE Interspeech Improving Speech Enhancement through Fine-Grained Speech Characteristics. In Proceedings of Interspeech 2022
- ASR&SE&SSL Interspeech End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. In Proceedings of Interspeech 2022
- ASR&SSL Interspeech Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. In Proceedings of Interspeech 2022
- ASR&SLU&MT ICML Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. In Proceedings of the International Conference on Machine Learning (ICML) 2022
- Linguistic ACL Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. In Proceedings of Findings of the Annual Meeting of the Association for Computational Linguistics 2022
- SE&VC&ST ACL SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2022
- SE&ASR CSL Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. In Computer Speech & Language 2022
- SD CSL A review of speaker diarization: Recent advances with deep learning. In Computer Speech & Language 2022
- SE&ASR CSL Joint speaker diarization and speech recognition based on region proposal networks. In Computer Speech & Language 2022
- ASR CSL Arabic speech recognition by end-to-end, modular systems and human. In Computer Speech & Language 2022
- Multimodal ICASSP The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Non-Autoregressive End-to-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SE&SSL ICASSP Investigating Self-Supervised Learning for Speech Enhancement and Separation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SE ICASSP Conditional Diffusion Probabilistic Model for Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Integrating multiple ASR systems into NLP backend with attention fusion. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SLU ICASSP ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Sequence Transduction with Graph-based Supervision. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Run-and-Back Stitch Search: Novel Block Synchronous Decoding for Streaming Encoder-Decoder ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- VC&SSL ICASSP S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP Joint Speech Recognition and Audio Captioning. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SD ICASSP Multi-Channel End-to-End Neural Diarization with Distributed Microphones. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSP TorchAudio: Building Blocks for Audio and Speech Processing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SD ICASSP Towards End-to-End Speaker Diarization with Generalized Neural Speaker Clustering. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- Music ICASSP Training Strategies for Automatic Song Writing: A Unified Framework Perspective. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SE&ASR CSL An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. In Computer Speech & Language 2022
- SE ICASSP Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022