2021 Papers
- [ASR+TTS] [ASRU] On Prosody Modeling for ASR+TTS based Voice Conversion. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [SLU] [ASRU] Attention-based Multi-hypothesis Fusion for Speech Summarization. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [ST] [ASRU] Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [SD] [ASRU] Towards Neural Diarization for Unlimited Numbers of Speakers using Global and Local Attractors. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [ASR] [ASRU] A Study of Transducer based End-to-end ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [ASR] [ASRU] A Comparative Study on Non-autoregressive Modelings for Speech-to-text Generation. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [SE] [ASRU] ConferencingSpeech Challenge: Towards Far-field Multi-channel Speech Enhancement for Video Conferencing. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [ASR+TTS] [ASRU] Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [ASR&SSL] [ASRU] An Exploration of Self-supervised Pretrained Representations for End-to-end Speech Recognition. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- [VC] [APSIPA] Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks. In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2021
- [ST] [IWSLT] ESPnet-ST IWSLT 2021 Offline Speech Translation System. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT) 2021
- [ASR] [Interspeech] GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio. In Proceedings of Interspeech 2021
- [AED] [Interspeech] Acoustic Event Detection with Classifier Chains. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Multi-mode Transformer Transducer with Stochastic Future Context. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Differentiable Allophone Graphs for Language Universal Speech Recognition. In Proceedings of Interspeech 2021
- [SE] [Interspeech] Speaker Verification-Based Evaluation of Single-Channel Speech Separation. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition. In Proceedings of Interspeech 2021
- [SSA] [Interspeech] Leveraging Pre-trained Language Model for Speech Sentiment Analysis. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models. In Proceedings of Interspeech 2021
- [SLU] [Interspeech] Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding. In Proceedings of Interspeech 2021
- [ASR&SpeDialog] [Interspeech] Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Layer Pruning on Demand with Intermediate CTC. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Toward Streaming ASR with Non-autoregressive Insertion-based Model. In Proceedings of Interspeech 2021
- [SE&ASR] [Interspeech] Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics. In Proceedings of Interspeech 2021
- [ASR] [Interspeech] Data Augmentation Methods for End-to-end Speech Recognition on Distant-talk Scenarios. In Proceedings of Interspeech 2021
- [SD] [Interspeech] Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker. In Proceedings of Interspeech 2021
- [SE&ASR&ST] [DSLW] The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. In Proceedings of IEEE Data Science and Learning Workshop (DSLW) 2021
- [SE] [SLT] Dual-path RNN for long recording speech separation. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [SD] [SLT] End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [ASR] [SLT] Streaming Transformer ASR with blockwise synchronous beam search. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [SE] [SLT] Sequential multi-frame neural beamforming for speech separation and enhancement. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [SD] [SLT] DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [SE&SD&ASR] [SLT] Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [SD] [SLT] Online end-to-end neural diarization with speaker-tracing buffer. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- [ST] [AmericasNLP] Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
- [ASR] [AmericasNLP] End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
- [ASR] [NAACL] End-to-end ASR to jointly predict transcriptions and linguistic annotations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
- [ST] [NAACL] Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
- [ST] [NAACL] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
- [ASR] [EACL] Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2021
- [SD] [Interspeech] Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers. In Proceedings of Interspeech 2021
- [SD] [Interspeech] Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization. In Proceedings of Interspeech 2021
- [SE] [Interspeech] Continuous speech separation using speaker inventory for long recording. In Proceedings of Interspeech 2021
- [SD] [ICASSP] End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [SE] [ICASSP] Dual-Path Modeling for Long Recording Speech Separation in Meetings. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ASR] [ICASSP] Recent Developments on ESPnet Toolkit Boosted by Conformer. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [SE&ASR] [ICASSP] End-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [SD] [ICASSP] End-to-end speaker diarization as post-processing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ASR] [ICASSP] Improved Mask-CTC for Non-Autoregressive End-to-End ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ASR] [ICASSP] Intermediate Loss Regularization for CTC-Based Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ST] [ICASSP] Orthros: Non-autoregressive end-to-end speech translation with dual-decoder. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ASR] [ICASSP] Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ASR&TTS&SSL] [ICASSP] EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [ASR] [ICASSP] Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [SE&ASR] [ICASSP] Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [SE] [ICASSP] Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [Music] [ICASSP] Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- [SE] [SLT] ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021