2023 Papers
- [ASR] [ASRU] Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [SVC] [ASRU] The Singing Voice Conversion Challenge 2023. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR] [ASRU] Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [Summarization&ST] [ASRU] Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR] [ASRU] YODAS: Youtube-Oriented Dataset for Audio and Speech. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [SE&SS] [ASRU] A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR&SSL] [ASRU] TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [SE] [ASRU] Toward Universal Speech Enhancement For Diverse Input Conditions. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR] [ASRU] Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [SSL] [ASRU] Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR] [ASRU] Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR&ST] [ASRU] Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [Summarization] [ASRU] ESPnet-SUMM: Introducing a novel large dataset, toolkit, and a cross-corpora evaluation of speech summarization systems. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [ASR] [ASRU] LV-CTC: Non-autoregressive ASR with CTC and latent variable models. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- [SS] [NeurIPS] UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. In Proceedings of the Conference on Neural Information Processing Systems 2023
- [SS] [WASPAA] Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
- [SS] [CSL] Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
- [SD] [TASLP] Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
- [MT&ASR] [TASLP] LegoNN: Building Modular Encoder-Decoder Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
- [ST] [ACL (demo)] ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
- [ST] [ACL] UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
- [ASR] [ICML] Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. In Proceedings of the International Conference on Machine Learning (ICML) 2023
- [TTS] [IJCAI] Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. In IJCAI 2023
- [TTS] [Interspeech] Deep Speech Synthesis from MRI-Based Articulatory Representations. In Proceedings of Interspeech 2023
- [ASR] [Interspeech] A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. In Proceedings of Interspeech 2023
- [ASR&SSL] [Interspeech] Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. In Proceedings of Interspeech 2023
- [ASR] [Interspeech] Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. In Proceedings of Interspeech 2023
- [ASR&SLU] [Interspeech] Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding. In Proceedings of Interspeech 2023
- [ASR] [Interspeech] Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder–decoder Speech Recognition. In Proceedings of Interspeech 2023
- [ASR] [Interspeech] Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. In Proceedings of Interspeech 2023
- [SSL] [Interspeech] Exploration on HuBERT with Multiple Resolution. In Proceedings of Interspeech 2023
- [ASR] [Interspeech] Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. In Proceedings of Interspeech 2023
- [ASR&SSL] [Interspeech] ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. In Proceedings of Interspeech 2023
- [SLU] [Interspeech] Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing. In Proceedings of Interspeech 2023
- [ASR] [Interspeech] 4D: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. In Proceedings of Interspeech 2023
- [SSL] [Interspeech] DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. In Proceedings of Interspeech 2023
- [ASR&ST] [Interspeech] A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. In Proceedings of Interspeech 2023
- [SSL] [Interspeech] Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. In Proceedings of Interspeech 2023
- [Summarization] [Interspeech] BASS: Block-wise Adaptation for Speech Summarization. In Proceedings of Interspeech 2023
- [ST] [EACL] CTC Alignments Improve Autoregressive Translation. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2023
- [ASR] [ICLR] Continuous Pseudo-Labeling from the Start. In Proceedings of the International Conference on Learning Representations (ICLR) 2023
- [ASR] [ICASSP] Multi-blank Transducers for Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SE] [ICASSP] PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SD] [ICASSP] In search of strong embedding extractors for speaker diarisation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [TTS&SSL] [ICASSP] A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SE] [ICASSP] TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SSL&SLU] [ICASSP] Bridging Speech and Text Pre-trained Models with Unsupervised ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [Music] [ICASSP] PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SLU] [ICASSP] Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SSL] [ICASSP] Context-Aware Fine-Tuning of Self-Supervised Speech Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [S2ST] [ICASSP] Enhancing Speech-To-Speech Translation with Multiple TTS Targets. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] Streaming Joint Speech Recognition and Disfluency Detection. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] Towards Zero-Shot Code-Switched Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ST] [ICASSP] Align and Write and Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] Improving Massively Multilingual ASR With Auxiliary CTC Objectives. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SSL] [ICASSP] SpeechLMScore: Evaluating Speech Generation Using Speech Language Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [TTS] [ICASSP] Speaker-Independent Acoustic-to-Articulatory Speech Inversion. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SLU] [ICASSP] Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SS] [ICASSP] TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR&SSL] [ICASSP] EURO: ESPnet Unsupervised ASR Open-Source Toolkit. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SE] [ICASSP] Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR&SSL] [ICASSP] Avoid Overthinking in Self-Supervised Models for Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [TTS] [ICASSP] Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR&SLU&SSL] [ICASSP] Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SSL] [ICASSP] Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [MultiModal] [ICASSP] The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [SSL&ASR] [ICASSP] FINDADAPTNET: Find and Insert Adapters by Learned Layer Importance. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- [ASR] [ICASSP] I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023