Publications
2024
- SSL EMNLP Towards Robust Speech Representation Learning for Thousands of Languages. In Proceedings of EMNLP 2024
- ASR&ER&Speaker SLT Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- Tokenizer SLT Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural codec models. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- Music SLT VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- Tokenizer SLT ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- ASR SLT FLORAS 50: A Massively Multilingual Multitask Benchmark for Long-form Conversational Speech. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- ASR SLT Contextualized Automatic Speech Recognition with Dynamic Vocabulary. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- ASR SLT Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- SE SLT Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- ASR SLT Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- ASR&TTS SLT ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2024
- ASR Interspeech Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. In Proceedings of Interspeech 2024
- ASR Interspeech Self-training ASR Guided by Unsupervised ASR Teacher. In Proceedings of Interspeech 2024
- Tokenizer Interspeech MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. In Proceedings of Interspeech 2024
- ASR Interspeech ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. In Proceedings of Interspeech 2024
- ASR Interspeech EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios. In Proceedings of Interspeech 2024
- ASR Interspeech Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition. In Proceedings of Interspeech 2024
- ASR Interspeech On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. In Proceedings of Interspeech 2024
- SLU Interspeech Towards Unified Evaluation of Continual Learning in Spoken Language Understanding. In Proceedings of Interspeech 2024
- ASR&TTS&Music Interspeech The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. In Proceedings of Interspeech 2024
- Evaluation Interspeech SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. In Proceedings of Interspeech 2024
- Speaker Interspeech To what extent can ASV systems naturally defend against spoofing attacks? In Proceedings of Interspeech 2024
- Speaker Interspeech ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. In Proceedings of Interspeech 2024
- SLU Interspeech DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. In Proceedings of Interspeech 2024
- SE Interspeech Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. In Proceedings of Interspeech 2024
- ASR Interspeech Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss. In Proceedings of Interspeech 2024
- SE Interspeech URGENT Challenge: Universality, Robustness, and Generalizability for speech EnhancemeNT. In Proceedings of Interspeech 2024
- Speaker Interspeech Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? In Proceedings of Interspeech 2024
- ASR Interspeech OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. In Proceedings of Interspeech 2024
- SSL Interspeech Self-Supervised Speech Representations are More Phonetic than Semantic. In Proceedings of Interspeech 2024
- SS Interspeech Neural Blind Source Separation and Diarization for Distant Speech Recognition. In Proceedings of Interspeech 2024
- SLU Interspeech Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model. In Proceedings of Interspeech 2024
- ASR Interspeech Decoder-only Architecture for Streaming End-to-end Speech Recognition. In Proceedings of Interspeech 2024
- ASR Interspeech Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. In Proceedings of Interspeech 2024
- SE Interspeech EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. In Proceedings of Interspeech 2024
- Music Interspeech Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing. In Proceedings of Interspeech 2024
- ASR ACL Wav2Gloss: Generating Interlinear Glossed Text from Speech. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
- ASR ACL OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
- SLU ACL On the Evaluation of Speech Foundation Models for Spoken Language Understanding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2024
- SS IJCAI Cross-Talk Reduction. In Proceedings of IJCAI 2024
- SLU NAACL UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2024
- TTS TASLP Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
- Music TASLP Music ControlNet: Multiple Time-varying Controls for Music Generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
- ASR SPL MC-Whisper: Improving Distant Speech Recognition by Extending Large Pre-Trained Model to Multi-channel. IEEE Signal Processing Letters 2024
- SE ICASSP The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Audio ICASSP Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Improving ASR Contextual Biasing with Guided Attention. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SLU ICASSP AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR&TTS ICASSP VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ST ICASSP Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP PhISANet: Phonetically Informed Speech Animation Network. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SD&ASR ICASSP One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Less Peaky and More Accurate CTC Forced Alignment by Pruned CTC Loss and Label Priors. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL ICASSP HuberTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR&ST&SLU ICASSP Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- LLM&SLU ICASSP Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ST ICASSP Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Semi-Autoregressive Streaming ASR with Label Context. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL ICASSP Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL ICASSP Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SE ICASSP Improving Design of Input Condition Invariant Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SS ICASSP Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- ASR ICASSP Visual Speech Recognition for Low-Resource Languages with Automatic Labels from Whisper Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Caption ICASSP Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL ICASSP Understanding Probe Behaviors Through Variational Bounds of Mutual Information. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Caption ICASSP Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- SSL ICASSP AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
2023
- ASR ASRU Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- SVC ASRU The Singing Voice Conversion Challenge 2023. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR ASRU Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- Summarization&ST ASRU Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR ASRU YODAS: Youtube-Oriented Dataset for Audio and Speech. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- SE&SS ASRU A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR&SSL ASRU TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- SE ASRU Toward Universal Speech Enhancement For Diverse Input Conditions. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR ASRU Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- SSL ASRU Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR ASRU Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR&ST ASRU Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- Summarization ASRU ESPNet-SUMM: Introducing a novel large dataset, toolkit, and a cross-corpora evaluation of speech summarization systems. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- ASR ASRU LV-CTC: Non-autoregressive ASR with CTC and latent variable models. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- SS NeurIPS UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. In Proceedings of the Conference on Neural Information Processing Systems 2023
- SS WASPAA Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
- SS CSL A Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data. Computer Speech & Language 2023
- SD TASLP Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
- MT&ASR TASLP LegoNN: Building Modular Encoder-Decoder Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023
- ST ACL(demo) ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
- ST ACL UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2023
- ASR ICML Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. In Proceedings of the International Conference on Machine Learning (ICML) 2023
- TTS IJCAI Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. In Proceedings of IJCAI 2023
- TTS Interspeech Deep Speech Synthesis from MRI-Based Articulatory Representations. In Proceedings of Interspeech 2023
- ASR Interspeech A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. In Proceedings of Interspeech 2023
- ASR&SSL Interspeech Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. In Proceedings of Interspeech 2023
- ASR Interspeech Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. In Proceedings of Interspeech 2023
- ASR&SLU Interspeech Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding. In Proceedings of Interspeech 2023
- ASR Interspeech Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder–decoder Speech Recognition. In Proceedings of Interspeech 2023
- ASR Interspeech Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. In Proceedings of Interspeech 2023
- SSL Interspeech Exploration on HuBERT with Multiple Resolution. In Proceedings of Interspeech 2023
- ASR Interspeech Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. In Proceedings of Interspeech 2023
- ASR&SSL Interspeech ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. In Proceedings of Interspeech 2023
- SLU Interspeech Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing. In Proceedings of Interspeech 2023
- ASR Interspeech 4D: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. In Proceedings of Interspeech 2023
- SSL Interspeech DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. In Proceedings of Interspeech 2023
- ASR&ST Interspeech A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. In Proceedings of Interspeech 2023
- SSL Interspeech Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. In Proceedings of Interspeech 2023
- Summarization Interspeech BASS: Block-wise Adaptation for Speech Summarization. In Proceedings of Interspeech 2023
- ST EACL CTC Alignments Improve Autoregressive Translation. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2023
- ASR ICLR Continuous Pseudo-Labeling from the Start. In Proceedings of the International Conference on Learning Representations (ICLR) 2023
- ASR ICASSP Multi-blank Transducers for Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SE ICASSP PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SD ICASSP In search of strong embedding extractors for speaker diarisation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- TTS&SSL ICASSP A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SE ICASSP TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SSL&SLU ICASSP Bridging Speech and Text Pre-trained Models with Unsupervised ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- Music ICASSP PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SLU ICASSP Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SSL ICASSP Context-Aware Fine-Tuning of Self-Supervised Speech Models. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- S2ST ICASSP Enhancing Speech-To-Speech Translation with Multiple TTS Targets. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP Streaming Joint Speech Recognition and Disfluency Detection. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP Towards Zero-Shot Code-Switched Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ST ICASSP Align and Write and Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP Improving Massively Multilingual ASR With Auxiliary CTC Objectives. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SSL ICASSP SpeechLMScore: Evaluating Speech Generation Using Speech Language Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- TTS ICASSP Speaker-Independent Acoustic-to-Articulatory Speech Inversion. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SLU ICASSP Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SS ICASSP TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR&SSL ICASSP EURO: ESPnet Unsupervised ASR Open-Source Toolkit. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SE ICASSP Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR&SSL ICASSP Avoid Overthinking in Self-Supervised Models for Speech Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- TTS ICASSP Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR&SLU&SSL ICASSP Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SSL ICASSP Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- MultiModal ICASSP The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- SSL&ASR ICASSP FindAdaptNet: Find and Insert Adapters by Learned Layer Importance. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
- ASR ICASSP I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
2022
- TTS AAAI A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. In Proceedings of AAAI 2022
- ASR EMNLP BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. In Proceedings of Findings of EMNLP 2022
- SLU EMNLP Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. In Proceedings of Findings of EMNLP 2022
- SD TASLP Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
- SE CSL A Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data. In Computer Speech & Language 2022
- SE TASLP End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party. In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
- SE SPL Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction. In IEEE Signal Processing Letters 2022
- SD TASLP Encoder-Decoder Based Attractors for End-to-End Neural Diarization. In IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022
- ASR JSTSP Self-Supervised Speech Representation Learning: A Review. In IEEE Journal of Selected Topics in Signal Processing 2022
- ST IWSLT Findings of the IWSLT 2022 Evaluation Campaign. In Proceedings of IWSLT 2022
- SD&SS SLT EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SD&SLU&ER SLT SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR SLT E-Branchformer: Branchformer with Enhanced merging for speech recognition. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SLU SLT A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SSL SLT On Compressing Sequences for Self-Supervised Speech Models. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR&SE&SSL SLT End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- SE SLT Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR SLT End-to-End Multi-speaker ASR with Independent Vector Analysis. In Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2022
- ASR Interspeech VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. In Proceedings of Interspeech 2022
- ASR Interspeech Memory-Efficient Training of RNN-Transducer with Sampled Softmax. In Proceedings of Interspeech 2022
- SLU&ST Interspeech Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. In Proceedings of Interspeech 2022
- Music Interspeech SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. In Proceedings of Interspeech 2022
- Music Interspeech Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. In Proceedings of Interspeech 2022
- ASR Interspeech Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis. In Proceedings of Interspeech 2022
- KWS Interspeech Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. In Proceedings of Interspeech 2022
- ASR Interspeech ASR2K: Speech Recognition for Around 2000 Languages without Audio. In Proceedings of Interspeech 2022
- SE Interspeech ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. In Proceedings of Interspeech 2022
- SLU Interspeech Two-Pass Low Latency End-to-End Spoken Language Understanding. In Proceedings of Interspeech 2022
- TTS Interspeech Deep Speech Synthesis from Articulatory Representations. In Proceedings of Interspeech 2022
- ASR Interspeech Minimum latency training of sequence transducers for streaming end-to-end speech recognition. In Proceedings of Interspeech 2022
- ASR Interspeech Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection. In Proceedings of Interspeech 2022
- ASR Interspeech Better Intermediates Improve CTC Inference. In Proceedings of Interspeech 2022
- ASR Interspeech Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models. In Proceedings of Interspeech 2022
- ASR Interspeech Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. In Proceedings of Interspeech 2022
- ASR Interspeech Residual Language Model for End-to-end Speech Recognition. In Proceedings of Interspeech 2022
- TTS Interspeech When Is TTS Augmentation Through a Pivot Language Useful? In Proceedings of Interspeech 2022
- TTS Interspeech TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. In Proceedings of Interspeech 2022
- ASR Interspeech Online Continual Learning of End-to-End Speech Recognition Models. In Proceedings of Interspeech 2022
- SE Interspeech Improving Speech Enhancement through Fine-Grained Speech Characteristics. In Proceedings of Interspeech 2022
- ASR&SE&SSL Interspeech End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. In Proceedings of Interspeech 2022
- ASR&SSL Interspeech Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. In Proceedings of Interspeech 2022
- ASR&SLU&MT ICML Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. In Proceedings of the International Conference on Machine Learning (ICML) 2022
- Linguistic ACL Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. In Proceedings of Findings of the Annual Meeting of the Association for Computational Linguistics 2022
- SE&VC&ST ACL SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2022
- SE&ASR CSL Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Computer Speech & Language 2022
- SD CSL A review of speaker diarization: Recent advances with deep learning. Computer Speech & Language 2022
- SE&ASR CSL Joint speaker diarization and speech recognition based on region proposal networks. Computer Speech & Language 2022
- ASR CSL Arabic speech recognition by end-to-end, modular systems and human. Computer Speech & Language 2022
- ASR ICASSPTOWARDS LOW-DISTORTION MULTI-CHANNEL SPEECH ENHANCEMENT: THE ESPNET-SE SUBMISSION TO THE L3DAS22 CHALLENGEIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- Multimodal ICASSPThe First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and ResultsIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPNon-Autoregressive End-to-End Automatic Speech Recognition Incorporating Downstream Natural Language ProcessingIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPAn Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information CriterionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SE&SSL ICASSPInvestigating Self-Supervised Learning for Speech Enhancement and SeparationIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SE ICASSPConditional Diffusion Probabilistic Model for Speech EnhancementIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPImproving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language ModelsIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPSRU++: Pioneering Fast Recurrence with Attention for Speech RecognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPIntegrating multiple ASR systems into NLP backend with attention fusionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SLU ICASSPESPnet-SLU: Advancing Spoken Language Understanding Through ESPnetIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPJoint Modeling of Code-Switched and Monolingual ASR via Conditional FactorizationIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPExtended Graph Temporal Classification for Multi-Speaker End-to-End ASRIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPSequence Transduction with Graph-based SupervisionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPRun-and-Back Stitch Search: Novel Block Synchronous Decoding for Streaming Encoder-Decoder ASRIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- VC&SSL ICASSPS3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech RepresentationsIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPJoint Speech Recognition and Audio CaptioningIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SD ICASSPMulti-Channel End-to-End Neural Diarization with Distributed MicrophonesIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- ASR ICASSPTorchAudio: Building Blocks for Audio and Speech ProcessingIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SD ICASSPTowards End-to-End Speaker Diarization with Generalized Neural Speaker ClusteringIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- Music ICASSPTraining Strategies for Automatic Song Writing: A Unified Framework PerspectiveIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
- SE+ASR CSLAn investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducerComputer Speech & Language 2022
- SE ICASSPTowards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 ChallengeIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
2021
- ASR+TTS ASRUOn Prosody Modeling for ASR+TTS based Voice ConversionIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- SLU ASRUAttention-based Multi-hypothesis Fusion for Speech SummarizationIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- ST ASRUFast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden IntermediatesIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- SD ASRUTowards Neural Diarization for Unlimited Numbers of Speakers using Global and Local AttractorsIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- ASR ASRUA Study of Transducer based End-to-end ASR with ESPNet: Architecture, Auxiliary Loss and Decoding StrategiesIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- ASR ASRUA Comparative Study on Non-autoregressive Modelings for Speech-to-text GenerationIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- SE ASRUConferencingSpeech Challenge: Towards Far-field Multi-channel Speech Enhancement for Video ConferencingIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- ASR+TTS ASRUCross-lingual Transfer for Speech Processing using Acoustic Language SimilarityIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- ASR&SSL ASRUAn Exploration of Self-supervised Pretrained Representations for End-to-end Speech RecognitionIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021
- VC APSIPAUnderstanding the Tradeoffs in Client-side Privacy for Downstream Speech TasksIn Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2021
- ST IWSLTESPnet-ST IWSLT 2021 Offline Speech Translation SystemIn Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT) 2021
- ASR InterspeechGigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed AudioIn Proceedings of Interspeech 2021
- AED InterspeechAcoustic Event Detection with Classifier ChainsIn Proceedings of Interspeech 2021
- ASR InterspeechMulti-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker ChainIn Proceedings of Interspeech 2021
- ASR InterspeechMulti-mode Transformer Transducer with Stochastic Future ContextIn Proceedings of Interspeech 2021
- ASR InterspeechDifferentiable Allophone Graphs for Language Universal Speech RecognitionIn Proceedings of Interspeech 2021
- SE InterspeechSpeaker Verification-Based Evaluation of Single-Channel Speech SeparationIn Proceedings of Interspeech 2021
- ASR InterspeechSPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognitionIn Proceedings of Interspeech 2021
- SSA InterspeechLeveraging Pre-trained Language Model for Speech Sentiment AnalysisIn Proceedings of Interspeech 2021
- ASR InterspeechStreaming End-to-End ASR based on Blockwise Non-Autoregressive ModelsIn Proceedings of Interspeech 2021
- SLU InterspeechRethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language UnderstandingIn Proceedings of Interspeech 2021
- ASR & SpeDialog InterspeechSpeech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021In Proceedings of Interspeech 2021
- ASR InterspeechLayer Pruning on Demand with Intermediate CTCIn Proceedings of Interspeech 2021
- ASR InterspeechToward Streaming ASR with Non-autoregressive Insertion-based ModelIn Proceedings of Interspeech 2021
- SE&ASR InterspeechAuxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristicsIn Proceedings of Interspeech 2021
- ASR InterspeechData Augmentation Methods for End-to-end Speech Recognition on Distant-talk ScenariosIn Proceedings of Interspeech 2021
- SD InterspeechTarget-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of SpeakerIn Proceedings of Interspeech 2021
- SE&ASR&ST DSLWThe 2020 ESPnet update: new features, broadened applications, performance improvements, and future plansIn Proceedings of IEEE Data Science and Learning Workshop (DSLW) 2021
- SE SLTDual-path RNN for long recording speech separationIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- SD SLTEnd-to-End Speaker Diarization Conditioned on Speech Activity and Overlap DetectionIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- ASR SLTStreaming Transformer ASR with blockwise synchronous beam searchIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- SE SLTSequential multi-frame neural beamforming for speech separation and enhancementIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- SD SLTDOVER-Lap: A Method for Combining Overlap-aware Diarization OutputsIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- SE&SD&ASR SLTIntegration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysisIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- SD SLTOnline end-to-end neural diarization with speaker-tracing bufferIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
- ST AmericasNLPHighland Puebla Nahuatl Speech Translation Corpus for Endangered Language DocumentationIn Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
- ASR AmericasNLPEnd-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl MixtecIn Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
- ASR NAACLEnd-to-end ASR to jointly predict transcriptions and linguistic annotationsIn Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
- ST NAACLSearchable Hidden Intermediates for End-to-End Models of Decomposable Sequence TasksIn Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
- ST NAACLSource and Target Bidirectional Knowledge Distillation for End-to-end Speech TranslationIn Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021
- ASR EACLLeveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl MixtecIn Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics 2021
- SD InterspeechOnline Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of SpeakersIn Proceedings of Interspeech 2021
- SD InterspeechSemi-Supervised Training with Pseudo-Labeling for End-to-End Neural DiarizationIn Proceedings of Interspeech 2021
- SE InterspeechContinuous speech separation using speaker inventory for long recordingIn Proceedings of Interspeech 2021
- SD ICASSPEnd-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker EmbeddingsIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- SE ICASSPDual-Path Modeling for Long Recording Speech Separation in MeetingsIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ASR ICASSPRecent developments on ESPnet toolkit boosted by ConformerIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- SE&ASR ICASSPEnd-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontendIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- SD ICASSPEnd-to-end speaker diarization as post-processingIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ASR ICASSPImproved Mask-CTC for Non-Autoregressive End-to-End ASRIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ASR ICASSPIntermediate Loss Regularization for CTC-Based Speech RecognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ST ICASSPOrthros: Non-autoregressive end-to-end speech translation with dual-decoderIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ASR ICASSPDirectional ASR: A new paradigm for E2E multi-speaker speech recognition with source localizationIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ASR&TTS&SSL ICASSPEAT: Enhanced ASR-TTS for Self-Supervised Speech RecognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- ASR ICASSPGaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech RecognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- SE&ASR ICASSPImproving RNN Transducer with Target Speaker Extraction and Neural Uncertainty EstimationIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- SE ICASSPTraining Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small StepIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- Music ICASSPSequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy LossIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
- SE SLTESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR IntegrationIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2021
2020
- ST ACLESPnet-ST: All-in-One Speech Translation ToolkitIn Proceedings of the Annual Meeting of the Association for Computational Linguistics 2020
- SR&SSL NeurIPS
- SED DCASE
- ASR CHiME
- ASR ICASSP
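- TTS ICASSPSemi-supervised speaker adaptation for end-to-end speech synthesis with pretrained modelsIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
- ASR ICASSPEnd-to-end automatic speech recognition integrated with CTC-based voice activity detectionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
- ASR ICASSPA practical two-stage training strategy for multi-stream end-to-end speech recognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
- SE ICASSPFar-field location guided target speech extraction using end-to-end speech recognition objectivesIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020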
- ASR Deep Neural Evolution
- ASR Interspeech
- SE Interspeech
- ASR Interspeech
2019
- ASR ASRUTransformer ASR with contextual block processingIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
- ASR ASRUMIMO-Speech: End-to-end multi-channel multi-speaker speech recognitionIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
- ST ASRUMultilingual end-to-end speech translationIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
- ASR+SD ASRUSimultaneous speech recognition and speaker diarization for monaural dialogue recordings with target-speaker acoustic modelsIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
- ASR ASRUEspresso: A fast end-to-end neural speech recognition toolkitIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
- ASR ASRUMulti-stream end-to-end speech recognitionIn IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
- SS WASPAAAnalysis of robustness of deep single-channel speech separation using corpora constructed from multiple domainsIn IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
- ASR WASPAAGeneralized weighted-prediction-error dereverberation with varying source priors for reverberant speech recognitionIn IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
- ASR WASPAASpeech enhancement using end-to-end speech recognition objectivesIn IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
- ASR InterspeechEnd-to-End Multilingual Multi-Speaker Speech RecognitionIn Proceedings of Interspeech 2019
- ASR InterspeechPretraining by Backtranslation for End-to-end ASR in Low-Resource SettingsIn Proceedings of Interspeech 2019
- TTS InterspeechPre-trained Text Embeddings for Enhanced Text-to-Speech SynthesisIn Proceedings of Interspeech 2019
- ASR InterspeechAnalysis of Multilingual Sequence-to-Sequence speech recognition systemsIn Proceedings of Interspeech 2019
- ASR InterspeechEnd-to-end SpeakerBeam for single channel target speech recognitionIn Proceedings of Interspeech 2019
- ASR InterspeechSemi-supervised Sequence-to-sequence ASR using Unpaired Speech and TextIn Proceedings of Interspeech 2019
- ASR InterspeechStudy of the performance of automatic speech recognition systems in speakers with Parkinson’s DiseaseIn Proceedings of Interspeech 2019
- ASR InterspeechVectorized Beam Search for CTC-Attention-based Speech RecognitionIn Proceedings of Interspeech 2019
- ASR InterspeechSpeaker recognition benchmark using the CHiME-5 corpusIn Proceedings of Interspeech 2019
- ASR InterspeechInterference Speaker Loss for Target-Speaker Speech RecognitionIn Proceedings of Interspeech 2019
- ASR EUSIPCOCNN-based multichannel end-to-end speech recognition for everyday home environmentsIn 27th European Signal Processing Conference (EUSIPCO) 2019
- OCR ICDARUsing ASR methods for OCRIn International Conference on Document Analysis and Recognition (ICDAR) 2019
- Music IJCNNWeakly-supervised deep recurrent neural networks for basic dance step generationIn International Joint Conference on Neural Networks (IJCNN) 2019
- ASR NAACLMassively Multilingual Adversarial Speech RecognitionIn Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2019
- ASR ICASSPPromising accurate prefix boosting for sequence-to-sequence ASRIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPTransfer learning of language-independent end-to-end ASR with language model fusionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPImproving end-to-end speech recognition with pronunciation-assisted sub-word modelingIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPLanguage model integration based on memory control for sequence to sequence speech recognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPStream attention-based multi-array end-to-end speech recognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPAcoustic modeling for overlapping speech recognition: JHU CHiME-5 challenge systemIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPCycle-consistency training for end-to-end speech recognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- AED ICASSPJoint acoustic and class inference for weakly supervised sound event detectionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- SE ICASSPThe phasebook: Building complex masks via discrete representations for source separationIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPEnd-to-end monaural multi-speaker ASR system without pretrainingIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPSemi-supervised end-to-end speech recognition using text-to-speech and autoencodersIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
- ASR ICASSPAcoustic modeling for distant multi-talker speech recognition with single-and multi-channel branchesIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
2018
- ML PhysicaModel parameter learning using Kullback–Leibler divergencePhysica A: Statistical Mechanics and its Applications 2018
- ASR SLTEnd-to-end speech recognition with word-based RNN language modelsIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
- ASR SLTLow-resource contextual topic identification on speechIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
- ASR SLTBack-translation-style data augmentation for end-to-end ASRIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
- ASR SLTMultilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modelingIn Proceedings of IEEE Spoken Language Technology Workshop (SLT) 2018
- ASR InterspeechBuilding State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement BaselineIn Proceedings of Interspeech 2018
- ASR InterspeechMulti-Head Decoder for End-to-End Speech RecognitionIn Proceedings of Interspeech 2018
- ASR InterspeechSemi-Supervised End-to-End Speech RecognitionIn Proceedings of Interspeech 2018
- SE InterspeechStudent-Teacher Learning for BLSTM Mask-based Speech EnhancementIn Proceedings of Interspeech 2018
- ASR InterspeechMulti-Modal Data Augmentation for End-to-end ASRIn Proceedings of Interspeech 2018
- LID InterspeechEffectiveness of single-channel BLSTM enhancement for language identificationIn Proceedings of Interspeech 2018
- ASR InterspeechAuxiliary Feature Based Adaptation of End-to-end ASR SystemsIn Proceedings of Interspeech 2018
- ASR ACLA Purely End-to-End System for Multi-speaker Speech RecognitionIn Proceedings of the Annual Meeting of the Association for Computational Linguistics 2018
- ASR ICASSPSpeaker adaptation for multichannel end-to-end speech recognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
- ASR ICASSPAn end-to-end language-tracking speech recognizer for mixed-language speechIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
- ASR ICASSPEnd-to-end multi-speaker speech recognitionIn Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018