2023 Reading Group
2023.1.17 NeurIPS 2022 Paper List
- HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis
- u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
- BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
- Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
- INRAS: Implicit Neural Representation for Audio Scenes
- Few-Shot Audio-Visual Learning of Environment Acoustics
2023.1.24 SLT 2022 Paper List
- JOIST: A Joint Speech and Text Streaming Model for ASR
- Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
- Monotonic Segmental Attention for Automatic Speech Recognition
- Dual Learning for Large Vocabulary On-Device ASR
- HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch
2023.3.14 SLT 2022 Paper List
- An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
- G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
- CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
- On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
2023.3.28 EMNLP 2022 Paper List
- Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
- Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing
- SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
2023.10.3 ACL 2023 Paper List
- SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
- Efficient Transformers with Dynamic Token Pooling
- A Simple Concatenation can Effectively Improve Speech Translation
- CTC-based Non-autoregressive Speech Translation
2023.10.10 ACL 2023 Paper List
- When Does Translation Require Context? A Data-driven, Multilingual Exploration
- Introducing Semantics into Speech Encoders
- Pre-Training to Learn in Context
- Learning Language-Specific Layers for Multilingual Machine Translation
- Finding the Pillars of Strength for Multi-Head Attention
2023.11.7 WASPAA 2023 Paper List
- Differentiable Representation of Warping based on Lie Group Theory
- A Differentiable Image Source Model for Room Acoustics Optimization
- Yet Another Generative Model For Room Impulse Response Estimation
2023.11.14 WASPAA 2023 Paper List
- Low-Complexity Higher Order Scattering Delay Networks
- All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio
- Diffusion Posterior Sampling for Informed Single-Channel Dereverberation