| Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li Proceedings of the 29th ACM international conference on multimedia, 3927-3935, 2021 | 254 | 2021 |
| Multi-modal Attention for Speech Emotion Recognition Z Pan, Z Luo, J Yang, H Li Proc. Interspeech 2020, 364-368, 2020 | 122 | 2020 |
| Muse: Multi-modal target speaker extraction with visual cues Z Pan, R Tao, C Xu, H Li ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 80 | 2021 |
| USEV: Universal speaker extraction with visual cue Z Pan, M Ge, H Li IEEE/ACM Transactions on Audio, Speech and Language Processing 30, 3032-3045, 2022 | 66 | 2022 |
| Selective listening by synchronizing speech with lips Z Pan, R Tao, C Xu, H Li IEEE/ACM Transactions on Audio, Speech and Language Processing 30, 1650-1664, 2022 | 62 | 2022 |
| Multi-target DoA estimation with an audio-visual fusion mechanism X Qian, M Madhavi, Z Pan, J Wang, H Li ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 58 | 2021 |
| Speaker Extraction with Co-Speech Gestures Cue Z Pan, X Qian, H Li IEEE Signal Processing Letters 29, 1467-1471, 2022 | 35 | 2022 |
| NeuroHeed: Neuro-steered speaker extraction using EEG signals Z Pan, M Borsdorf, S Cai, T Schultz, H Li IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | 34 | 2024 |
| TF-Locoformer: Transformer with local modeling by convolution for speech separation and enhancement K Saijo, G Wichern, FG Germain, Z Pan, J Le Roux 2024 18th International Workshop on Acoustic Signal Enhancement (IWAENC …, 2024 | 34 | 2024 |
| A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction Z Pan, M Ge, H Li Proc. Interspeech 2022, 2022 | 29 | 2022 |
| Target active speaker detection with audio-visual cues Y Jiang, R Tao, Z Pan, H Li arXiv preprint arXiv:2305.12831, 2023 | 26 | 2023 |
| NIIRF: Neural IIR filter field for HRTF upsampling and personalization Y Masuyama, G Wichern, FG Germain, Z Pan, S Khurana, C Hori, ... ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 25 | 2024 |
| Restoring speaking lips from occlusion for audio-visual speech recognition J Wang, Z Pan, M Zhang, RT Tan, H Li Proceedings of the AAAI conference on artificial intelligence 38 (17), 19144 …, 2024 | 22 | 2024 |
| Scenario-aware audio-visual TF-Gridnet for target speech extraction Z Pan, G Wichern, Y Masuyama, FG Germain, S Khurana, C Hori, ... 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 22 | 2023 |
| Time-domain speech separation networks with graph encoding auxiliary T Wang, Z Pan, M Ge, Z Yang, H Li IEEE Signal Processing Letters 30, 110-114, 2023 | 21 | 2023 |
| NeuroHeed+: Improving neuro-steered speaker extraction with joint auditory attention detection Z Pan, G Wichern, FG Germain, S Khurana, J Le Roux ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 18 | 2024 |
| Generation or replication: Auscultating audio latent diffusion models D Bralios, G Wichern, FG Germain, Z Pan, S Khurana, C Hori, J Le Roux ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 17 | 2024 |
| ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting Z Pan, W Wang, M Borsdorf, H Li ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2022 | 16 | 2022 |
| Emotional dimension control in language model-based text-to-speech: Spanning a broad spectrum of human emotions K Zhou, Y Zhang, S Zhao, H Wang, Z Pan, D Ng, C Zhang, C Ni, Y Ma, ... arXiv preprint arXiv:2409.16681, 2024 | 14 | 2024 |
| Speech separation with pretrained frontend to minimize domain mismatch W Wang, Z Pan, X Li, S Wang, H Li IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | 13 | 2024 |