| SALMONN: Towards generic hearing abilities for large language models C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang arXiv preprint arXiv:2310.13289, 2023 | 623 | 2023 |
| Large language models surpass human experts in predicting neuroscience results X Luo, A Rechardt, G Sun, KK Nejad, F Yáñez, B Yilmaz, K Lee, ... Nature human behaviour 9 (2), 305-315, 2025 | 209 | 2025 |
| Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis G Sun, Y Zhang, RJ Weiss, Y Cao, H Zen, Y Wu ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 160 | 2020 |
| Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and autoregressive prosody prior G Sun, Y Zhang, RJ Weiss, Y Cao, H Zen, A Rosenberg, B Ramabhadran, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 130* | 2020 |
| Connecting speech encoder and large language model for ASR W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 104 | 2024 |
| video-SALMONN: Speech-enhanced audio-visual large language models G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang arXiv preprint arXiv:2406.15704, 2024 | 92* | 2024 |
| Building better AI agents: A provocation on the utilisation of persona in LLM-based conversational agents G Sun, X Zhan, J Such Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-6, 2024 | 76 | 2024 |
| Transformer language models with LSTM-based cross-utterance information representation G Sun, C Zhang, PC Woodland ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 48 | 2021 |
| TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch J Hwang, M Hira, C Chen, X Zhang, Z Ni, G Sun, P Ma, R Huang, V Pratap, ... 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-9, 2023 | 44 | 2023 |
| Speaker diarisation using 2D self-attentive combination of embeddings G Sun, C Zhang, PC Woodland ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 40 | 2019 |
| Tree-constrained pointer generator for end-to-end contextual speech recognition G Sun, C Zhang, PC Woodland 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2021 | 37 | 2021 |
| SALMONN-omni: A codec-free LLM for full-duplex speech understanding and generation W Yu, S Wang, X Yang, X Chen, X Tian, J Zhang, G Sun, L Lu, Y Wang, ... arXiv preprint arXiv:2411.18138, 2024 | 36* | 2024 |
| Can contextual biasing remain effective with Whisper and GPT-2? G Sun, X Zheng, C Zhang, PC Woodland arXiv preprint arXiv:2306.01942, 2023 | 28 | 2023 |
| Extending large language models for speech and audio captioning C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 26 | 2024 |
| Affect recognition in conversations using large language models S Feng, G Sun, N Lubis, W Wu, C Zhang, M Gasic Proceedings of the 25th Annual Meeting of the Special Interest Group on …, 2024 | 24 | 2024 |
| Combination of deep speaker embeddings for diarisation G Sun, C Zhang, PC Woodland Neural Networks 141, 372-384, 2021 | 24 | 2021 |
| Enabling auditory large language models for automatic speech quality evaluation S Wang, W Yu, Y Yang, C Tang, Y Li, J Zhuang, X Chen, X Tian, J Zhang, ... ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and …, 2025 | 20 | 2025 |
| Enhancing multimodal LLM for detailed and accurate video captioning using multi-round preference optimization C Tang, Y Li, Y Yang, J Zhuang, G Sun, W Li, Z Ma, C Zhang arXiv preprint arXiv:2410.06682, 2024 | 20* | 2024 |
| Can large language models understand spatial audio? C Tang, W Yu, G Sun, X Chen, T Tan, W Li, J Zhang, L Lu, Z Ma, Y Wang, ... arXiv preprint arXiv:2406.07914, 2024 | 18 | 2024 |
| Parameter efficient finetuning for speech emotion recognition and domain adaptation N Lashkarashvili, W Wu, G Sun, PC Woodland ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 17 | 2024 |