
Guangzhi Sun
Title
Cited by
Year
SALMONN: Towards generic hearing abilities for large language models
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.13289, 2023
Cited by: 623 | Year: 2023
Large language models surpass human experts in predicting neuroscience results
X Luo, A Rechardt, G Sun, KK Nejad, F Yáñez, B Yilmaz, K Lee, ...
Nature Human Behaviour 9 (2), 305-315, 2025
Cited by: 209 | Year: 2025
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
G Sun, Y Zhang, RJ Weiss, Y Cao, H Zen, Y Wu
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 160 | Year: 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and autoregressive prosody prior
G Sun, Y Zhang, RJ Weiss, Y Cao, H Zen, A Rosenberg, B Ramabhadran, ...
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 130* | Year: 2020
Connecting speech encoder and large language model for ASR
W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 104 | Year: 2024
video-SALMONN: Speech-enhanced audio-visual large language models
G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang
arXiv preprint arXiv:2406.15704, 2024
Cited by: 92* | Year: 2024
Building better AI agents: A provocation on the utilisation of persona in LLM-based conversational agents
G Sun, X Zhan, J Such
Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-6, 2024
Cited by: 76 | Year: 2024
Transformer language models with LSTM-based cross-utterance information representation
G Sun, C Zhang, PC Woodland
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 48 | Year: 2021
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
J Hwang, M Hira, C Chen, X Zhang, Z Ni, G Sun, P Ma, R Huang, V Pratap, ...
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-9, 2023
Cited by: 44 | Year: 2023
Speaker diarisation using 2D self-attentive combination of embeddings
G Sun, C Zhang, PC Woodland
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 40 | Year: 2019
Tree-constrained pointer generator for end-to-end contextual speech recognition
G Sun, C Zhang, PC Woodland
2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2021
Cited by: 37 | Year: 2021
SALMONN-omni: A codec-free LLM for full-duplex speech understanding and generation
W Yu, S Wang, X Yang, X Chen, X Tian, J Zhang, G Sun, L Lu, Y Wang, ...
arXiv preprint arXiv:2411.18138, 2024
Cited by: 36* | Year: 2024
Can contextual biasing remain effective with Whisper and GPT-2?
G Sun, X Zheng, C Zhang, PC Woodland
arXiv preprint arXiv:2306.01942, 2023
Cited by: 28 | Year: 2023
Extending large language models for speech and audio captioning
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 26 | Year: 2024
Affect recognition in conversations using large language models
S Feng, G Sun, N Lubis, W Wu, C Zhang, M Gasic
Proceedings of the 25th Annual Meeting of the Special Interest Group on …, 2024
Cited by: 24 | Year: 2024
Combination of deep speaker embeddings for diarisation
G Sun, C Zhang, PC Woodland
Neural Networks 141, 372-384, 2021
Cited by: 24 | Year: 2021
Enabling auditory large language models for automatic speech quality evaluation
S Wang, W Yu, Y Yang, C Tang, Y Li, J Zhuang, X Chen, X Tian, J Zhang, ...
ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and …, 2025
Cited by: 20 | Year: 2025
Enhancing multimodal LLM for detailed and accurate video captioning using multi-round preference optimization
C Tang, Y Li, Y Yang, J Zhuang, G Sun, W Li, Z Ma, C Zhang
arXiv preprint arXiv:2410.06682, 2024
Cited by: 20* | Year: 2024
Can large language models understand spatial audio?
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, J Zhang, L Lu, Z Ma, Y Wang, ...
arXiv preprint arXiv:2406.07914, 2024
Cited by: 18 | Year: 2024
Parameter efficient finetuning for speech emotion recognition and domain adaptation
N Lashkarashvili, W Wu, G Sun, PC Woodland
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 17 | Year: 2024