| Title | Cited by | Year |
|---|---|---|
| CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens. Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ... arXiv preprint arXiv:2407.05407, 2024 | 321 | 2024 |
| CosyVoice 2: Scalable streaming speech synthesis with large language models. Z Du, Y Wang, Q Chen, X Shi, X Lv, T Zhao, Z Gao, Y Yang, C Gao, ... arXiv preprint arXiv:2412.10117, 2024 | 267 | 2024 |
| FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs. K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ... arXiv preprint arXiv:2407.04051, 2024 | 71 | 2024 |
| Data augmentation using deep generative models for embedding based speaker recognition. S Wang, Y Yang, Z Wu, Y Qian, K Yu. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 2598-2609, 2020 | 67 | 2020 |
| The SJTU robust anti-spoofing system for the ASVspoof 2019 challenge. Y Yang, H Wang, H Dinkel, Z Chen, S Wang, Y Qian, K Yu. Interspeech, 1038-1042, 2019 | 64 | 2019 |
| Revisiting the statistics pooling layer in deep speaker embedding learning. S Wang, Y Yang, Y Qian, K Yu. 2021 12th International Symposium on Chinese Spoken Language Processing …, 2021 | 52 | 2021 |
| Knowledge distillation for small foot-print deep speaker embedding. S Wang, Y Yang, T Wang, Y Qian, K Yu. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 45 | 2019 |
| AISpeech-SJTU accent identification system for the accented English speech recognition challenge. H Huang, X Xiang, Y Yang, R Ma, Y Qian. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 40 | 2021 |
| MinMo: A multimodal large language model for seamless voice interaction. Q Chen, Y Chen, Y Chen, M Chen, Y Chen, C Deng, Z Du, R Gao, C Gao, ... arXiv preprint arXiv:2501.06282, 2025 | 24 | 2025 |
| SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability. X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 24 | 2024 |
| Generative adversarial networks based x-vector augmentation for robust probabilistic linear discriminant analysis in speaker verification. Y Yang, S Wang, M Sun, Y Qian, K Yu. 2018 11th International Symposium on Chinese Spoken Language Processing …, 2018 | 19 | 2018 |
| Text adaptation for speaker verification with speaker-text factorized embeddings. Y Yang, S Wang, X Gong, Y Qian, K Yu. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 14 | 2020 |
| Speaker embedding augmentation with noise distribution matching. X Gong, Z Chen, Y Yang, S Wang, L Wang, Y Qian. 2021 12th International Symposium on Chinese Spoken Language Processing …, 2021 | 5 | 2021 |
| SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer. Z Sheng, Z Du, S Zhang, Z Yan, Y Yang, Z Ling. arXiv preprint arXiv:2502.11094, 2025 | 1 | 2025 |