Yexin Yang
Title
Cited by
Year
CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens
Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ...
arXiv preprint arXiv:2407.05407, 2024
Cited by: 321; Year: 2024
CosyVoice 2: Scalable streaming speech synthesis with large language models
Z Du, Y Wang, Q Chen, X Shi, X Lv, T Zhao, Z Gao, Y Yang, C Gao, ...
arXiv preprint arXiv:2412.10117, 2024
Cited by: 267; Year: 2024
FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs
K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ...
arXiv preprint arXiv:2407.04051, 2024
Cited by: 71; Year: 2024
Data augmentation using deep generative models for embedding based speaker recognition
S Wang, Y Yang, Z Wu, Y Qian, K Yu
IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 2598-2609, 2020
Cited by: 67; Year: 2020
The SJTU robust anti-spoofing system for the ASVspoof 2019 challenge
Y Yang, H Wang, H Dinkel, Z Chen, S Wang, Y Qian, K Yu
Interspeech, 1038-1042, 2019
Cited by: 64; Year: 2019
Revisiting the statistics pooling layer in deep speaker embedding learning
S Wang, Y Yang, Y Qian, K Yu
2021 12th International Symposium on Chinese Spoken Language Processing …, 2021
Cited by: 52; Year: 2021
Knowledge distillation for small foot-print deep speaker embedding
S Wang, Y Yang, T Wang, Y Qian, K Yu
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 45; Year: 2019
AISpeech-SJTU accent identification system for the Accented English Speech Recognition Challenge
H Huang, X Xiang, Y Yang, R Ma, Y Qian
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 40; Year: 2021
MinMo: A multimodal large language model for seamless voice interaction
Q Chen, Y Chen, Y Chen, M Chen, Y Chen, C Deng, Z Du, R Gao, C Gao, ...
arXiv preprint arXiv:2501.06282, 2025
Cited by: 24; Year: 2025
SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability
X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 24; Year: 2024
Generative adversarial networks based x-vector augmentation for robust probabilistic linear discriminant analysis in speaker verification
Y Yang, S Wang, M Sun, Y Qian, K Yu
2018 11th International Symposium on Chinese Spoken Language Processing …, 2018
Cited by: 19; Year: 2018
Text adaptation for speaker verification with speaker-text factorized embeddings
Y Yang, S Wang, X Gong, Y Qian, K Yu
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 14; Year: 2020
Speaker embedding augmentation with noise distribution matching
X Gong, Z Chen, Y Yang, S Wang, L Wang, Y Qian
2021 12th International Symposium on Chinese Spoken Language Processing …, 2021
Cited by: 5; Year: 2021
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer
Z Sheng, Z Du, S Zhang, Z Yan, Y Yang, Z Ling
arXiv preprint arXiv:2502.11094, 2025
Cited by: 1; Year: 2025