| Deep Voice 3: Scaling text-to-speech with convolutional sequence learning W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ... ICLR 2018, 2018 | 1011* | 2018 |
| Deep Voice 2: Multi-speaker neural text-to-speech A Gibiansky, S Arik, G Diamos, J Miller, K Peng, W Ping, J Raiman, ... NIPS 2017, 2962-2970, 2017 | 711* | 2017 |
| Neural voice cloning with a few samples S Arik, J Chen, K Peng, W Ping, Y Zhou NeurIPS 2018, 10019-10029, 2018 | 567 | 2018 |
| ClariNet: Parallel wave generation in end-to-end text-to-speech W Ping, K Peng, J Chen ICLR 2019, 2018 | 475 | 2018 |
| Non-Autoregressive Neural Text-to-Speech K Peng, W Ping, Z Song, K Zhao ICML 2020, 2019 | 203* | 2019 |
| WaveFlow: A Compact Flow-based Model for Raw Audio W Ping, K Peng, K Zhao, Z Song ICML 2020, 2019 | 182 | 2019 |
| Systems and methods for multi-speaker neural text-to-speech G Diamos, A Gibiansky, J Miller, K Peng, W Ping, J Raiman, Y Zhou US Patent 10,896,669, 2021 | 119 | 2021 |
| Systems and methods for neural voice cloning with a few samples C Jitong, P Kainan, P Wei, Z Yanqi US Patent 11,238,843, 2022 | 73 | 2022 |
| Systems and methods for neural text-to-speech using convolutional sequence learning P Wei, P Kainan US Patent 10,796,686, 2020 | 56 | 2020 |
| Incremental text-to-speech synthesis with prefix-to-prefix framework M Ma, B Zheng, K Liu, R Zheng, H Liu, K Peng, K Church, L Huang Findings of the Association for Computational Linguistics: EMNLP 2020, 3886-3896, 2020 | 39 | 2020 |
| Vevo: Controllable zero-shot voice imitation with self-supervised disentanglement X Zhang, X Zhang, K Peng, Z Tang, V Manohar, Y Liu, J Hwang, D Li, ... arXiv preprint arXiv:2502.07243, 2025 | 35 | 2025 |
| Parallel neural text-to-speech P Kainan, P Wei, S Zhao, Z Kexin US Patent 11,017,761, 2021 | 32 | 2021 |
| Systems and methods for parallel wave generation in end-to-end text-to-speech P Wei, P Kainan, C Jitong US Patent 10,872,596, 2020 | 27 | 2020 |
| Multi-speaker end-to-end speech synthesis J Park, K Zhao, K Peng, W Ping arXiv preprint arXiv:1907.04462, 2019 | 22 | 2019 |
| VoiceShop: A unified speech-to-speech framework for identity-preserving zero-shot voice editing P Anastassiou, Z Tang, K Peng, D Jia, J Li, M Tu, Y Wang, Y Wang, M Ma arXiv preprint arXiv:2404.06674, 2024 | 11 | 2024 |
| Zero-shot accent conversion using pseudo siamese disentanglement network D Jia, Q Tian, K Peng, J Li, Y Chen, M Ma, Y Wang, Y Wang arXiv preprint arXiv:2212.05751, 2022 | 7 | 2022 |
| Deep Voice 3: scaling text-to-speech with convolutional sequence learning P Wei, P Kainan, G Andrew, SO Arik, A Kannan, S Narang, J Raiman, ... arXiv preprint, 2017 | 5 | 2017 |
| Waveform generation using end-to-end text-to-waveform system P Wei, P Kainan, C Jitong US Patent 11,482,207, 2022 | 3 | 2022 |
| Multi-speaker neural text-to-speech G Diamos, A Gibiansky, J Miller, K Peng, W Ping, J Raiman, Y Zhou US Patent 11,651,763, 2023 | 2 | 2023 |
| SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment S Mehta, Y Liu, Z Tang, K Peng, V Manohar, S Zhang, M Seltzer, Q He, ... arXiv preprint arXiv:2507.09070, 2025 | 1 | 2025 |