Kainan Peng

Cited by

	All	Since 2021
Citations	3582	2581
h-index	14	13
i10-index	15	15

Year	Citations
2017	12
2018	136
2019	366
2020	444
2021	563
2022	571
2023	548
2024	468
2025	420
2026	11

Co-authors

Wei Ping	Distinguished Research Scientist, NVIDIA	Verified email at nvidia.com
Sercan O. Arik	Google	Verified email at google.com
Yanqi Zhou	Google DeepMind	Verified email at google.com
Gregory Diamos	Relational AI	Verified email at relational.ai
Jitong Chen	ByteDance	Verified email at cse.ohio-state.edu
Sharan Narang	Director, AI Research, Meta	Verified email at meta.com
Ajay Kannan	Google	Verified email at google.com

Kainan Peng

Amazon

Verified email at alumni.cmu.edu

Text-to-Speech, Computer Engineering, Machine Learning


Title	Authors	Venue	Cited by	Year
Deep Voice 3: Scaling text-to-speech with convolutional sequence learning	W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ...	ICLR 2018, 2018	1011*	2018
Deep voice 2: Multi-speaker neural text-to-speech	A Gibiansky, S Arik, G Diamos, J Miller, K Peng, W Ping, J Raiman, ...	NIPS 2017, 2962-2970, 2017	711*	2017
Neural voice cloning with a few samples	S Arik, J Chen, K Peng, W Ping, Y Zhou	NeurIPS 2018, 10019-10029, 2018	567	2018
ClariNet: Parallel wave generation in end-to-end text-to-speech	W Ping, K Peng, J Chen	ICLR 2019, 2018	475	2018
Non-Autoregressive Neural Text-to-Speech	K Peng, W Ping, Z Song, K Zhao	ICML 2020, 2019	203*	2019
WaveFlow: A Compact Flow-based Model for Raw Audio	W Ping, K Peng, K Zhao, Z Song	ICML 2020, 2019	182	2019
Systems and methods for multi-speaker neural text-to-speech	G DIAMOS, A GIBIANSKY, J Miller, K PENG, W PING, J RAIMAN, Y ZHOU	US Patent 10,896,669, 2021	119	2021
Systems and methods for neural voice cloning with a few samples	C Jitong, P Kainan, P Wei, Z Yanqi	US Patent 11,238,843, 2022	73	2022
Systems and methods for neural text-to-speech using convolutional sequence learning	P Wei, P Kainan	US Patent 10,796,686, 2020	56	2020
Incremental text-to-speech synthesis with prefix-to-prefix framework	M Ma, B Zheng, K Liu, R Zheng, H Liu, K Peng, K Church, L Huang	Findings of the Association for Computational Linguistics: EMNLP 2020, 3886-3896, 2020	39	2020
Vevo: Controllable zero-shot voice imitation with self-supervised disentanglement	X Zhang, X Zhang, K Peng, Z Tang, V Manohar, Y Liu, J Hwang, D Li, ...	arXiv preprint arXiv:2502.07243, 2025	35	2025
Parallel neural text-to-speech	P Kainan, P Wei, S Zhao, Z Kexin	US Patent 11,017,761, 2021	32	2021
Systems and methods for parallel wave generation in end-to-end text-to-speech	P Wei, P Kainan, C Jitong	US Patent 10,872,596, 2020	27	2020
Multi-speaker end-to-end speech synthesis	J Park, K Zhao, K Peng, W Ping	arXiv preprint arXiv:1907.04462, 2019	22	2019
Voiceshop: A unified speech-to-speech framework for identity-preserving zero-shot voice editing	P Anastassiou, Z Tang, K Peng, D Jia, J Li, M Tu, Y Wang, Y Wang, M Ma	arXiv preprint arXiv:2404.06674, 2024	11	2024
Zero-shot accent conversion using pseudo siamese disentanglement network	D Jia, Q Tian, K Peng, J Li, Y Chen, M Ma, Y Wang, Y Wang	arXiv preprint arXiv:2212.05751, 2022	7	2022
Deep Voice 3: scaling text-to-speech with convolutional sequence learning	P Wei, P Kainan, G Andrew, SO Arik, A Kannan, S Narang, J Raiman, ...	arXiv preprint, 2017	5	2017
Waveform generation using end-to-end text-to-waveform system	P Wei, P Kainan, C Jitong	US Patent 11,482,207, 2022	3	2022
Multi-speaker neural text-to-speech	G DIAMOS, A GIBIANSKY, J Miller, K PENG, W PING, J RAIMAN, Y ZHOU	US Patent 11,651,763, 2023	2	2023
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment	S Mehta, Y Liu, Z Tang, K Peng, V Manohar, S Zhang, M Seltzer, Q He, ...	arXiv preprint arXiv:2507.09070, 2025	1	2025
