Zhifeng Kong

Cited by

	All	Since 2021
Citations	3408	3382
h-index	15	15
i10-index	18	18

Citations per year
	2020	2021	2022	2023	2024	2025	2026
Citations	15	109	252	707	971	1302	38

Public access

7 articles available
0 articles not available

Based on funding mandates

Co-authors

Bryan Catanzaro - NVIDIA - Verified email at acm.org
Wei Ping - Distinguished Research Scientist, NVIDIA - Verified email at nvidia.com
Rafael Valle - META, NVIDIA, UC Berkeley, CNMAT - Verified email at berkeley.edu
Arushi Goel - Research Scientist, NVIDIA - Verified email at sms.ed.ac.uk
Jiaji Huang - Duke University, Baidu Research, Amazon AWS - Verified email at amazon.com
Kexin Zhao - Snap - Verified email at snapchat.com
Kamalika Chaudhuri - FAIR @ Meta - Verified email at ucsd.edu
Sang-gil Lee - NVIDIA - Verified email at nvidia.com
Dahua Lin - The Chinese University of Hong Kong - Verified email at ie.cuhk.edu.hk
Zhaoyang Lyu - PhD of Information Engineering, The Chinese University of Hong Kong - Verified email at link.cuhk.edu.hk
Rohan Badlani - Computer Science, Stanford University, BITS Pilani - Verified email at cs.stanford.edu
Xudong XU - Researcher, Shanghai AI Laboratory - Verified email at pjlab.org.cn
Liang Pan - Shanghai AI Lab - Verified email at pjlab.org.cn
Ambuj Mehrish - Research Fellow, Singapore University of Technology and Design, Singapore - Verified email at sutd.edu.sg
Soujanya Poria - Associate Professor, Electrical and Electronic Engineering, Nanyang Technological University - Verified email at ntu.edu.sg
Navonil Majumder - Singapore University of Technology and Design - Verified email at sutd.edu.sg
Amrita Roy Chowdhury - University of Michigan, Ann Arbor - Verified email at umich.edu
Ching-Yun Ko - Massachusetts Institute of Technology - Verified email at mit.edu
Jinjun Wang - Xian Jiaotong University, China - Verified email at ieee.org
Deepanway Ghosal - Google DeepMind - Verified email at google.com

Zhifeng Kong

Senior Research Scientist, NVIDIA

Verified email at ucsd.edu

Deep Generative Models, Diffusion Models, Audio Foundation Models, Audio LM, Trustworthy ML


Title	Cited by	Year
Diffwave: A versatile diffusion model for audio synthesis. Z Kong, W Ping, J Huang, K Zhao, B Catanzaro. ICLR 2021 (oral), 2021	2108	2021
On fast sampling of diffusion probabilistic models. Z Kong, W Ping. ICML 2021 Workshop on Invertible Neural Networks, Normalizing Flows, and …, 2021	244	2021
A conditional point diffusion-refinement paradigm for 3d point cloud completion. Z Lyu, Z Kong, X Xu, L Pan, D Lin. ICLR 2022, 2022	191	2022
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. Z Kong, A Goel, R Badlani, W Ping, R Valle, B Catanzaro. ICML 2024, 2024	174	2024
Speech denoising in the waveform domain with self-attention. Z Kong, W Ping, A Dantrey, B Catanzaro. ICASSP 2022, 7867-7871, 2022	108	2022
Fastened crown: Tightened neural network robustness certificates. Z Lyu, CY Ko, Z Kong, N Wong, D Lin, L Daniel. AAAI 2020, 2020	88	2020
Audio Flamingo 2: An audio-language model with long-audio understanding and expert reasoning abilities. S Ghosh, Z Kong, S Kumar, S Sakshi, J Kim, W Ping, R Valle, D Manocha, ... ICML 2025, 2025	86	2025
The expressive power of a class of normalizing flow models. Z Kong, K Chaudhuri. AISTATS 2020, 2020	69	2020
Multi-object tracking using online metric learning with long short-term memory. X Wan, J Wang, Z Kong, Q Zhao, S Deng. ICIP 2018, 2018	51	2018
Audio flamingo 3: Advancing audio intelligence with fully open large audio language models. A Goel, S Ghosh, J Kim, S Kumar, Z Kong, S Lee, CHH Yang, ... NeurIPS 2025, 2025	49	2025
Understanding instance-based interpretability of variational auto-encoders. Z Kong, K Chaudhuri. NeurIPS 2021, 2021	43	2021
Tangoflux: Super fast and faithful text to audio generation with flow matching and clap-ranked preference optimization. CY Hung, N Majumder, Z Kong, A Mehrish, AA Bagherzadeh, C Li, ... arXiv preprint arXiv:2412.21037, 2024	41	2024
Can membership inferencing be refuted? Z Kong, AR Chowdhury, K Chaudhuri. arXiv preprint arXiv:2303.03648, 2023	21*	2023
Improving text-to-audio models with synthetic captions. Z Kong, S Lee, D Ghosal, N Majumder, A Mehrish, R Valle, S Poria, ... SynData4GenAI 2024, 2024	19	2024
Data redaction from pre-trained gans. Z Kong, K Chaudhuri. IEEE SaTML 2023, 2023	19	2023
Cleanunet 2: A hybrid speech denoising model on waveform and spectrogram. Z Kong, W Ping, A Dantrey, B Catanzaro. INTERSPEECH 2023, 2023	14	2023
Fugatto 1: Foundational Generative Audio Transformer Opus 1. R Valle, R Badlani, Z Kong, S Lee, A Goel, S Kim, JF Santos, S Dai, ... ICLR 2025, 2024	13	2024
Audio dialogues: Dialogues dataset for audio and music understanding. A Goel, Z Kong, R Valle, B Catanzaro. arXiv preprint arXiv:2404.07616, 2024	10	2024
Universal approximation of residual flows in maximum mean discrepancy. Z Kong, K Chaudhuri. ICML 2021 Workshop on Invertible Neural Networks, Normalizing Flows, and …, 2021	9	2021
ETTA: Elucidating the Design Space of Text-to-Audio Models. S Lee, Z Kong, A Goel, S Kim, R Valle, B Catanzaro. ICML 2025, 2024	8	2024
