| Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities Z Kong, A Goel, R Badlani, W Ping, R Valle, B Catanzaro arXiv preprint arXiv:2402.01831, 2024 | 174 | 2024 |
| One TTS alignment to rule them all R Badlani, A Łańcucki, KJ Shih, R Valle, W Ping, B Catanzaro ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 111 | 2022 |
| P-flow: A fast and data-efficient zero-shot TTS through speech prompting S Kim, K Shih, JF Santos, E Bakhturina, M Desta, R Valle, S Yoon, ... Advances in Neural Information Processing Systems 36, 74213-74228, 2023 | 71 | 2023 |
| RAD-TTS: Parallel flow-based TTS with robust alignment learning and diverse synthesis KJ Shih, R Valle, R Badlani, A Łańcucki, W Ping, B Catanzaro ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit …, 2021 | 69 | 2021 |
| Content-based representations of audio using siamese neural networks P Manocha, R Badlani, A Kumar, A Shah, B Elizalde, B Raj 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 66 | 2018 |
| Experiments on the DCASE challenge 2016: Acoustic scene classification and sound event detection in real life recording B Elizalde, A Kumar, A Shah, R Badlani, E Vincent, B Raj, I Lane arXiv preprint arXiv:1607.06706, 2016 | 61* | 2016 |
| NELS-Never-Ending Learner of Sounds BR Benjamin Elizalde, Rohan Badlani, Ankit Shah, Anurag Kumar NIPS Workshop on Machine Learning for Audio, 2018 | 36* | 2018 |
| Disambiguating sentiment: An ensemble of humour, sarcasm, and hate speech features for sentiment classification R Badlani, N Asnani, M Rai W-NUT 2019, 337-345, 2019 | 31* | 2019 |
| An approach for self-training audio event detectors using web data B Elizalde, A Shah, S Dalmia, MH Lee, R Badlani, A Kumar, B Raj, I Lane 2017 25th European Signal Processing Conference (EUSIPCO), 1863-1867, 2017 | 29* | 2017 |
| Improving robustness of LLM-based speech synthesis by learning monotonic alignment P Neekhara, S Hussain, S Ghosh, J Li, R Valle, R Badlani, B Ginsburg arXiv preprint arXiv:2406.17957, 2024 | 24 | 2024 |
| Generating and using joint representations of source code R Badlani, O Lewis, G Evangelopoulos, O Hatalsky, B Ni US Patent 11,169,786, 2021 | 20 | 2021 |
| Fugatto 1: Foundational Generative Audio Transformer Opus 1 R Valle, R Badlani, Z Kong, S Lee, A Goel, S Kim, JF Santos, S Dai, ... The Thirteenth International Conference on Learning Representations, 2025 | 13 | 2025 |
| RAD-MMM: Multilingual multiaccented multispeaker text to speech R Badlani, R Valle, KJ Shih, JF Santos, S Gururani, B Catanzaro Proc. Interspeech, 626-630, 2023 | 13 | 2023 |
| Synthesizing video from audio using one or more neural networks MY Liu, K Nagano, JRVG da Costa, J SEO, TC Wang, A Mallya, S Khamis, ... US Patent App. 17/382,027, 2023 | 8 | 2023 |
| Framework for evaluation of sound event detection in web videos R Badlani, A Shah, B Elizalde, A Kumar, B Raj 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 8 | 2018 |
| Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities (2024) Z Kong, A Goel, R Badlani, W Ping, R Valle, B Catanzaro URL https://arxiv.org/abs/2402.01831 | 7 | |
| Generative modeling for low dimensional speech attributes with neural spline flows KJ Shih, R Valle, R Badlani, JF Santos, B Catanzaro arXiv preprint arXiv:2203.01786, 2022 | 6 | 2022 |
| Multilingual multiaccented multispeaker TTS with RADTTS R Badlani, R Valle, KJ Shih, JF Santos, S Gururani, B Catanzaro arXiv preprint arXiv:2301.10335, 2023 | 5 | 2023 |
| Relation extraction with contextualized relation embedding (CRE) X Chen, R Badlani arXiv preprint arXiv:2011.09658, 2020 | 5 | 2020 |
| Pattern-based automatic parallelization of representative-based clustering algorithms S Islam, S Balasubramaniam, S Gupta, S Brajesh, R Badlani, ... 2018 IEEE 5th International Conference on Data Science and Advanced …, 2018 | 5 | 2018 |