Jaemin Cho

Cited by

	All	Since 2021
Citations	3948	3859
h-index	24	24
i10-index	28	28

Citations per year

Year	2019	2020	2021	2022	2023	2024	2025	2026
Citations	30	49	94	235	696	1184	1601	46

Public access

14 articles available
0 articles not available

Based on funding mandates

Co-authors

Mohit Bansal	Parker Distinguished Professor, UNC Chapel Hill (PECASE/ACL/AAAI Fellow)	Verified email at cs.unc.edu
Abhay Zala	HeyGen	Verified email at heygen.com
Han Lin	CS PhD Student, UNC	Verified email at cs.unc.edu
Yi-Lin Sung	UNC Chapel Hill	Verified email at cs.unc.edu
Hao Tan	Adobe Research	Verified email at adobe.com
Jie Lei 雷杰	Research Scientist, FAIR at Meta	Verified email at fb.com
Elias Stengel-Eskin	Assistant Professor, University of Texas at Austin	Verified email at cs.unc.edu
Zineng Tang	UC Berkeley	Verified email at cs.unc.edu
Jaehong Yoon	Assistant Professor, NTU Singapore	Verified email at ntu.edu.sg
Shoubin Yu	PhD Candidate at UNC Chapel Hill	Verified email at cs.unc.edu
Jordi Pont-Tuset	Research Scientist at Google DeepMind	Verified email at google.com
Jason Baldridge	Research Scientist, Google	Verified email at google.com
Su Wang	BAM	Verified email at bamfunds.com
Roopal Garg	Sr. Staff Software Engineer @ Google Research	Verified email at google.com
Hannaneh Hajishirzi	University of Washington; Allen AI	Verified email at cs.washington.edu
David Seunghyun Yoon	Research Scientist, Adobe Research	Verified email at adobe.com
Trung H. Bui	Senior Research Scientist & Research Manager, Adobe Research	Verified email at adobe.com
Ajinkya Kale	Adobe	Verified email at adobe.com
Prateek Yadav	PhD, University of North Carolina Chapel Hill	Verified email at cs.unc.edu
Peter Anderson	Head of Research, Applied AI at Balyasny Asset Management	Verified email at bamfunds.com

Jaemin Cho

Young Investigator at AI2, Incoming Assistant Professor at Johns Hopkins University

Verified email at jhu.edu

Multimodal Learning, Natural Language Processing, Machine Learning


Title	Authors	Venue	Cited by	Year
Unifying Vision-and-Language Tasks via Text Generation	J Cho, J Lei, H Tan, M Bansal	ICML	692	2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks	YL Sung, J Cho, M Bansal	CVPR	542	2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models	J Cho, A Zala, M Bansal	ICCV	463*	2023
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning	YL Sung, J Cho, M Bansal	NeurIPS	346	2022
Self-Chained Image-Language Model for Video Localization and Question Answering	S Yu, J Cho, P Yadav, M Bansal	NeurIPS	271	2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation	J Cho, Y Hu, R Garg, P Anderson, R Krishna, J Baldridge, M Bansal, ...	ICLR	170	2024
Fine-grained Image Captioning with CLIP Reward	J Cho, S Yoon, A Kale, F Dernoncourt, T Bui, M Bansal	Findings of NAACL	135	2022
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers	J Cho, J Lu, D Schwenk, H Hajishirzi, A Kembhavi	EMNLP	134	2020
A Hierarchical Latent Structure for Variational Conversation Modeling	Y Park, J Cho, G Kim	NAACL	134	2018
VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning	H Lin, A Zala, J Cho, M Bansal	COLM	119	2024
Hierarchical Video-Moment Retrieval and Step-Captioning	A Zala, J Cho, S Kottur, X Chen, B Oğuz, Y Mehdad, M Bansal	CVPR	113	2023
Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation	J Cho, A Zala, M Bansal	NeurIPS	113	2023
DOCCI: Descriptions of Connected and Contrasting Images	Y Onoe, S Rane, Z Berger, Y Bitton, J Cho, R Garg, A Ku, Z Parekh, ...	ECCV	96	2024
Mixture Content Selection for Diverse Sequence Generation	J Cho, M Seo, H Hajishirzi	EMNLP	79	2019
M3DocVQA: Multi-modal Multi-page Multi-document Understanding	J Cho, D Mahata, O Irsoy, Y He, M Bansal	Proceedings of the IEEE/CVF International Conference on Computer Vision …	66*	2025
Contrastive region guidance: Improving grounding in vision-language models without training	D Wan, J Cho, E Stengel-Eskin, M Bansal	ECCV	62	2024
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model	H Lin, J Cho, A Zala, M Bansal	ICLR	59	2025
TVLT: Textless Vision-Language Transformer	Z Tang, J Cho, Y Nie, M Bansal	NeurIPS	44	2022
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer	Z Tang, J Cho, H Tan, M Bansal	NeurIPS	43	2021
Paxion: Patching Action Knowledge in Video-Language Foundation Models	Z Wang, A Blume, S Li, G Liu, J Cho, Z Tang, M Bansal, H Ji	NeurIPS	42	2023
