Jaemin Cho
Young Investigator at AI2, Incoming Assistant Professor at Johns Hopkins University
Verified email at jhu.edu
Title
Cited by
Year
Unifying Vision-and-Language Tasks via Text Generation
J Cho, J Lei, H Tan, M Bansal
ICML, 2021
Cited by: 692; Year: 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
YL Sung, J Cho, M Bansal
CVPR, 2022
Cited by: 542; Year: 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
J Cho, A Zala, M Bansal
ICCV, 2023
Cited by: 463*; Year: 2023
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
YL Sung, J Cho, M Bansal
NeurIPS, 2022
Cited by: 346; Year: 2022
Self-Chained Image-Language Model for Video Localization and Question Answering
S Yu, J Cho, P Yadav, M Bansal
NeurIPS, 2023
Cited by: 271; Year: 2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
J Cho, Y Hu, R Garg, P Anderson, R Krishna, J Baldridge, M Bansal, ...
ICLR, 2024
Cited by: 170; Year: 2024
Fine-grained Image Captioning with CLIP Reward
J Cho, S Yoon, A Kale, F Dernoncourt, T Bui, M Bansal
Findings of NAACL, 2022
Cited by: 135; Year: 2022
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
J Cho, J Lu, D Schwenk, H Hajishirzi, A Kembhavi
EMNLP, 2020
Cited by: 134; Year: 2020
A Hierarchical Latent Structure for Variational Conversation Modeling
Y Park, J Cho, G Kim
NAACL, 2018
Cited by: 134; Year: 2018
VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning
H Lin, A Zala, J Cho, M Bansal
COLM, 2024
Cited by: 119; Year: 2024
Hierarchical Video-Moment Retrieval and Step-Captioning
A Zala, J Cho, S Kottur, X Chen, B Oğuz, Y Mehdad, M Bansal
CVPR, 2023
Cited by: 113; Year: 2023
Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation
J Cho, A Zala, M Bansal
NeurIPS, 2023
Cited by: 113; Year: 2023
DOCCI: Descriptions of Connected and Contrasting Images
Y Onoe, S Rane, Z Berger, Y Bitton, J Cho, R Garg, A Ku, Z Parekh, ...
ECCV, 2024
Cited by: 96; Year: 2024
Mixture Content Selection for Diverse Sequence Generation
J Cho, M Seo, H Hajishirzi
EMNLP, 2019
Cited by: 79; Year: 2019
M3DocVQA: Multi-modal Multi-page Multi-document Understanding
J Cho, D Mahata, O Irsoy, Y He, M Bansal
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2025
Cited by: 66*; Year: 2025
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
D Wan, J Cho, E Stengel-Eskin, M Bansal
ECCV, 2024
Cited by: 62; Year: 2024
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
H Lin, J Cho, A Zala, M Bansal
ICLR, 2025
Cited by: 59; Year: 2025
TVLT: Textless Vision-Language Transformer
Z Tang, J Cho, Y Nie, M Bansal
NeurIPS, 2022
Cited by: 44; Year: 2022
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Z Tang, J Cho, H Tan, M Bansal
NeurIPS, 2021
Cited by: 43; Year: 2021
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Z Wang, A Blume, S Li, G Liu, J Cho, Z Tang, M Bansal, H Ji
NeurIPS, 2023
Cited by: 42; Year: 2023