Jiasen Lu
Research Scientist, Apple
Verified email at apple.com - Homepage
Title · Cited by · Year
Vqa: Visual question answering
A Agrawal*, J Lu*, S Antol*, M Mitchell, CL Zitnick, D Parikh, D Batra
International Journal of Computer Vision 123 (1), 4-31, 2017
Cited by 8055* · 2017
Vqa: Visual question answering
S Antol, A Agrawal, J Lu, M Mitchell, D Batra, C Lawrence Zitnick, ...
Proceedings of the IEEE International Conference on Computer Vision, 2425-2433, 2015
Cited by 8032 · 2015
Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks
J Lu, D Batra, D Parikh, S Lee
Advances in neural information processing systems, 2019
Cited by 5223 · 2019
Hierarchical question-image co-attention for visual question answering
J Lu, J Yang, D Batra, D Parikh
Advances in neural information processing systems 29, 2016
Cited by 2249 · 2016
Knowing when to look: Adaptive attention via a visual sentinel for image captioning
J Lu*, C Xiong*, D Parikh, R Socher
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2017
Cited by 2125 · 2017
Graph R-CNN for Scene Graph Generation
J Yang*, J Lu*, S Lee, D Batra, D Parikh
arXiv preprint arXiv:1808.00191, 2018
Cited by 1160 · 2018
Neural Baby Talk
J Lu*, J Yang*, D Batra, D Parikh
In Proceedings of the IEEE conference on computer vision and pattern …, 2018
Cited by 637 · 2018
12-in-1: Multi-Task Vision and Language Representation Learning
J Lu*, V Goswami*, M Rohrbach, D Parikh, S Lee
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2019
Cited by 610 · 2019
Unified-IO: A unified model for vision, language, and multi-modal tasks
J Lu, C Clark, R Zellers, R Mottaghi, A Kembhavi
arXiv preprint arXiv:2206.08916, 2022
Cited by 573 · 2022
Molmo and pixmo: Open weights and open data for state-of-the-art multimodal models
M Deitke, C Clark, S Lee, R Tripathi, Y Yang, JS Park, M Salehi, ...
arXiv e-prints, arXiv: 2409.17146, 2024
Cited by 509* · 2024
ParlAI: A dialog research software platform
A Miller, W Feng, D Batra, A Bordes, A Fisch, J Lu, D Parikh, J Weston
Proceedings of the 2017 conference on empirical methods in natural language …, 2017
Cited by 483 · 2017
Self-monitoring navigation agent via auxiliary progress estimation
CY Ma, J Lu, Z Wu, G AlRegib, Z Kira, R Socher, C Xiong
arXiv preprint arXiv:1901.03035, 2019
Cited by 353 · 2019
MERLOT Reserve: Neural script knowledge through vision and language and sound
R Zellers, J Lu, X Lu, Y Yu, Y Zhao, M Salehi, A Kusupati, J Hessel, ...
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by 332 · 2022
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
J Lu, C Clark, S Lee, Z Zhang, S Khosla, R Marten, D Hoiem, A Kembhavi
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by 290 · 2024
Multi-modal answer validation for knowledge-based VQA
J Wu, J Lu, A Sabharwal, R Mottaghi
Proceedings of the AAAI conference on artificial intelligence 36 (3), 2712-2721, 2022
Cited by 199 · 2022
Sentinel gate for modulating auxiliary information in a long short-term memory (LSTM) neural network
LU Jiasen, C Xiong, R Socher
US Patent 10,565,306, 2020
Cited by 162 · 2020
Best of both worlds: Transferring knowledge from discriminative learning to a generative visual dialog model
J Lu, A Kannan, J Yang, D Parikh, D Batra
Advances in Neural Information Processing Systems 30, 2017
Cited by 159 · 2017
Container: Context aggregation network
P Gao, J Lu, H Li, R Mottaghi, A Kembhavi
arXiv preprint arXiv:2106.01401, 2021
Cited by 145* · 2021
X-LXMERT: Paint, caption and answer questions with multi-modal transformers
J Cho, J Lu, D Schwenk, H Hajishirzi, A Kembhavi
arXiv preprint arXiv:2009.11278, 2020
Cited by 135 · 2020
Adaptive attention model for image captioning
LU Jiasen, C Xiong, R Socher
US Patent 10,565,305, 2020
Cited by 126 · 2020
Articles 1–20