Botian Shi
Shanghai Artificial Intelligence Laboratory, Shanghai Innovation Institution (SII)
Verified email at pjlab.org.cn
Title
Cited by
Year
Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling
Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui, J Zhu, S Ye, H Tian, Z Liu, ...
arXiv preprint arXiv:2412.05271, 2024
Cited by: 1009; Year: 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ...
arXiv preprint arXiv:2404.16821, 2024
Cited by: 995; Year: 2024
InternVL3: Exploring advanced training and test-time recipes for open-source multimodal models
J Zhu, W Wang, Z Chen, Z Liu, S Ye, L Gu, H Tian, Y Duan, W Su, J Shao, ...
arXiv preprint arXiv:2504.10479, 2025
Cited by: 670; Year: 2025
UniVL: A unified video and language pre-training model for multimodal understanding and generation
H Luo, L Ji, B Shi, H Huang, N Duan, T Li, J Li, T Bharti, M Zhou
arXiv preprint arXiv:2002.06353, 2020
Cited by: 586; Year: 2020
Drive like a human: Rethinking autonomous driving with large language models
D Fu, X Li, L Wen, M Dou, P Cai, B Shi, Y Qiao
2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops …, 2024
Cited by: 300; Year: 2024
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
L Wen, D Fu, X Li, X Cai, T Ma, P Cai, M Dou, B Shi, L He, Y Qiao
The Twelfth International Conference on Learning Representations (ICLR), 2024
Cited by: 295; Year: 2024
Multi-modal sensor fusion for auto driving perception: A survey
K Huang, B Shi, X Li, X Li, S Huang, Y Li
arXiv preprint arXiv:2202.02703, 2022
Cited by: 261; Year: 2022
InternVL3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency
W Wang, Z Gao, L Gu, H Pu, L Cui, X Wei, Z Liu, L Jing, S Ye, J Shao, ...
arXiv preprint arXiv:2508.18265, 2025
Cited by: 232; Year: 2025
LoGoNet: Towards accurate 3D object detection with local-to-global cross-modal fusion
X Li, T Ma, Y Hou, B Shi, Y Yang, Y Liu, X Wu, Q Chen, Y Li, Y Qiao, L He
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 216; Year: 2023
Multi-sensor fusion and cooperative perception for autonomous driving: A review
C Xiang, C Feng, X Xie, B Shi, H Lu, Y Lv, M Yang, Z Niu
IEEE Intelligent Transportation Systems Magazine 15 (5), 36-58, 2023
Cited by: 175; Year: 2023
MinerU: An open-source solution for precise document content extraction
B Wang, C Xu, X Zhao, L Ouyang, F Wu, Z Zhao, R Xu, K Liu, Y Qu, ...
arXiv preprint arXiv:2409.18839, 2024
Cited by: 126; Year: 2024
StreetSurf: Extending multi-view implicit surface reconstruction to street views
J Guo, N Deng, X Li, Y Bai, B Shi, C Wang, C Ding, D Wang, Y Li
arXiv preprint arXiv:2306.04988, 2023
Cited by: 121; Year: 2023
ChartX & ChartVLM: A versatile benchmark and foundation model for complicated chart reasoning
R Xia, H Ye, X Yan, Q Liu, H Zhou, Z Chen, B Shi, J Yan, B Zhang
IEEE Transactions on Image Processing, 2025
Cited by: 116; Year: 2025
On the road with GPT-4V(ision): Early explorations of visual-language model on autonomous driving
L Wen, X Yang, D Fu, X Wang, P Cai, X Li, T Ma, Y Li, L Xu, D Shang, ...
arXiv preprint arXiv:2311.05332, 2023
Cited by: 107; Year: 2023
Is Sora a world simulator? A comprehensive survey on general world models and beyond
Z Zhu, X Wang, W Zhao, C Min, B Li, N Deng, M Dou, Y Wang, B Shi, ...
arXiv preprint arXiv:2405.03520, 2024
Cited by: 99; Year: 2024
Knowledge Aware Semantic Concept Expansion for Image-Text Matching.
B Shi, L Ji, P Lu, Z Niu, N Duan
Proceedings of the Twenty-Eighth International Joint Conference on …, 2019
Cited by: 95; Year: 2019
Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
X Li, B Shi, Y Hou, X Wu, T Ma, Y Li, L He
Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel …, 2022
Cited by: 91; Year: 2022
Dense procedure captioning in narrated instructional videos
B Shi, L Ji, Y Liang, N Duan, P Chen, Z Niu, M Zhou
Proceedings of the 57th annual meeting of the association for computational …, 2019
Cited by: 90; Year: 2019
Uni3D: A unified baseline for multi-dataset 3D object detection
B Zhang, J Yuan, B Shi, T Chen, Y Li, Y Qiao
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 75; Year: 2023
Microsoft concept graph: Mining semantic concepts for short text understanding
L Ji, Y Wang, B Shi, D Zhang, Z Wang, J Yan
Data Intelligence 1 (3), 238-270, 2019
Cited by: 73; Year: 2019