Muhammad Maaz
PhD Computer Vision at MBZUAI
Verified email at mbzuai.ac.ae
Title
Cited by
Year
Maple: Multi-modal prompt learning
MU Khattak, H Rasheed, M Maaz, S Khan, FS Khan
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 1366, Year: 2023
Video-chatgpt: Towards detailed video understanding via large vision and language models
M Maaz, H Rasheed, S Khan, FS Khan
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
Cited by: 1300, Year: 2024
UNETR++: delving into efficient and accurate 3D medical image segmentation
A Shaker, M Maaz, H Rasheed, S Khan, MH Yang, FS Khan
IEEE Transactions on Medical Imaging 43 (9), 3377-3390, 2024
Cited by: 450, Year: 2024
Edgenext: efficiently amalgamated cnn-transformer architecture for mobile vision applications
M Maaz, A Shaker, H Cholakkal, S Khan, SW Zamir, RM Anwer, ...
European conference on computer vision, 3-20, 2022
Cited by: 435, Year: 2022
Glamm: Pixel grounding large multimodal model
H Rasheed, M Maaz, S Shaji, A Shaker, S Khan, H Cholakkal, RM Anwer, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 424, Year: 2024
Fine-tuned clip models are efficient video learners
H Rasheed, MU Khattak, M Maaz, S Khan, FS Khan
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 290, Year: 2023
Swiftformer: Efficient additive attention for transformer-based real-time mobile vision applications
A Shaker, M Maaz, H Rasheed, S Khan, MH Yang, FS Khan
Proceedings of the IEEE/CVF international conference on computer vision …, 2023
Cited by: 286, Year: 2023
Bridging the gap between object and image-level representations for open-vocabulary detection
H Bangalath, M Maaz, MU Khattak, SH Khan, F Shahbaz Khan
Advances in Neural Information Processing Systems 35, 33781-33794, 2022
Cited by: 219, Year: 2022
Class-agnostic object detection with multi-modal transformer
M Maaz, H Rasheed, S Khan, FS Khan, RM Anwer, MH Yang
European conference on computer vision, 512-531, 2022
Cited by: 162*, Year: 2022
Videogpt+: Integrating image and video encoders for enhanced video understanding
M Maaz, H Rasheed, S Khan, F Khan
arXiv preprint arXiv:2406.09418, 2024
Cited by: 96, Year: 2024
Pg-video-llava: Pixel grounding large video-language models
S Munasinghe, R Thushara, M Maaz, HA Rasheed, S Khan, M Shah, ...
arXiv preprint arXiv:2311.13435, 2023
Cited by: 52, Year: 2023
Perceptionlm: Open-access data and models for detailed visual understanding
JH Cho, A Madotto, E Mavroudi, T Afouras, T Nagarajan, M Maaz, Y Song, ...
Advances in Neural Information Processing Systems (NeurIPS Spotlight), 2025
Cited by: 33, Year: 2025
Palo: A polyglot large multimodal model for 5b people
H Rasheed, M Maaz, A Shaker, S Khan, H Cholakkal, RM Anwer, ...
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV …, 2025
Cited by: 31*, Year: 2025
Videomathqa: Benchmarking mathematical reasoning via multimodal understanding in videos
H Rasheed, A Shaker, A Tang, M Maaz, MH Yang, S Khan, FS Khan
arXiv preprint arXiv:2506.05349, 2025
Cited by: 5, Year: 2025
Self-supervised learning for fine-grained visual categorization
M Maaz, HA Rasheed, D Gaddam
arXiv preprint arXiv:2105.08788, 2021
Cited by: 3, Year: 2021
A culturally-diverse multilingual multimodal video benchmark & model
BS Shafique, A Vayani, M Maaz, HA Rasheed, D Dissanayake, ...
Proceedings of the 2025 Conference on Empirical Methods in Natural Language …, 2025
Cited by: 2, Year: 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
A Shaker, M Maaz, C Gou, H Rezatofighi, S Khan, FS Khan
arXiv preprint arXiv:2503.21782, 2025
Cited by: 1, Year: 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
H Rasheed, M Zumri, M Maaz, MH Yang, FS Khan, S Khan
arXiv preprint arXiv:2511.23477, 2025
Year: 2025
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
M Maaz, H Rasheed, FS Khan, S Khan
arXiv preprint arXiv:2511.23478, 2025
Year: 2025