| Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization J Abdul Samadh, MH Gani, N Hussein, MU Khattak, MM Naseer, ... Advances in Neural Information Processing Systems 36, 80396-80413, 2023 | 137 | 2023 |
| How to Train Vision Transformer on Small-scale Datasets? H Gani, M Naseer, M Yaqub 33rd British Machine Vision Conference (BMVC) 2022, 2022 | 90 | 2022 |
| LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts H Gani, SF Bhat, M Naseer, S Khan, P Wonka Twelfth International Conference on Learning Representations (ICLR) 2024, 2024 | 60 | 2024 |
| VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos S Munasinghe, H Gani, W Zhu, J Cao, E Xing, FS Khan, S Khan IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025, 2025 | 27 | 2025 |
| Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models R Imam, H Gani, M Huzaifa, K Nandakumar IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025, 2025 | 20 | 2025 |
| A Supervised Learning Methodology for Real-Time Disguised Face Recognition in the Wild S Kumaar, A Dogra, A Majeedi, H Gani, RM Vishwanath, SN Omkar arXiv preprint arXiv:1809.02875, 2018 | 17* | 2018 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs H Gani, R Bharadwaj, M Naseer, FS Khan, S Khan Findings of the Association for Computational Linguistics: NAACL 2025, 3123-3140, 2025 | 13* | 2025 |
| AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment U Nawaz, M Awais, H Gani, M Naseer, F Khan, S Khan, RM Anwer International Conference on Computational Linguistics (COLING), 2025 | 13 | 2025 |
| AURELIA: Test-Time Reasoning Distillation in Audio-Visual LLMs S Chowdhury, H Gani, N Anand, S Nag, R Gao, M Elhoseiny, S Khan, ... IEEE/CVF International Conference on Computer Vision (ICCV) 2025, 2025 | 6 | 2025 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks T Ashraf, A Saqib, H Ghani, M AlMahri, Y Li, N Ahsan, U Nawaz, J Lahoud, ... arXiv preprint arXiv:2505.24876, 2025 | 4 | 2025 |
| VideoMolmo: Spatio-Temporal Grounding Meets Pointing GS Ahmad, A Heakl, H Gani, A Shaker, Z Shen, FS Khan, S Khan arXiv preprint arXiv:2506.05336, 2025 | 2 | 2025 |
| Multi-Attribute Vision Transformers are Efficient and Robust Learners H Gani, N Saadi, N Hussein, K Nandakumar IEEE International Conference on Image Processing (ICIP), 2024 | 2 | 2024 |
| MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation H Gani, M Naseer, F Khan, S Khan Medical Image Computing and Computer Assisted Intervention (MICCAI), 2024 | 1 | 2024 |
| System and Method of Training Vision Transformer on Small-Scale Datasets MH Gani, MM Naseer, M Yaqub US Patent 12,417,620, 2025 | | 2025 |
| VideoMolmo: Spatio-Temporal Grounding Meets Pointing G Shazan Ahmad, A Heakl, H Gani, A Shaker, Z Shen, R Krishna, ... arXiv e-prints, arXiv:2506.05336, 2025 | | 2025 |
| Text-to-Image Diffusion with Complex and Detailed Prompts H Gani MS Thesis, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), 2024 | | 2024 |
| Analyzing the Robustness and the Reliability of Large Language Models H Gani, R Bharadwaj, M Huzaifa | | |