Xiyang Dai

Cited by

	All	Since 2021
Citations	16816	16484
h-index	36	36
i10-index	54	54

Citations per year
Year	Citations
2018	54
2019	111
2020	147
2021	558
2022	1889
2023	3440
2024	4719
2025	5626
2026	236

Public access (based on funding mandates)
Available	11 articles
Not available	0 articles

Co-authors

Larry Davis, Professor of Computer Science, University of Maryland (verified email at cs.umd.edu)
Bharat Singh, Adobe Firefly (verified email at adobe.com)
Bogdan Matei, Technical Director, SRI International (verified email at sri.com)
Joe Yue-Hei Ng, Google Research
Xinchao Wang, National University of Singapore (verified email at nus.edu.sg)
Lu Yuan, Research Scientist Manager, GenAI, Meta

Xiyang Dai

Microsoft

Verified email at microsoft.com

Computer Vision, Deep Learning


Title	Authors	Venue	Cited by	Year
Cvt: Introducing convolutions to vision transformers	H Wu, B Xiao, N Codella, M Liu, X Dai, L Yuan, L Zhang	Proceedings of the IEEE/CVF international conference on computer vision, 22-31, 2021	3134	2021
Dynamic convolution: Attention over convolution kernels.	Y Chen, X Dai, M Liu, D Chen, L Yuan, Z Liu	CVF Conference on Computer Vision and Pattern Recognition, CVPR, 13-19, 2020	1677	2020
Florence: A new foundation model for computer vision	L Yuan, D Chen, YL Chen, N Codella, X Dai, J Gao, H Hu, X Huang, B Li, ...	arXiv preprint arXiv:2111.11432, 2021	1223	2021
Dynamic head: Unifying object detection heads with attentions	X Dai, Y Chen, B Xiao, D Chen, M Liu, L Yuan, L Zhang	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021	1193	2021
Regionclip: Region-based language-image pretraining	Y Zhong, J Yang, P Zhang, C Li, N Codella, LH Li, L Zhou, X Dai, L Yuan, ...	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	900	2022
Mobile-former: Bridging mobilenet and transformer	Y Chen, X Dai, D Chen, M Liu, X Dong, L Yuan, Z Liu	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	877	2022
Focal Self-attention for Local-Global Interactions in Vision Transformers	J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan, J Gao	Advances in Neural Information Processing Systems, 2021, 2021	842*	2021
Rewrite the stars	X Ma, X Dai, Y Bai, Y Wang, Y Fu	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2024	542	2024
Dynamic detr: End-to-end object detection with dynamic attention	X Dai, Y Chen, J Yang, P Zhang, L Yuan, L Zhang	Proceedings of the IEEE/CVF international conference on computer vision …, 2021	538	2021
Focal modulation networks	J Yang, C Li, X Dai, J Gao	Advances in Neural Information Processing Systems 35, 4203-4217, 2022	518	2022
Multi-scale vision longformer: A new vision transformer for high-resolution image encoding	P Zhang, X Dai, J Yang, B Xiao, L Yuan, L Zhang, J Gao	Proceedings of the IEEE/CVF international conference on computer vision …, 2021	487	2021
Florence-2: Advancing a unified representation for a variety of vision tasks	B Xiao, H Wu, W Xu, X Dai, H Hu, Y Lu, M Zeng, C Liu, L Yuan	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	440	2024
Glipv2: Unifying localization and vision-language understanding	H Zhang, P Zhang, X Hu, YC Chen, L Li, X Dai, L Wang, L Yuan, ...	Advances in Neural Information Processing Systems 35, 36067-36080, 2022	401	2022
Man: Moment alignment network for natural language moment retrieval via iterative graph adjustment	D Zhang, X Dai, X Wang, YF Wang, LS Davis	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019	398	2019
Generalized decoding for pixel, image, and language	X Zou, ZY Dou, J Yang, Z Gan, L Li, C Li, X Dai, H Behl, J Wang, L Yuan, ...	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	379	2023
Bevt: Bert pretraining of video transformers	R Wang, D Chen, Z Wu, Y Chen, X Dai, M Liu, YG Jiang, L Zhou, L Yuan	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	330	2022
Temporal context network for activity localization in videos	X Dai, B Singh, G Zhang, LS Davis, Y Qiu Chen	Proceedings of the IEEE International Conference on Computer Vision, 5793-5802, 2017	328	2017
Dynamic ReLU	Y Chen, X Dai, M Liu, D Chen, L Yuan, Z Liu	European Conference on Computer Vision, 351-367, 2020	304	2020
Efficient self-supervised vision transformers for representation learning	C Li, J Yang, P Zhang, M Gao, B Xiao, X Dai, L Yuan, J Gao	arXiv preprint arXiv:2106.09785, 2021	272	2021
Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-loras	A Abouelenin, A Ashfaq, A Atkinson, H Awadalla, N Bach, J Bao, ...	arXiv preprint arXiv:2503.01743, 2025	205	2025
