| Title | Cited by | Year |
|---|---|---|
| DETRs with collaborative hybrid assignments training. Z Zong, G Song, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 721 | 2023 |
| UniFormer: Unifying convolution and self-attention for visual recognition. K Li, Y Wang, J Zhang, P Gao, G Song, Y Liu, H Li, Y Qiao. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (10 …, 2023 | 655 | 2023 |
| Revisiting the sibling head in object detector. G Song, Y Liu, X Wang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 579 | 2020 |
| UniFormer: Unified transformer for efficient spatiotemporal representation learning. K Li, Y Wang, P Gao, G Song, Y Liu, H Li, Y Qiao. ICLR 2022, 2022 | 478 | 2022 |
| LMDrive: Closed-loop end-to-end driving with large language models. H Shao, Y Hu, L Wang, G Song, SL Waslander, Y Liu, H Li. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 328 | 2024 |
| Visual CoT: Advancing multi-modal language models with a comprehensive dataset and benchmark for chain-of-thought reasoning. H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li. Advances in Neural Information Processing Systems 37, 8612-8642, 2024 | 223 | 2024 |
| RAPHAEL: Text-to-image generation via large mixture of diffusion paths. Z Xue, G Song, Q Guo, B Liu, Z Zong, Y Liu, P Luo. Advances in Neural Information Processing Systems 36, 41693-41706, 2023 | 215 | 2023 |
| Region-based quality estimation network for large-scale person re-identification. G Song, B Leng, Y Liu, C Hetang, S Cai. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018 | 206 | 2018 |
| Gen-L-Video: Multi-text to long video generation via temporal co-denoising. FY Wang, W Chen, G Song, HJ Ye, Y Liu, H Li. arXiv preprint arXiv:2305.18264, 2023 | 136 | 2023 |
| MoVA: Adapting mixture of vision experts to multimodal context. Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang, H Li, Y Liu. Advances in Neural Information Processing Systems 37, 103305-103333, 2024 | 96 | 2024 |
| Visual CoT: Unleashing chain-of-thought reasoning in multi-modal language models. H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li. CoRR, 2024 | 86 | 2024 |
| Phased consistency models. FY Wang, Z Huang, A Bergman, D Shen, P Gao, M Lingelbach, K Sun, ... Advances in Neural Information Processing Systems 37, 83951-84009, 2024 | 76 | 2024 |
| CoMat: Aligning text-to-image diffusion model with image-to-text concept matching. D Jiang, G Song, X Wu, R Zhang, D Shen, Z Zong, Y Liu, H Li. Advances in Neural Information Processing Systems 37, 76177-76209, 2024 | 55 | 2024 |
| MMSearch: Benchmarking the potential of large models as multi-modal search engines. D Jiang, R Zhang, Z Guo, Y Wu, J Lei, P Qiu, P Lu, Z Chen, C Fu, G Song, ... arXiv preprint arXiv:2409.12959, 2024 | 55 | 2024 |
| Self-slimmed Vision Transformer. Z Zong, K Li, G Song, Y Wang, Y Qiao, B Leng, Y Liu. ECCV 2022, 2021 | 53 | 2021 |
| AnimateLCM: Accelerating the animation of personalized diffusion models and adapters with decoupled consistency learning. FY Wang, Z Huang, X Shi, W Bian, G Song, Y Liu, H Li. CoRR, 2024 | 52 | 2024 |
| Temporal enhanced training of multi-view 3D object detector via historical object prediction. Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 49 | 2023 |
| UniNet: Unified architecture search with convolution, transformer, and MLP. J Liu, X Huang, G Song, Y Liu, H Li. ECCV 2022, 2022 | 49 | 2022 |
| INTERN: A new learning paradigm towards general vision. J Shao, S Chen, Y Li, K Wang, Z Yin, Y He, J Teng, Q Sun, M Gao, J Liu, ... arXiv preprint arXiv:2111.08687, 2021 | 42 | 2021 |
| Transductive centroid projection for semi-supervised large-scale recognition. Y Liu, G Song, J Shao, X Jin, X Wang. Proceedings of the European Conference on Computer Vision (ECCV), 70-86, 2018 | 42 | 2018 |