| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models | W Shao*, M Chen*, Z Zhang, P Xu, L Zhao, Z Li, K Zhang, P Gao, Y Qiao, ... (* equal contribution) | ICLR 2024 (Spotlight) | 465 | 2023 |
| DanceGRPO: Unleashing GRPO on Visual Generation | Z Xue, J Wu, Y Gao, F Kong, L Zhu, M Chen, Z Liu, W Liu, Q Guo, ... | arXiv:2505.07818 | 135* | 2025 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | M Chen, W Shao, P Xu, J Wang, P Gao, K Zhang, P Luo | ACL 2025 (Main) | 123 | 2025 |
| CF-ViT: A General Coarse-to-Fine Method for Vision Transformer | M Chen, M Lin, K Li, Y Shen, Y Wu, F Chao, R Ji | AAAI 2023 (Oral) | 112 | 2023 |
| DiffRate: Differentiable Compression Rate for Efficient Vision Transformers | M Chen, W Shao, P Xu, M Lin, K Zhang, F Chao, R Ji, Y Qiao, P Luo | ICCV 2023 | 100 | 2023 |
| BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation | P Xu, W Shao, M Chen, S Tang, K Zhang, P Gao, F An, Y Qiao, P Luo | ICLR 2024 | 51 | 2024 |
| Super Vision Transformer | M Lin*, M Chen*, Y Zhang, C Shen, R Ji, L Cao (* equal contribution) | IJCV 2023 | 43 | 2023 |
| PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization | M Chen, Y Liu, J Wang, Y Bin, W Shao, P Luo | arXiv:2410.05265 | 23* | 2024 |
| Fine-Grained Data Distribution Alignment for Post-Training Quantization | Y Zhong, M Lin, M Chen, K Li, Y Shen, F Chao, Y Wu, R Ji | ECCV 2022 | 22 | 2022 |
| I&S-ViT: An Inclusive & Stable Method for Post-Training ViTs Quantization | Y Zhong, J Hu, M Lin, M Chen, R Ji | IEEE Transactions on Pattern Analysis and Machine Intelligence | 18* | 2025 |
| SMMix: Self-Motivated Image Mixing for Vision Transformers | M Chen, M Lin, Z Lin, Y Zhang, F Chao, R Ji | ICCV 2023 | 17 | 2023 |
| Model Merging in Pre-training of Large Language Models | Y Li, Y Ma, S Yan, C Zhang, J Liu, J Lu, Z Xu, M Chen, M Wang, S Zhan, ... | arXiv:2505.12082 | 15 | 2025 |
| OptG: Optimizing Gradient-driven Criteria in Network Sparsity | Y Zhang, M Lin, M Chen, F Chao, R Ji | arXiv:2201.12826 | 9 | 2022 |
| Scaling Law for Quantization-Aware Training | M Chen, C Zhang, J Liu, Y Zeng, Z Xue, Z Liu, Y Li, J Ma, J Huang, X Zhou, ... | arXiv:2505.14302 | 8 | 2025 |
| WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception | Z Liu, X Deng, S Chen, A Wang, Q Guo, M Han, Z Xue, M Chen, P Luo, ... | arXiv:2508.15720 | 4 | 2025 |
| Enhance-A-Video: Better Generated Video for Free | Y Luo, X Zhao, M Chen, K Zhang, W Shao, K Wang, Z Wang, Y You | arXiv:2502.07508 | 4 | 2025 |
| LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation | J Wang, N Kang, L Yao, M Chen, C Wu, S Zhang, S Xue, Y Liu, T Wu, ... | arXiv:2501.12976 | 3 | 2025 |
| Adapting LLaMA Decoder to Vision Transformer | J Wang, W Shao, M Chen, C Wu, Y Liu, T Wu, K Zhang, S Zhang, K Chen, ... | arXiv:2404.06773 | 3 | 2024 |
| Parallel Loop Transformer for Efficient Test-Time Computation Scaling | B Wu, M Chen, X Luo, S Yan, Q Yu, F Xia, T Zhang, H Zhan, Z Zhong, ... | arXiv:2510.24824 | 2 | 2025 |
| INT vs FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats | M Chen, M Wu, H Jin, Z Yuan, J Liu, C Zhang, Y Li, J Huang, J Ma, Z Xue, ... | arXiv:2510.25602 | | 2025 |