| Title | Cited by | Year |
|---|---|---|
| DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning D Guo, D Yang, H Zhang, J Song, R Zhang, R Xu, Q Zhu, S Ma, P Wang, ... arXiv preprint arXiv:2501.12948, 2025 | 6183 | 2025 |
| DeepSeek-V3 technical report A Liu, B Feng, B Xue, B Wang, B Wu, C Lu, C Zhao, C Deng, C Zhang, ... arXiv preprint arXiv:2412.19437, 2024 | 3190 | 2024 |
| DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models D Dai, C Deng, C Zhao, RX Xu, H Gao, D Chen, J Li, W Zeng, X Yu, Y Wu, ... arXiv preprint arXiv:2401.06066, 2024 | 712 | 2024 |
| DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model A Liu, B Feng, B Wang, B Wang, B Liu, C Zhao, C Dengr, C Ruan, D Dai, ... arXiv preprint arXiv:2405.04434, 2024 | 612 | 2024 |
| DeepSeek-Coder-V2: Breaking the barrier of closed-source models in code intelligence Q Zhu, D Guo, Z Shao, D Yang, P Wang, R Xu, Y Wu, Y Li, H Gao, S Ma, ... arXiv preprint arXiv:2406.11931, 2024 | 361 | 2024 |
| DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning D Guo, D Yang, H Zhang, J Song, P Wang, Q Zhu, R Xu, R Zhang, S Ma, ... Nature 645 (8081), 633-638, 2025 | 356 | 2025 |
| Native sparse attention: Hardware-aligned and natively trainable sparse attention J Yuan, H Gao, D Dai, J Luo, L Zhao, Z Zhang, Z Xie, Y Wei, L Wang, ... Proceedings of the 63rd Annual Meeting of the Association for Computational …, 2025 | 196 | 2025 |
| PIAP-DF: Pixel-interested and anti person-specific facial action unit detection net with discrete feedback learning Y Tang, W Zeng, D Zhao, H Zhang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 39 | 2021 |
| DeepSeek-V3.2: Pushing the frontier of open large language models A Liu, A Mei, B Lin, B Xue, B Wang, B Xu, B Wu, B Zhang, C Lin, C Dong, ... arXiv preprint arXiv:2512.02556, 2025 | 14 | 2025 |
| mHC: Manifold-Constrained Hyper-Connections Z Xie, Y Wei, H Cao, C Zhao, C Deng, J Li, D Dai, H Gao, J Chang, L Zhao, ... arXiv preprint arXiv:2512.24880, 2025 | 3 | 2025 |
| Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models X Cheng, W Zeng, D Dai, Q Chen, B Wang, Z Xie, K Huang, X Yu, Z Hao, ... arXiv preprint arXiv:2601.07372, 2026 | | 2026 |