| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Humanity's Last Exam | L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, CBC Zhang, M Shaaban, ... | arXiv preprint arXiv:2501.14249, 2025 | 304 | 2025 |
| NVILA: Efficient frontier visual language models | Z Liu, L Zhu, B Shi, Z Zhang, Y Lou, S Yang, H Xi, S Cao, Y Gu, D Li, X Li, ... | Proceedings of the Computer Vision and Pattern Recognition Conference, 4122-4134, 2024 | 147* | 2024 |
| Training transformers with 4-bit integers | H Xi, C Li, J Chen, J Zhu | Advances in Neural Information Processing Systems 36, 49146-49168, 2023 | 86 | 2023 |
| SpargeAttn: Accurate sparse attention accelerating any model inference | J Zhang, C Xiang, H Huang, J Wei, H Xi, J Zhu, J Chen | arXiv preprint arXiv:2502.18137, 2025 | 85* | 2025 |
| Sparse VideoGen: Accelerating video diffusion transformers with spatial-temporal sparsity | H Xi, S Yang, Y Zhao, C Xu, M Li, X Li, Y Lin, H Cai, J Zhang, D Li, J Chen, ... | arXiv preprint arXiv:2502.01776, 2025 | 80 | 2025 |
| Jetfire: Efficient and accurate transformer pretraining with INT8 data flow and per-block quantization | H Xi, Y Chen, K Zhao, KJ Teh, J Chen, J Zhu | arXiv preprint arXiv:2403.12422, 2024 | 34 | 2024 |
| Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | S Yang*, H Xi*, Y Zhao, M Li, J Zhang, H Cai, Y Lin, X Li, C Xu, K Peng, ... | arXiv preprint arXiv:2505.18875, 2025 | 21* | 2025 |
| Radial Attention: Sparse Attention with Energy Decay for Long Video Generation | X Li, M Li, T Cai, H Xi, S Yang, Y Lin, L Zhang, S Yang, J Hu, K Peng, ... | arXiv preprint arXiv:2506.19852, 2025 | 19* | 2025 |
| COAT: Compressing optimizer states and activation for memory-efficient FP8 training | H Xi, H Cai, L Zhu, Y Lu, K Keutzer, J Chen, S Han | arXiv preprint arXiv:2410.19313, 2024 | 17 | 2024 |
| Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search | Y Gu, Q Hu, S Yang, H Xi, J Chen, S Han, H Cai | arXiv preprint arXiv:2508.15884, 2025 | 12 | 2025 |
| Oscillation-reduced MXFP4 training for vision transformers | Y Chen, H Xi, J Zhu, J Chen | arXiv preprint arXiv:2502.20853, 2025 | 9 | 2025 |
| QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | R Tiwari*, H Xi*, A Tomar, C Hooper, S Kim, M Horton, M Najibi, ... | arXiv preprint arXiv:2502.10424, 2025 | 7* | 2025 |
| T-Rex: Text-assisted retrosynthesis prediction | Y Liu, H Xu, T Fang, H Xi, Z Liu, S Zhang, H Poon, S Wang | arXiv preprint arXiv:2401.14637, 2024 | 6 | 2024 |
| DC-VideoGen: Efficient video generation with deep compression video autoencoder | J Chen, W He, Y Gu, Y Zhao, J Yu, J Chen, D Zou, Y Lin, Z Zhang, M Li, ... | arXiv preprint arXiv:2509.25182, 2025 | 2 | 2025 |
| StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation | T Feng, Z Li, S Yang, H Xi, M Li, X Li, L Zhang, K Yang, K Peng, S Han, ... | arXiv preprint arXiv:2511.07399, 2025 | 1 | 2025 |
| DC-Gen: Post-training diffusion acceleration with deeply compressed latent space | W He, Y Gu, J Chen, D Zou, Y Lin, Z Zhang, H Xi, M Li, L Zhu, J Yu, ... | arXiv preprint arXiv:2509.25180, 2025 | 1 | 2025 |
| SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention | J Zhang, H Wang, K Jiang, S Yang, K Zheng, H Xi, Z Wang, H Zhu, ... | arXiv preprint arXiv:2509.24006, 2025 | 1 | 2025 |
| Arbitrage: Efficient Reasoning via Advantage-Aware Speculation | M Maheswaran, R Tiwari, Y Hu, K Dilmen, C Hooper, H Xi, N Lee, ... | arXiv preprint arXiv:2512.05033, 2025 | | 2025 |
| XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization | A Tomar, C Hooper, M Lee, H Xi, R Tiwari, W Kang, L Manolache, ... | arXiv preprint arXiv:2508.10395, 2025 | | 2025 |
| Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention | J Zhang, R Su, C Liu, J Wei, Z Wang, H Wang, P Zhang, H Jiang, ... | | | |