Haocheng Xi
Verified email at berkeley.edu - Homepage
Title · Cited by · Year
Humanity's Last Exam
L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, CBC Zhang, M Shaaban, ...
arXiv preprint arXiv:2501.14249, 2025
304 · 2025
NVILA: Efficient frontier visual language models
Z Liu, L Zhu, B Shi, Z Zhang, Y Lou, S Yang, H Xi, S Cao, Y Gu, D Li, X Li, ...
Proceedings of the Computer Vision and Pattern Recognition Conference, 4122-4134, 2024
147* · 2024
Training transformers with 4-bit integers
H Xi, C Li, J Chen, J Zhu
Advances in Neural Information Processing Systems 36, 49146-49168, 2023
86 · 2023
SpargeAttn: Accurate sparse attention accelerating any model inference
J Zhang, C Xiang, H Huang, J Wei, H Xi, J Zhu, J Chen
arXiv preprint arXiv:2502.18137, 2025
85* · 2025
Sparse VideoGen: Accelerating video diffusion transformers with spatial-temporal sparsity
H Xi, S Yang, Y Zhao, C Xu, M Li, X Li, Y Lin, H Cai, J Zhang, D Li, J Chen, ...
arXiv preprint arXiv:2502.01776, 2025
80 · 2025
Jetfire: Efficient and accurate transformer pretraining with INT8 data flow and per-block quantization
H Xi, Y Chen, K Zhao, KJ Teh, J Chen, J Zhu
arXiv preprint arXiv:2403.12422, 2024
34 · 2024
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
S Yang*, H Xi*, Y Zhao, M Li, J Zhang, H Cai, Y Lin, X Li, C Xu, K Peng, ...
arXiv preprint arXiv:2505.18875, 2025
21* · 2025
Radial Attention: Sparse Attention with Energy Decay for Long Video Generation
X Li, M Li, T Cai, H Xi, S Yang, Y Lin, L Zhang, S Yang, J Hu, K Peng, ...
arXiv preprint arXiv:2506.19852, 2025
19* · 2025
COAT: Compressing optimizer states and activation for memory-efficient FP8 training
H Xi, H Cai, L Zhu, Y Lu, K Keutzer, J Chen, S Han
arXiv preprint arXiv:2410.19313, 2024
17 · 2024
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Y Gu, Q Hu, S Yang, H Xi, J Chen, S Han, H Cai
arXiv preprint arXiv:2508.15884, 2025
12 · 2025
Oscillation-reduced MXFP4 training for vision transformers
Y Chen, H Xi, J Zhu, J Chen
arXiv preprint arXiv:2502.20853, 2025
9 · 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
R Tiwari*, H Xi*, A Tomar, C Hooper, S Kim, M Horton, M Najibi, ...
arXiv preprint arXiv:2502.10424, 2025
7* · 2025
T-Rex: Text-assisted retrosynthesis prediction
Y Liu, H Xu, T Fang, H Xi, Z Liu, S Zhang, H Poon, S Wang
arXiv preprint arXiv:2401.14637, 2024
6 · 2024
DC-VideoGen: Efficient video generation with deep compression video autoencoder
J Chen, W He, Y Gu, Y Zhao, J Yu, J Chen, D Zou, Y Lin, Z Zhang, M Li, ...
arXiv preprint arXiv:2509.25182, 2025
2 · 2025
StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation
T Feng, Z Li, S Yang, H Xi, M Li, X Li, L Zhang, K Yang, K Peng, S Han, ...
arXiv preprint arXiv:2511.07399, 2025
1 · 2025
DC-Gen: Post-training diffusion acceleration with deeply compressed latent space
W He, Y Gu, J Chen, D Zou, Y Lin, Z Zhang, H Xi, M Li, L Zhu, J Yu, ...
arXiv preprint arXiv:2509.25180, 2025
1 · 2025
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
J Zhang, H Wang, K Jiang, S Yang, K Zheng, H Xi, Z Wang, H Zhu, ...
arXiv preprint arXiv:2509.24006, 2025
1 · 2025
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
M Maheswaran, R Tiwari, Y Hu, K Dilmen, C Hooper, H Xi, N Lee, ...
arXiv preprint arXiv:2512.05033, 2025
2025
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
A Tomar, C Hooper, M Lee, H Xi, R Tiwari, W Kang, L Manolache, ...
arXiv preprint arXiv:2508.10395, 2025
2025
Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
J Zhang, R Su, C Liu, J Wei, Z Wang, H Wang, P Zhang, H Jiang, ...
Articles 1–20