| E-MAPP: Efficient multi-agent reinforcement learning with parallel program guidance C Chang, N Mu, J Wu, L Pan, H Xu (NeurIPS 2022) Advances in Neural Information Processing Systems 35, 12154-12168, 2022 | 9 | 2022 |
| Large-scale data center cooling control via sample-efficient reinforcement learning N Mu, X Hu, QS Jia, X Zhu, X He 2024 IEEE 20th International Conference on Automation Science and …, 2024 | 7 | 2024 |
| S-EPOA: Overcoming the Indistinguishability of Segments with Skill-Driven Preference-Based Reinforcement Learning N Mu*, Y Luan*, Y Yang, Q Jia (IJCAI 2025) arXiv preprint arXiv:2408.12130, 2024 | 6* | 2024 |
| Integrating mechanism and data: Reinforcement learning based on multi-fidelity model for data center cooling control N Mu, X Hu, QS Jia 2023 China Automation Congress (CAC), 5283-5288, 2023 | 5 | 2023 |
| Preference-based Multi-Objective Reinforcement Learning N Mu*, Y Luan*, QS Jia (IEEE TASE) IEEE Transactions on Automation Science and Engineering, 2025 | 4 | 2025 |
| CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries N Mu*, H Hu*, X Hu, Y Yang, B Xu, QS Jia (ICML 2025) arXiv preprint arXiv:2506.00388, 2025 | 4 | 2025 |
| A Transformer-Based Thermal Surrogate Model for Cooling Control in Data Centers H Zhou, N Mu, QS Jia IEEE Robotics and Automation Letters, 2024 | 3 | 2024 |
| Preference-Based Multi-Objective Reinforcement Learning with Explicit Reward Modeling N Mu, Y Luan, QS Jia 2024 China Automation Congress (CAC), 4874-4879, 2024 | 3 | 2024 |
| MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios X Xiong, N Mu, R Xie, S Yang, Y Wang, L Wang, Y Luan, S Li, S Xu, ... (AAAI 2026) arXiv preprint arXiv:2511.06252, 2025 | 2 | 2025 |
| SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks P Shen*, Y Wang*, N Mu*, Y Luan, R Xie, S Yang, L Wang, H Hu, S Xu, ... arXiv preprint arXiv:2508.10428, 2025 | 1 | 2025 |
| DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning R Xie*, Q Wang*, H Hu, Z Zhou, N Mu, X Li, Y Yang, S Xu, Q Zhao, B Xu (NeurIPS 2025) arXiv preprint arXiv:2510.19562, 2025 | | 2025 |
| STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning Y Luan*, N Mu*, Y Yang, B Xu, QS Jia (NeurIPS 2025) arXiv preprint arXiv:2509.23802, 2025 | | 2025 |
| Safety-Guaranteed Policy Composition via Generalized Policy Improvement for Autonomous Vehicles N Mu, Y Luan, QS Jia (Outstanding WiRA Student Paper) 2025 IEEE 21st International Conference on …, 2025 | | 2025 |
| Addressing Coupling in Restless Multi-Armed Bandits by Finetuning Whittle Index Y Luan, N Mu, QS Jia 2025 IEEE 21st International Conference on Automation Science and …, 2025 | | 2025 |