| Publication | Cited by | Year |
| --- | --- | --- |
| Transferring fairness under distribution shifts via fair consistency regularization B An, Z Che, M Ding, F Huang Advances in Neural Information Processing Systems 35, 32582-32597, 2022 | 53 | 2022 |
| Can watermarking large language models prevent copyrighted text generation and hide training data? MA Panaitescu-Liess, Z Che, B An, Y Xu, P Pathmanathan, S Chakraborty, ... Proceedings of the AAAI Conference on Artificial Intelligence 39 (23), 25002 …, 2025 | 23* | 2025 |
| SAIL: Self-Improving Efficient Online Alignment of Large Language Models M Ding, S Chakraborty, V Agrawal, Z Che, A Koppel, M Wang, A Bedi, ... arXiv preprint arXiv:2406.15567, 2024 | 23 | 2024 |
| Model tampering attacks enable more rigorous evaluations of LLM capabilities Z Che, S Casper, R Kirk, A Satheesh, S Slocum, LE McKinney, ... arXiv preprint arXiv:2502.05209, 2025 | 20 | 2025 |
| EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? A Agrawal, M Ding, Z Che, C Deng, A Satheesh, J Langford, F Huang arXiv preprint arXiv:2410.04571, 2024 | 14 | 2024 |
| Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Capabilities Z Che, S Casper, A Satheesh, R Gandikota, D Rosati, S Slocum, ... NeurIPS Safe Generative AI Workshop 2024, 2024 | 9 | 2024 |
| AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security Z Cai, S Shabihi, B An, Z Che, BR Bartoldson, B Kailkhura, T Goldstein, ... arXiv preprint arXiv:2504.20965, 2025 | 8 | 2025 |
| Auction-Based Regulation for Artificial Intelligence M Bornstein, Z Che, S Julapalli, A Mohamed, AS Bedi, F Huang arXiv preprint arXiv:2410.01871, 2024 | 2 | 2024 |
| TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models RS Srinivasa, Z Che, CBC Zhang, D Mares, E Hernandez, J Park, D Lee, ... arXiv preprint arXiv:2510.02663, 2025 | 1 | 2025 |
| EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles A Agrawal, M Ding, Z Che, C Deng, A Satheesh, B An, B Bruss, ... arXiv preprint arXiv:2505.21959, 2025 | 1 | 2025 |
| PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models MA Panaitescu-Liess, P Pathmanathan, Y Kaya, Z Che, B An, S Zhu, ... NeurIPS Safe Generative AI Workshop 2024, 2024 | 1 | 2024 |
| Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models B Wei, Z Che, N Li, UM Sehwag, J Götting, S Nedungadi, J Michael, ... arXiv preprint arXiv:2510.27629, 2025 | | 2025 |
| Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics TT Nguyen, J Xu, Z Che, PA Nguyen-Le, R Dandamudi, D Braman, ... Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8 (2 …, 2025 | | 2025 |
| Advancing Regulation in Artificial Intelligence: An Auction-Based Approach M Bornstein, Z Che, S Julapalli, A Mohamed, AS Bedi, F Huang | | |