
Xudong Han
LibrAI & MBZUAI
Verified email at mbzuai.ac.ae
Title · Cited by · Year
Do-not-answer: A dataset for evaluating safeguards in LLMs
Y Wang, H Li, X Han, P Nakov, T Baldwin
arXiv preprint arXiv:2308.13387, 2023
Cited by 322* · 2023
Jais and Jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models
N Sengupta, SK Sahu, B Jia, S Katipomu, H Li, F Koto, W Marshall, ...
arXiv preprint arXiv:2308.16149, 2023
Cited by 186* · 2023
Diverse adversaries for mitigating bias in training
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2101.10001, 2021
Cited by 86 · 2021
Balancing out bias: Achieving fairness through balanced training
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2109.08253, 2021
Cited by 76* · 2021
Evaluating debiasing techniques for intersectional biases
S Subramanian, X Han, T Baldwin, T Cohn, L Frermann
arXiv preprint arXiv:2109.10441, 2021
Cited by 68 · 2021
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang, J Gao, Y Zhang, W Che, ...
Journal of Artificial Intelligence Research 82, 687-775, 2025
Cited by 53 · 2025
Safety at scale: A comprehensive survey of large model safety
X Ma, Y Gao, Y Wang, R Wang, X Wang, Y Sun, Y Ding, H Xu, Y Chen, ...
arXiv preprint arXiv:2502.05206, 2025
Cited by 52 · 2025
Contrastive learning for fair representations
A Shen, X Han, T Cohn, T Baldwin, L Frermann
arXiv preprint arXiv:2109.10645, 2021
Cited by 44 · 2021
Learning from failure: Integrating negative examples when fine-tuning large language models as agents
R Wang, H Li, X Han, Y Zhang, T Baldwin
arXiv preprint arXiv:2402.11651, 2024
Cited by 42 · 2024
Optimising equal opportunity fairness in model training
A Shen, X Han, T Cohn, T Baldwin, L Frermann
arXiv preprint arXiv:2205.02393, 2022
Cited by 39 · 2022
ToolGen: Unified tool retrieval and calling via generation
R Wang, X Han, L Ji, S Wang, T Baldwin, H Li
arXiv preprint arXiv:2410.03439, 2024
Cited by 36 · 2024
FairLib: A unified framework for assessing and improving fairness
X Han, A Shen, Y Li, L Frermann, T Baldwin, T Cohn
Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022
Cited by 30* · 2022
A Chinese dataset for evaluating the safeguards in large language models
Y Wang, Z Zhai, H Li, X Han, S Lin, Z Zhang, A Zhao, P Nakov, T Baldwin
Findings of the Association for Computational Linguistics: ACL 2024, 3106-3119, 2024
Cited by 28 · 2024
Decoupling Adversarial Training for Fair NLP
X Han, T Baldwin, T Cohn
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021
Cited by 28 · 2021
Does representational fairness imply empirical fairness?
A Shen, X Han, T Cohn, T Baldwin, L Frermann
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022 …, 2022
Cited by 25 · 2022
Towards equal opportunity fairness through adversarial learning
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2203.06317, 2022
Cited by 20 · 2022
Loki: An open-source tool for fact verification
H Li, X Han, H Wang, Y Wang, M Wang, R Xing, Y Geng, Z Zhai, P Nakov, ...
Proceedings of the 31st International Conference on Computational …, 2025
Cited by 17 · 2025
Fair enough: Standardizing evaluation and model selection for fairness research in NLP
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2302.05711, 2023
Cited by 17 · 2023
AILuminate: Introducing v1.0 of the AI risk and reliability benchmark from MLCommons
S Ghosh, H Frase, A Williams, S Luger, P Röttger, F Barez, S McGregor, ...
arXiv preprint arXiv:2503.05731, 2025
Cited by 15 · 2025
Systematic evaluation of predictive fairness
X Han, A Shen, T Cohn, T Baldwin, L Frermann
arXiv preprint arXiv:2210.08758, 2022
Cited by 14 · 2022