| MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI X Yue, Y Ni, K Zhang, T Zheng, R Liu, G Zhang, S Stevens, D Jiang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 1678 | 2024 |
| MMMU-Pro: A more robust multi-discipline multimodal understanding benchmark X Yue, T Zheng, Y Ni, Y Wang, K Zhang, S Tong, Y Sun, B Yu, G Zhang, ... Proceedings of the 63rd Annual Meeting of the Association for Computational …, 2025 | 232 | 2025 |
| Museformer: Transformer with fine- and coarse-grained attention for music generation B Yu, P Lu, R Wang, W Hu, X Tan, W Ye, S Zhang, T Qin, TY Liu Advances in Neural Information Processing Systems 35, 1376-1388, 2022 | 128 | 2022 |
| LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset B Yu, FN Baker, Z Chen, X Ning, H Sun arXiv preprint arXiv:2402.09391, 2024 | 105 | 2024 |
| ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Z Chen, S Chen, Y Ning, Q Zhang, B Wang, B Yu, Y Li, Z Liao, C Wei, Z Lu, ... arXiv preprint arXiv:2410.05080, 2024 | 102 | 2024 |
| MuseCoco: Generating Symbolic Music from Text P Lu, X Xu, C Kang, B Yu, C Xing, X Tan, J Bian arXiv preprint arXiv:2306.00110, 2023 | 82 | 2023 |
| Knowing False Negatives: An Adversarial Training Method for Distantly Supervised Relation Extraction K Hao, B Yu, W Hu arXiv preprint arXiv:2109.02099, 2021 | 26 | 2021 |
| MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks P Lu, X Tan, B Yu, T Qin, S Zhao, TY Liu arXiv preprint arXiv:2208.14345, 2022 | 23 | 2022 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge B Gou, Z Huang, Y Ning, Y Gu, M Lin, W Qi, A Kopanev, B Yu, ... arXiv preprint arXiv:2506.21506, 2025 | 18 | 2025 |
| Tooling or not tooling? The impact of tools on language agents for chemistry problem solving B Yu, FN Baker, Z Chen, G Herb, B Gou, D Adu-Ampratwum, X Ning, ... Findings of the Association for Computational Linguistics: NAACL 2025, 7620-7640, 2025 | 13 | 2025 |
| ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery, October 2024 Z Chen, S Chen, Y Ning, Q Zhang, B Wang, B Yu, Y Li, Z Liao, C Wei, Z Lu, ... URL https://arxiv.org/abs/2410.05080v1, 2024 | 13 | 2024 |
| EmoGen: Eliminating Subjective Bias in Emotional Music Generation C Kang, P Lu, B Yu, X Tan, W Ye, S Zhang, J Bian arXiv preprint arXiv:2307.01229, 2023 | 9 | 2023 |
| Mind2Web 2: Evaluating agentic search with agent-as-a-judge, 2025 B Gou, Z Huang, Y Ning, Y Gu, M Lin, W Qi, A Kopanev, B Yu, ... URL https://arxiv.org/abs/2506.21506, 2025 | 7 | 2025 |
| ChemToolAgent: The impact of tools on language agents for chemistry problem solving B Yu, FN Baker, Z Chen, G Herb, B Gou, D Adu-Ampratwum, X Ning, ... arXiv preprint arXiv:2411.07228, 2024 | 6 | 2024 |
| LARC: Towards human-level constrained retrosynthesis planning through an agentic framework FN Baker, D Adu-Ampratwum, R Averly, B Yu, H Sun, X Ning arXiv preprint arXiv:2508.11860, 2025 | 3 | 2025 |
| AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists Y Li, HN Moussa, Z Chen, S Chen, B Yu, M Xue, B Burns, TY Chiu, V Dey, ... arXiv preprint arXiv:2506.08140, 2025 | 3 | 2025 |
| Probing Association Biases in LLM Moderation Over-Sensitivity Y Wang, B Yu, I Yang, S Hassanpour, S Vosoughi arXiv preprint arXiv:2505.23914, 2025 | 1 | 2025 |
| Joint Reasoning of Events, Participants and Locations for Plot Relation Recognition S Qiu, B Yu, L Qian, Q Guo, W Hu Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint …, 2020 | 1 | 2020 |
| Evaluating Large Language Models in Scientific Discovery Z Song, J Lu, Y Du, B Yu, TM Pruyn, Y Huang, K Guo, X Luo, Y Qu, Y Qu, ... arXiv preprint arXiv:2512.15567, 2025 | | 2025 |
| AgentSearchBench: Evaluating Agentic Search with Agent-as-a-Judge B Gou, Z Huang, Y Ning, Y Gu, M Lin, B Yu, A Kopanev, W Qi, Y Shu, ... ICML 2025 Workshop on Computer Use Agents | | |