| VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, A Agarwal, Z Chen, M Li, ... arXiv preprint arXiv:2407.11691, 2024 | 336 | 2024 |
| Large language model is not a good few-shot information extractor, but a good reranker for hard samples! Y Ma, Y Cao, YC Hong, A Sun EMNLP 2023 (Findings), 2023 | 245 | 2023 |
| Prompt for Extraction? PAIE: Prompting Argument Interaction for Event Argument Extraction Y Ma, Z Wang, Y Cao, M Li, M Chen, K Wang, J Shao ACL 2022, 2022 | 204 | 2022 |
| MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations Y Ma, Y Zang, L Chen, M Chen, Y Jiao, X Li, X Lu, Z Liu, Y Ma, X Dong, ... NeurIPS 2024 spotlight (Dataset and Benchmark), 2024 | 103 | 2024 |
| SciAgent: Tool-augmented Language Models for Scientific Reasoning Y Ma, Z Gou, J Hao, R Xu, S Wang, L Pan, Y Yang, Y Cao, A Sun, ... EMNLP 2024, 2024 | 74 | 2024 |
| Towards verifiable generation: A benchmark for knowledge-aware language model attribution X Li, Y Cao, L Pan, Y Ma, A Sun ACL 2024 (Findings), 2023 | 33 | 2023 |
| InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Y Zang, X Dong, P Zhang, Y Cao, Z Liu, S Ding, S Wu, Y Ma, H Duan, ... ACL 2025 (Findings), 2025 | 32 | 2025 |
| Learning to teach large language models logical reasoning M Chen, Y Ma, K Song, Y Cao, Y Zhang, D Li ACL 2024, 2023 | 32* | 2023 |
| Toward generalizable evaluation in the LLM era: A survey beyond benchmarks Y Cao, S Hong, X Li, J Ying, Y Ma, H Liang, Y Liu, Z Yao, X Wang, ... arXiv preprint arXiv:2504.18838, 2025 | 25 | 2025 |
| AntiLeak-Bench: Preventing data contamination by automatically constructing benchmarks with updated real-world knowledge X Wu, L Pan, Y Xie, R Zhou, S Zhao, Y Ma, M Du, R Mao, AT Luu, ... ACL 2025, 2024 | 24 | 2024 |
| Long context vs. RAG for LLMs: An evaluation and revisits X Li, Y Cao, Y Ma, A Sun arXiv preprint arXiv:2501.01880, 2024 | 22 | 2024 |
| MMEKG: Multi-modal Event Knowledge Graph towards universal representation across modalities Y Ma, Z Wang, M Li, Y Cao, M Chen, X Li, W Sun, K Deng, K Wang, A Sun, ... ACL 2022 (System Demonstration Track), 2022 | 21 | 2022 |
| Information extraction in low-resource scenarios: Survey and perspective S Deng, Y Ma, N Zhang, Y Cao, B Hooi 2024 IEEE International Conference on Knowledge Graph (ICKG), 33-49, 2024 | 20 | 2024 |
| Few-shot Event Detection: An Empirical Study and a Unified View Y Ma, Z Wang, Y Cao, A Sun ACL 2023, 2023 | 20 | 2023 |
| TART: An open-source tool-augmented framework for explainable table-based reasoning X Lu, L Pan, Y Ma, P Nakov, MY Kan Findings of the Association for Computational Linguistics: NAACL 2025, 4323-4339, 2025 | 11 | 2025 |
| Navigating the nuances: A fine-grained evaluation of vision-language navigation Z Wang, M Wu, Y Cao, Y Ma, M Chen, T Tuytelaars EMNLP 2024 (Findings), 2024 | 7 | 2024 |
| MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation X Li, K Bao, Y Ma, M Li, W Wang, R Men, Y Zhang, F Feng, D Liu, J Lin arXiv preprint arXiv:2505.17123, 2025 | 6 | 2025 |
| Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings Y Ma, J Li, Y Zang, X Wu, X Dong, P Zhang, Y Cao, H Duan, J Wang, ... ACL 2025 (Findings), 2025 | 3 | 2025 |
| EffiEval: Efficient and generalizable model evaluation via capability coverage maximization Y Wang, J Ying, Y Cao, Y Ma, Y Jiang arXiv preprint arXiv:2508.09662, 2025 | 2 | 2025 |
| Synergistic Weak-Strong Collaboration by Aligning Preferences Y Jiao, X Zhang, Z Wang, Y Ma, Z Deng, R Wang, C Bansal, S Rajmohan, ... ACL 2025, 2025 | | 2025 |