Carlos E. Jimenez
Verified email at princeton.edu
Title
Cited by
Year
SWE-bench: Can language models resolve real-world GitHub issues?
CE Jimenez, J Yang, A Wettig, S Yao, K Pei, O Press, K Narasimhan
arXiv preprint arXiv:2310.06770, 2023
Cited by: 1472; Year: 2023
SWE-agent: Agent-computer interfaces enable automated software engineering
J Yang, CE Jimenez, A Wettig, K Lieret, S Yao, K Narasimhan, O Press
Advances in Neural Information Processing Systems 37, 50528-50652, 2024
Cited by: 735; Year: 2024
SWE-bench Multimodal: Do AI systems generalize to visual software domains?
J Yang, CE Jimenez, AL Zhang, K Lieret, J Yang, X Wu, O Press, ...
arXiv preprint arXiv:2410.03859, 2024
Cited by: 93; Year: 2024
SWE-bench: Can language models resolve real-world GitHub issues?, 2024
CE Jimenez, J Yang, A Wettig, S Yao, K Pei, O Press, K Narasimhan
URL https://arxiv.org/abs/2310.06770 7, 2023
Cited by: 85; Year: 2023
SWE-smith: Scaling data for software engineering agents
J Yang, K Lieret, CE Jimenez, A Wettig, K Khandpur, Y Zhang, B Hui, ...
arXiv preprint arXiv:2504.21798, 2025
Cited by: 73; Year: 2025
C-STS: Conditional semantic textual similarity
A Deshpande, C Jimenez, H Chen, V Murahari, V Graf, T Rajpurohit, ...
Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023
Cited by: 42; Year: 2023
Introducing SWE-bench Verified
N Chowdhury, J Aung, CJ Shern, O Jaffe, D Sherburn, G Starace, E Mays, ...
arXiv preprint arXiv:2407.01489, 2024
Cited by: 38; Year: 2024
DataMUX: Data multiplexing for neural networks
V Murahari, C Jimenez, R Yang, K Narasimhan
Advances in Neural Information Processing Systems 35, 17515-17527, 2022
Cited by: 29; Year: 2022
CARETS: A consistency and robustness evaluative test suite for VQA
CE Jimenez, O Russakovsky, K Narasimhan
arXiv preprint arXiv:2203.07613, 2022
Cited by: 20; Year: 2022
Introducing SWE-bench Verified, 2024
N Chowdhury, J Aung, CJ Shern, O Jaffe, D Sherburn, G Starace, E Mays, ...
URL https://openai.com/index/introducing-swe-bench-verified, 2024
Cited by: 19; Year: 2024
EnIGMA: Enhanced interactive generative model agent for CTF challenges
T Abramovich, M Udeshi, M Shao, K Lieret, H Xi, K Milner, S Jancheska, ...
arXiv preprint arXiv:2409.16165, 2024
Cited by: 11; Year: 2024
MUX-PLMs: Pre-training language models with data multiplexing
V Murahari, A Deshpande, C Jimenez, I Shafran, M Wang, Y Cao, ...
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP …, 2023
Cited by: 8; Year: 2023
SWE-bench: Can language models resolve real-world GitHub issues? CoRR, abs/2310.06770, 2023. doi: 10.48550
CE Jimenez, J Yang, A Wettig, S Yao, K Pei, O Press, K Narasimhan
arXiv preprint ARXIV.2310.06770 10, 0
Cited by: 8
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Q Shi, CE Jimenez, S Yao, N Haber, D Yang, K Narasimhan
arXiv preprint arXiv:2506.05579, 2025
Cited by: 4; Year: 2025
Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
T Abramovich, M Udeshi, M Shao, K Lieret, H Xi, K Milner, S Jancheska, ...
arXiv preprint arXiv:2409.16165, 2024
Cited by: 3; Year: 2024
MUX-PLMs: Data multiplexing for high-throughput language models
V Murahari, A Deshpande, C Jimenez, I Shafran, M Wang, Y Cao, ...
Findings of the Association for Computational Linguistics: EMNLP 2023, 4540-4554, 2023
Cited by: 3; Year: 2023
EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
T Abramovich, M Udeshi, M Shao, K Lieret, H Xi, K Milner, S Jancheska, ...
Forty-second International Conference on Machine Learning
Cited by: 2
CodeClash: Benchmarking Goal-Oriented Software Engineering
J Yang, K Lieret, J Yang, CE Jimenez, O Press, L Schmidt, D Yang
arXiv preprint arXiv:2511.00839, 2025
Cited by: 1; Year: 2025
IMPersona: Evaluating Individual Level LM Impersonation
Q Shi, CE Jimenez, S Dong, B Seo, C Yao, A Kelch, K Narasimhan
arXiv preprint arXiv:2504.04332, 2025
Cited by: 1; Year: 2025
Learning physical commonsense knowledge
CE Jimenez
Cited by: 1; Year: 2020