| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Phi-3 technical report: A highly capable language model locally on your phone | M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... | arXiv preprint arXiv:2404.14219 | 2958* | 2024 |
| How good are GPT models at machine translation? A comprehensive evaluation | A Hendy, M Abdelrehim, A Sharaf, V Raunak, M Gabr, H Matsushita, ... | arXiv preprint arXiv:2302.09210 | 657 | 2023 |
| Contrastive preference optimization: Pushing the boundaries of LLM performance in machine translation | H Xu, A Sharaf, Y Chen, W Tan, L Shen, B Van Durme, K Murray, YJ Kim | arXiv preprint arXiv:2401.08417 | 398 | 2024 |
| A paradigm shift in machine translation: Boosting translation performance of large language models | H Xu, YJ Kim, A Sharaf, HH Awadalla | arXiv preprint arXiv:2309.11674 | 267 | 2023 |
| Prediction of weather-induced airline delays based on machine learning algorithms | S Choi, YJ Kim, S Briceno, D Mavris | 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC) | 261 | 2016 |
| A deep learning approach to flight delay prediction | YJ Kim, S Choi, S Briceno, D Mavris | 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC) | 231 | 2016 |
| Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-LoRAs | A Abouelenin, A Ashfaq, A Atkinson, H Awadalla, N Bach, J Bao, ... | arXiv preprint arXiv:2503.01743 | 219 | 2025 |
| Taming sparsely activated transformer with stochastic experts | S Zuo, X Liu, J Jiao, YJ Kim, H Hassan, R Zhang, T Zhao, J Gao | arXiv preprint arXiv:2110.04260 | 175 | 2021 |
| Scalable and efficient MoE training for multitask multilingual models | YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ... | arXiv preprint arXiv:2109.10465 | 114 | 2021 |
| Lower numerical precision deep learning inference and training | A Rodriguez, E Segal, E Meiri, E Fomenko, YJ Kim, H Shen, B Ziv | https://software.intel.com/en-us/articles/lower-numerical-precision-deep … | 82 | 2018 |
| From research to production and back: Ludicrously fast neural machine translation | YJ Kim, M Junczys-Dowmunt, H Hassan, AF Aji, K Heafield, ... | Proceedings of the 3rd Workshop on Neural Generation and Translation, 280-288 | 70 | 2019 |
| FastFormers: Highly efficient transformer models for natural language understanding | YJ Kim, HH Awadalla | arXiv preprint arXiv:2010.13382 | 65 | 2020 |
| Artificial neural network models for airport capacity prediction | S Choi, YJ Kim | Journal of Air Transport Management 97, 102146 | 55 | 2021 |
| Who says elephants can't run: Bringing large scale MoE models into cloud scale production | YJ Kim, R Henry, R Fahim, HH Awadalla | arXiv preprint arXiv:2211.10017 | 32 | 2022 |
| FineQuant: Unlocking efficiency with fine-grained weight-only quantization for LLMs | YJ Kim, R Henry, R Fahim, HH Awadalla | arXiv preprint arXiv:2308.09723 | 29 | 2023 |
| Gating dropout: Communication-efficient regularization for sparsely activated transformers | R Liu, YJ Kim, A Muzio, H Hassan | International Conference on Machine Learning, 13782-13792 | 27 | 2022 |
| Mixture of quantized experts (MoQE): Complementary effect of low-bit quantization and robustness | YJ Kim, R Fahim, HH Awadalla | arXiv preprint arXiv:2310.02410 | 26 | 2023 |
| Phi-4-mini-reasoning: Exploring the limits of small reasoning language models in math | H Xu, B Peng, H Awadalla, D Chen, YC Chen, M Gao, YJ Kim, Y Li, L Ren, ... | arXiv preprint arXiv:2504.21233 | 24 | 2025 |
| Cost-sensitive prediction of airline delays using machine learning | S Choi, YJ Kim, S Briceno, D Mavris | 2017 IEEE/AIAA 36th Digital Avionics Systems Conference (DASC) | 21 | 2017 |
| AutoMoE: Heterogeneous mixture-of-experts with adaptive computation for efficient neural machine translation | G Jawahar, S Mukherjee, X Liu, YJ Kim, M Abdul-Mageed, L Lakshmanan, ... | Findings of the Association for Computational Linguistics: ACL 2023, 9116-9132 | 17 | 2023 |