| Meditron-70b: Scaling medical pretraining for large language models Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ... arXiv preprint arXiv:2311.16079, 2023 | 557 | 2023 |
| QuaRot: Outlier-free 4-bit inference in rotated LLMs S Ashkboos, A Mohtashami, ML Croci, B Li, P Cameron, M Jaggi, ... Advances in Neural Information Processing Systems 37, 100213-100240, 2024 | 343 | 2024 |
| Landmark Attention: Random-Access Infinite Context Length for Transformers A Mohtashami, M Jaggi Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023 | 225* | 2023 |
| Masked Training of Neural Networks with Partial Gradients A Mohtashami, M Jaggi, SU Stich The 25th International Conference on Artificial Intelligence and Statistics, 2021 | 49* | 2021 |
| Critical parameters for scalable distributed learning with large batches and asynchronous updates S Stich, A Mohtashami, M Jaggi International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021 | 24 | 2021 |
| Special Properties of Gradient Descent with Large Learning Rates A Mohtashami, M Jaggi, S Stich ICML 2023, 2022 | 20* | 2022 |
| The splay-list: A distribution-adaptive concurrent skip-list V Aksenov, D Alistarh, A Drozdova, A Mohtashami 34th International Symposium on Distributed Computing 179, 2020 | 19 | 2020 |
| Characterizing & finding good data orderings for fast convergence of sequential gradient methods A Mohtashami, S Stich, M Jaggi arXiv preprint arXiv:2202.01838, 2022 | 17 | 2022 |
| DenseFormer: Enhancing information flow in transformers via depth weighted averaging M Pagliardini, A Mohtashami, F Fleuret, M Jaggi Advances in Neural Information Processing Systems 37, 136479-136508, 2024 | 15 | 2024 |
| Social Learning: Towards Collaborative Learning with Large Language Models A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ... arXiv preprint arXiv:2312.11441, 2023 | 14 | 2023 |
| CoTFormer: A chain-of-thought driven architecture with budget-adaptive computation cost at inference A Mohtashami, M Pagliardini, M Jaggi arXiv preprint arXiv:2310.10845, 2023 | 13 | 2023 |
| CoTFormer: More Tokens With Attention Make Up For Less Depth A Mohtashami, M Pagliardini, M Jaggi Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023 | 7 | 2023 |
| Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models A Mohtashami, M Verzetti, PK Rubenstein Practical ML for Developing Countries Workshop @ ICLR 2023, 2023 | 7 | 2023 |
| Meditron: Open medical foundation models adapted for clinical practice Z Chen, A Romanou, A Bonnet, A Hernández-Cano, B Alkhamissi, ... | 6 | 2024 |
| TPS (task preparation system): A tool for developing tasks in programming contests K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh Olympiads in Informatics 13, 209-215, 2019 | 1 | 2019 |
| Reproducibility Report for "On Warm-Starting Neural Network Training" A Mohtashami, E Pajouheshgar, K Kireev ML Reproducibility Challenge 2020, 2021 | | 2021 |
| 34th International Symposium on Distributed Computing (DISC 2020) S Assadi, A Bernstein, Z Langley, A Rinberg, I Keidar, V Aksenov, ... Schloss Dagstuhl-Leibniz-Zentrum für Informatik GmbH, 2020 | | 2020 |
| LIPIcs, Volume 179, DISC 2020, Complete Volume H Attiya, S Assadi, A Bernstein, Z Langley, A Rinberg, I Keidar, V Aksenov, ... 34th International Symposium on Distributed Computing (DISC 2020) 179, 0, 2020 | | 2020 |
| A Gradient-Based Approach to Neural Networks Structure Learning AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi | | |