Amirkeivan Mohtashami
Title
Cited by
Year
Meditron-70b: Scaling medical pretraining for large language models
Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ...
arXiv preprint arXiv:2311.16079, 2023
Cited by: 557 | Year: 2023
Quarot: Outlier-free 4-bit inference in rotated llms
S Ashkboos, A Mohtashami, ML Croci, B Li, P Cameron, M Jaggi, ...
Advances in Neural Information Processing Systems 37, 100213-100240, 2024
Cited by: 343 | Year: 2024
Landmark Attention: Random-Access Infinite Context Length for Transformers
A Mohtashami, M Jaggi
Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023
Cited by: 225* | Year: 2023
Masked Training of Neural Networks with Partial Gradients
A Mohtashami, M Jaggi, SU Stich
The 25th International Conference on Artificial Intelligence and Statistics, 2021
Cited by: 49* | Year: 2021
Critical parameters for scalable distributed learning with large batches and asynchronous updates
S Stich, A Mohtashami, M Jaggi
International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021
Cited by: 24 | Year: 2021
Special Properties of Gradient Descent with Large Learning Rates
A Mohtashami, M Jaggi, S Stich
ICML 2023, 2022
Cited by: 20* | Year: 2022
The splay-list: A distribution-adaptive concurrent skip-list
V Aksenov, D Alistarh, A Drozdova, A Mohtashami
34th International Symposium on Distributed Computing 179, 2020
Cited by: 19 | Year: 2020
Characterizing & finding good data orderings for fast convergence of sequential gradient methods
A Mohtashami, S Stich, M Jaggi
arXiv preprint arXiv:2202.01838, 2022
Cited by: 17 | Year: 2022
Denseformer: Enhancing information flow in transformers via depth weighted averaging
M Pagliardini, A Mohtashami, F Fleuret, M Jaggi
Advances in neural information processing systems 37, 136479-136508, 2024
Cited by: 15 | Year: 2024
Social Learning: Towards Collaborative Learning with Large Language Models
A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ...
arXiv preprint arXiv:2312.11441, 2023
Cited by: 14 | Year: 2023
Cotformer: A chain-of-thought driven architecture with budget-adaptive computation cost at inference
A Mohtashami, M Pagliardini, M Jaggi
arXiv preprint arXiv:2310.10845, 2023
Cited by: 13 | Year: 2023
CoTFormer: More Tokens With Attention Make Up For Less Depth
A Mohtashami, M Pagliardini, M Jaggi
Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023
Cited by: 7 | Year: 2023
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models
A Mohtashami, M Verzetti, PK Rubenstein
Practical ML for Developing Countries Workshop @ ICLR 2023, 2023
Cited by: 7 | Year: 2023
Meditron: Open medical foundation models adapted for clinical practice
Z Chen, A Romanou, A Bonnet, A Hernández-Cano, B Alkhamissi, ...
Cited by: 6 | Year: 2024
TPS (task preparation system): A tool for developing tasks in programming contests
K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh
Olympiads in Informatics 13, 209-215, 2019
Cited by: 1 | Year: 2019
Reproducibility Report for "On Warm-Starting Neural Network Training"
A Mohtashami, E Pajouheshgar, K Kireev
ML Reproducibility Challenge 2020, 2021
2021
34th International Symposium on Distributed Computing (DISC 2020)
S Assadi, A Bernstein, Z Langley, A Rinberg, I Keidar, V Aksenov, ...
Schloss Dagstuhl-Leibniz-Zentrum für Informatik GmbH, 2020
2020
LIPIcs, Volume 179, DISC 2020, Complete Volume
H Attiya, S Assadi, A Bernstein, Z Langley, A Rinberg, I Keidar, V Aksenov, ...
34th International Symposium on Distributed Computing (DISC 2020) 179, 0, 2020
2020
A Gradient-Based Approach to Neural Networks Structure Learning
AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi