
Yuanzhi Li
Assistant Professor at CMU
Verified email at andrew.cmu.edu
Title · Cited by · Year
LoRA: Low-rank adaptation of large language models
EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen
ICLR, 2022
Cited by 25622* · 2022
Sparks of artificial general intelligence: Early experiments with GPT-4
S Bubeck, V Chandrasekaran, R Eldan, J Gehrke, E Horvitz, E Kamar, ...
arXiv preprint arXiv:2303.12712, 2023
Cited by 5489 · 2023
Phi-4 technical report
M Abdin, J Aneja, H Behl, S Bubeck, R Eldan, S Gunasekar, M Harrison, ...
arXiv preprint arXiv:2412.08905, 2024
Cited by 2761 · 2024
A convergence theory for deep learning via over-parameterization
Z Allen-Zhu, Y Li, Z Song
International conference on machine learning, 242-252, 2019
Cited by 1985 · 2019
Learning and generalization in overparameterized neural networks, going beyond two layers
Z Allen-Zhu, Y Li, Y Liang
Advances in neural information processing systems 32, 2019
Cited by 1039 · 2019
A theoretical analysis of NDCG type ranking measures
Y Wang, L Wang, Y Li, D He, TY Liu
Conference on learning theory, 25-54, 2013
Cited by 953 · 2013
Textbooks are all you need
S Gunasekar, Y Zhang, J Aneja, CCT Mendes, A Del Giorno, S Gopi, ...
arXiv preprint arXiv:2306.11644, 2023
Cited by 952 · 2023
Convergence analysis of two-layer neural networks with ReLU activation
Y Li, Y Yuan
Advances in neural information processing systems 30, 2017
Cited by 889 · 2017
Learning overparameterized neural networks via stochastic gradient descent on structured data
Y Li, Y Liang
Advances in neural information processing systems 31, 2018
Cited by 838 · 2018
Textbooks are all you need II: phi-1.5 technical report
Y Li, S Bubeck, R Eldan, A Del Giorno, S Gunasekar, YT Lee
arXiv preprint arXiv:2309.05463, 2023
Cited by 695 · 2023
A latent variable model approach to PMI-based word embeddings
S Arora, Y Li, Y Liang, T Ma, A Risteski
Transactions of the Association for Computational Linguistics 4, 385-399, 2016
Cited by 694* · 2016
Towards understanding ensemble, knowledge distillation and self-distillation in deep learning
Z Allen-Zhu, Y Li
arXiv preprint arXiv:2012.09816, 2020
Cited by 619 · 2020
Can generalist foundation models outcompete special-purpose tuning? Case study in medicine
H Nori, YT Lee, S Zhang, D Carignan, R Edgar, N Fusi, N King, J Larson, ...
arXiv preprint arXiv:2311.16452, 2023
Cited by 535 · 2023
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
S Chen, S Chewi, J Li, Y Li, A Salim, AR Zhang
arXiv preprint arXiv:2209.11215, 2022
Cited by 481 · 2022
Towards explaining the regularization effect of initial large learning rate in training neural networks
Y Li, C Wei, T Ma
Advances in neural information processing systems 32, 2019
Cited by 432 · 2019
An alternative view: When does SGD escape local minima?
B Kleinberg, Y Li, Y Yuan
International conference on machine learning, 2698-2707, 2018
Cited by 430 · 2018
Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations
Y Li, T Ma, H Zhang
Conference On Learning Theory, 2-47, 2018
Cited by 425 · 2018
Gradient descent on neural networks typically occurs at the edge of stability
JM Cohen, S Kaur, Y Li, JZ Kolter, A Talwalkar
arXiv preprint arXiv:2103.00065, 2021
Cited by 419 · 2021
Phi-2: The surprising power of small language models
M Javaheripi, S Bubeck, M Abdin, J Aneja, S Bubeck, CCT Mendes, ...
Microsoft Research Blog, 2023
Cited by 401 · 2023
TinyStories: How small can language models be and still speak coherent English?
R Eldan, Y Li
arXiv preprint arXiv:2305.07759, 2023
Cited by 380 · 2023
Articles 1–20