
Jingfeng Wu
Title · Cited by · Year
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Z Zhu, J Wu, B Yu, L Wu, J Ma
International Conference on Machine Learning, 7654-7663, 2019
Cited by 339* · 2019
On the Noisy Gradient Descent that Generalizes as SGD
J Wu, W Hu, H Xiong, J Huan, V Braverman, Z Zhu
International Conference on Machine Learning, 10367-10376, 2020
Cited by 140 · 2020
Programmable packet scheduling with a single queue
Z Yu, C Hu, J Wu, X Sun, V Braverman, M Chowdhury, Z Liu, X Jin
Proceedings of the 2021 ACM SIGCOMM Conference, 179-193, 2021
Cited by 133 · 2021
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
J Wu, D Zou, Z Chen, V Braverman, Q Gu, PL Bartlett
arXiv preprint arXiv:2310.08391, 2023
Cited by 108 · 2023
Benign overfitting of constant-stepsize SGD for linear regression
D Zou, J Wu, V Braverman, Q Gu, SM Kakade
Journal of Machine Learning Research 24 (326), 1-58, 2023
Cited by 108 · 2023
Twenty years after: Hierarchical Core-Stateless fair queueing
Z Yu, J Wu, V Braverman, I Stoica, X Jin
18th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2021
Cited by 60 · 2021
The benefits of implicit regularization from SGD in least squares problems
D Zou, J Wu, V Braverman, Q Gu, DP Foster, S Kakade
Advances in Neural Information Processing Systems 34, 5456-5468, 2021
Cited by 51 · 2021
A collective AI via lifelong learning and sharing at the edge
A Soltoggio, E Ben-Iwhiwhu, V Braverman, E Eaton, B Epstein, Y Ge, ...
Nature Machine Intelligence 6 (3), 251-264, 2024
Cited by 50 · 2024
Direction matters: On the implicit bias of stochastic gradient descent with moderate learning rate
J Wu, D Zou, V Braverman, Q Gu
International Conference on Learning Representations, 2021
Cited by 48 · 2021
Scaling laws in linear regression: Compute, parameters, and data
L Lin, J Wu, SM Kakade, PL Bartlett, JD Lee
arXiv preprint arXiv:2406.08466, 2024
Cited by 44 · 2024
Implicit bias of gradient descent for logistic regression at the edge of stability
J Wu, V Braverman, JD Lee
Advances in Neural Information Processing Systems 36, 74229-74256, 2023
Cited by 42 · 2023
Last iterate risk bounds of SGD with decaying stepsize for overparameterized linear regression
J Wu, D Zou, V Braverman, Q Gu, S Kakade
International Conference on Machine Learning, 24280-24314, 2022
Cited by 42 · 2022
Tangent-normal adversarial regularization for semi-supervised learning
B Yu, J Wu, J Ma, Z Zhu
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2019
Cited by 39 · 2019
Ship compute or ship data? Why not both?
J You, J Wu, X Jin, M Chowdhury
18th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2021
Cited by 38 · 2021
Fixed design analysis of regularization-based continual learning
H Li, J Wu, V Braverman
Conference on Lifelong Learning Agents, 513-533, 2023
Cited by 33 · 2023
Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
J Wu, PL Bartlett, M Telgarsky, B Yu
The Thirty Seventh Annual Conference on Learning Theory, 5019-5073, 2024
Cited by 32 · 2024
In-context learning of a linear transformer block: Benefits of the MLP component and one-step GD initialization
R Zhang, J Wu, P Bartlett
Advances in Neural Information Processing Systems 37, 18310-18361, 2024
Cited by 31 · 2024
The power and limitation of pretraining-finetuning for linear regression under covariate shift
J Wu, D Zou, V Braverman, Q Gu, S Kakade
Advances in Neural Information Processing Systems 35, 33041-33053, 2022
Cited by 31 · 2022
How Does Critical Batch Size Scale in Pre-training?
H Zhang, D Morwani, N Vyas, J Wu, D Zou, U Ghai, D Foster, S Kakade
arXiv preprint arXiv:2410.21676, 2024
Cited by 27 · 2024
Large stepsize gradient descent for non-homogeneous two-layer networks: Margin improvement and fast optimization
Y Cai, J Wu, S Mei, M Lindsey, P Bartlett
Advances in Neural Information Processing Systems 37, 71306-71351, 2024
Cited by 21 · 2024