- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
We prove a convergence theory for constant large learning rates well beyond 2/L, where L is the largest eigenvalue of the Hessian at the initialization (a runnable sketch of this setup appears at the end of this list).
- GitHub Pages - Yuqing Wang
My research lies at the intersection of machine learning and applied math, combining tools from optimization, (stochastic) dynamics, computational math, analysis, topology, and sampling.
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Abstract: Recent empirical advances show that training deep models with a large learning rate often improves generalization performance. However, theoretical justifications for the benefits of a large learning rate are highly limited, due to challenges in analysis.
- dblp: Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Bibliographic details on Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect.
- arXiv:2110.03677v1 [cs.LG] 7 Oct 2021
{ywang3398, mchen393, tourzhao, mtao}@gatech.edu. Abstract: Training deep models with a large learning rate often improves generalization performance. However, theoretical justifications for the benefits of a large learning rate are highly limited, due to challenges in analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem.
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Award ID(s): 1847802. PAR ID: 10335984. Author(s)/Creator(s): Wang, Yuqing; Chen, Minshuo; Zhao, Tuo; Tao, Molei. Date Published: 2022-01-01. Journal Name: The International Conference on Learning Representations. Sponsoring Org: National Science Foundation.
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Theoretical justifications for the benefits of a large learning rate are highly limited, due to challenges in analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem, i.e., $\min_{X,Y} \|A - XY^\top\|_F^2$. We prove a convergence theory for constant large learning rates well beyond 2/L (see the sketch below).
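The snippets above all describe the same setting: gradient descent on $f(X, Y) = \|A - XY^\top\|_F^2$ with a constant step size deliberately beyond the classical 2/L stability threshold, where L is the largest eigenvalue of the Hessian at the initialization. The following minimal NumPy sketch sets up that experiment; it is not the authors' code, and the problem sizes, initialization, step-size multiple (2.5/L), and balance diagnostic are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the paper's setting (not the authors' code):
# GD on f(X, Y) = ||A - X Y^T||_F^2 with a constant learning rate
# beyond the classical 2/L threshold, where L is the largest-magnitude
# Hessian eigenvalue at initialization. Sizes and scales are illustrative.
rng = np.random.default_rng(0)
n, m, r = 10, 10, 5
A = rng.standard_normal((n, m))
X0 = rng.standard_normal((n, r))
Y0 = rng.standard_normal((m, r))

def grads(X, Y):
    """Gradients of f(X, Y) = ||A - X Y^T||_F^2."""
    R = A - X @ Y.T                       # residual, n x m
    return -2.0 * R @ Y, -2.0 * R.T @ X

def flat_grad(theta):
    X, Y = theta[:n * r].reshape(n, r), theta[n * r:].reshape(m, r)
    gX, gY = grads(X, Y)
    return np.concatenate([gX.ravel(), gY.ravel()])

# Estimate L at the initialization by power iteration on finite-difference
# Hessian-vector products: Hv ~ (g(t + eps*v) - g(t - eps*v)) / (2*eps).
theta0 = np.concatenate([X0.ravel(), Y0.ravel()])
v = rng.standard_normal(theta0.size)
v /= np.linalg.norm(v)
eps = 1e-5
for _ in range(200):
    Hv = (flat_grad(theta0 + eps * v) - flat_grad(theta0 - eps * v)) / (2 * eps)
    L = np.linalg.norm(Hv)
    v = Hv / L

lr = 2.5 / L                              # constant step size, beyond 2/L

X, Y = X0.copy(), Y0.copy()
for t in range(5000):
    gX, gY = grads(X, Y)
    X, Y = X - lr * gX, Y - lr * gY
    if not np.isfinite(X).all():          # guard: large steps can blow up
        print(f"diverged at step {t}")
        break

print(f"L at init ~ {L:.2f}; lr = {lr:.4f} vs 2/L = {2 / L:.4f}")
print(f"final loss          : {np.linalg.norm(A - X @ Y.T)**2:.4e}")
print(f"balance ||X||/||Y|| : {np.linalg.norm(X) / np.linalg.norm(Y):.3f}")
```

Whether a given run converges, oscillates, or diverges at a particular multiple of 2/L depends on the regime covered by the paper's theory; the last printout is included because the "balancing effect" in the title refers to large-step-size GD driving the two factors toward comparable magnitudes.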