Open Question 3.2

Scaling relationships for learning rate schedules.


What scaling rules or relationships apply to learning rate schedules? What nondimensionalized quantities emerge? Can we “post-dict” properties of common learning rate schedules used in practice?
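
As one possible way to make "nondimensionalized" concrete (an assumption for illustration, not something the question specifies), a schedule can be written as a peak learning rate times a shape function of fractional training progress, eta(t) = peak_lr * f(t / T). The sketch below uses linear warmup followed by cosine decay only as a familiar example; the names `cosine_shape`, `learning_rate`, and `warmup_frac` are hypothetical.

```python
import math


def cosine_shape(tau: float, warmup_frac: float = 0.1) -> float:
    """Dimensionless schedule shape f(tau), with tau = t / T in [0, 1].

    Hypothetical example: linear warmup to 1, then cosine decay to 0.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if not 0.0 <= warmup_frac < 1.0:
        raise ValueError("warmup_frac must lie in [0, 1)")
    if tau < warmup_frac:
        return tau / warmup_frac
    # Fraction of the post-warmup phase that has elapsed.
    progress = (tau - warmup_frac) / (1.0 - warmup_frac)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def learning_rate(step: int, total_steps: int, peak_lr: float) -> float:
    """Dimensional schedule: eta(t) = peak_lr * f(t / T)."""
    return peak_lr * cosine_shape(step / total_steps)
```

Under this parameterization, two runs with different total step counts share the same ratio of learning rate to peak at equal fractional progress. Whether t / T is the right variable, and how the peak value and warmup fraction should themselves scale, is exactly what the question leaves open.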

This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.

Discussion