Open Question 2.1: Convergence of wide $\mu$P networks
Under what conditions does gradient descent converge when it is used to train a network in the infinite-width $\mu$P limit?
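As a concrete starting point for experiments, here is a minimal finite-width sketch, not an answer to the question. It assumes the simplest case, a two-layer network, where $\mu$P is usually understood to coincide with the mean-field parameterization up to reparameterization: the output is scaled by 1/width and the learning rate by width. The toy regression task, the `tanh` activation, the function name `train_mean_field_net`, and all hyperparameters are illustrative assumptions, not part of the question.

```python
import numpy as np

def train_mean_field_net(width, steps=200, base_lr=0.2, seed=0):
    """Full-batch gradient descent on a two-layer net in mean-field scaling.

    Model: f(x) = (1/width) * sum_i a_i * tanh(w_i . x)
    Step size: base_lr * width, so each hidden unit moves by O(1) per step.
    Returns the training losses, recorded before each update.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical toy task: 32 Gaussian inputs in 4 dimensions, target sin(x_0).
    X = rng.standard_normal((32, 4))
    y = np.sin(X[:, 0])
    n = len(y)

    W = rng.standard_normal((width, 4))  # hidden weights with O(1) entries
    a = rng.standard_normal(width)       # output weights with O(1) entries
    lr = base_lr * width                 # width-scaled learning rate

    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W.T)             # hidden activations, shape (n, width)
        err = H @ a / width - y          # residuals, shape (n,)
        losses.append(0.5 * np.mean(err ** 2))

        # Exact gradients of the loss 0.5 * mean(err**2).
        grad_a = H.T @ err / (n * width)
        grad_W = (err[:, None] * (1.0 - H ** 2) * a[None, :]).T @ X / (n * width)

        a -= lr * grad_a
        W -= lr * grad_W
    return losses

if __name__ == "__main__":
    for width in (64, 256, 1024):
        losses = train_mean_field_net(width)
        print(width, losses[0], losses[-1])
```

Comparing the loss curves across widths at a fixed `base_lr` gives a rough empirical view of how the finite-width dynamics behave as width grows. It does not establish anything about convergence in the limit itself.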
This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.