Open Question 2.1

Convergence of wide $\mu$P networks.


Under what conditions does a network in the infinite-width limit of the maximal update parametrization ($\mu$P) converge when optimized with gradient descent?

This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.

Discussion