Open Question 3.3: Is richer better?

Research by [Atanasov et al. (2024)] finds that, in online training, networks with a larger richness parameter \(\gamma\) generalize better (so long as they are given enough training time to escape the initial plateau). Is this generally true? Why?
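As a concrete starting point for discussion, here is a minimal toy sketch of what \(\gamma\) controls. It follows one common lazy-to-rich convention (downscale the network output by \(1/\gamma\) and upscale the learning rate by \(\gamma^2\), so that function-space updates stay \(O(1)\) while parameter movement grows with \(\gamma\)); the two-layer network, single-index teacher task, and all hyperparameters below are illustrative assumptions, not the setup of Atanasov et al. (2024). Trained online (a fresh batch every step), it reports how far the first-layer weights travel from initialization, a crude proxy for feature learning.

```python
# Toy sketch of a gamma ("richness") parametrization, NOT the setup of
# Atanasov et al. (2024). Convention assumed here: output scaled by
# 1/gamma, learning rate scaled by gamma**2, so larger gamma means
# richer (more feature-learning) dynamics.
import numpy as np

rng = np.random.default_rng(0)
d, width = 20, 512                               # input dim, hidden width
w_star = rng.standard_normal(d) / np.sqrt(d)     # hypothetical teacher direction

def sample_batch(batch):
    """Online training: a fresh Gaussian batch at every step."""
    X = rng.standard_normal((batch, d))
    return X, np.tanh(X @ w_star)                # single-index toy target

def feature_movement(gamma, steps=3000, lr0=0.02, batch=64):
    """Train a two-layer tanh net by SGD; return the relative movement
    of the first-layer weights, a crude proxy for feature learning."""
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = np.zeros(width)                          # zero-init readout: f = 0 at start
    W0 = W.copy()
    lr = lr0 * gamma**2                          # gamma^2 learning-rate scaling
    for _ in range(steps):
        X, y = sample_batch(batch)
        H = np.tanh(X @ W.T)                     # hidden features, (batch, width)
        f = (H @ a) / gamma                      # 1/gamma output scaling
        err = (f - y) / batch                    # gradient of (mean squared error)/2
        grad_a = (H.T @ err) / gamma
        grad_W = ((np.outer(err, a) / gamma) * (1.0 - H**2)).T @ X
        a -= lr * grad_a
        W -= lr * grad_W
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for gamma in (0.1, 1.0, 4.0):                    # lazier -> richer
    print(f"gamma = {gamma:4.1f}: relative feature movement "
          f"{feature_movement(gamma):.3f}")
```

Under this convention the loss dynamics stay comparable across \(\gamma\) while the weights move farther for larger \(\gamma\), which gives one concrete way to phrase the open question: does that extra movement away from initialization reliably buy better generalization?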

This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.

Discussion