Open Question 3.3: Is richer better?

Research by [Atanasov et al. (2024)] finds that, in online training, networks with a larger richness parameter \(\gamma\) generalize better (so long as they are given enough training time to escape the initial plateau). Is this generally true? Why?
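As a concrete starting point for discussion, here is a minimal toy sketch of what \(\gamma\) controls. It follows one common lazy-to-rich convention (downscale the network output by \(1/\gamma\) and upscale the learning rate by \(\gamma^2\), so that function-space updates stay \(O(1)\) while parameter movement grows with \(\gamma\)); the two-layer network, single-index teacher task, and all hyperparameters below are illustrative assumptions, not the setup of Atanasov et al. (2024). Trained online (a fresh batch every step), it reports how far the first-layer weights travel from initialization, a crude proxy for feature learning.

```python
# Toy sketch of a gamma ("richness") parametrization, NOT the setup of
# Atanasov et al. (2024). Convention assumed here: output scaled by
# 1/gamma, learning rate scaled by gamma**2, so larger gamma means
# richer (more feature-learning) dynamics.
import numpy as np

rng = np.random.default_rng(0)
d, width = 20, 512                               # input dim, hidden width
w_star = rng.standard_normal(d) / np.sqrt(d)     # hypothetical teacher direction

def sample_batch(batch):
    """Online training: a fresh Gaussian batch at every step."""
    X = rng.standard_normal((batch, d))
    return X, np.tanh(X @ w_star)                # single-index toy target

def feature_movement(gamma, steps=3000, lr0=0.02, batch=64):
    """Train a two-layer tanh net by SGD; return the relative movement
    of the first-layer weights, a crude proxy for feature learning."""
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = np.zeros(width)                          # zero-init readout: f = 0 at start
    W0 = W.copy()
    lr = lr0 * gamma**2                          # gamma^2 learning-rate scaling
    for _ in range(steps):
        X, y = sample_batch(batch)
        H = np.tanh(X @ W.T)                     # hidden features, (batch, width)
        f = (H @ a) / gamma                      # 1/gamma output scaling
        err = (f - y) / batch                    # gradient of (mean squared error)/2
        grad_a = (H.T @ err) / gamma
        grad_W = ((np.outer(err, a) / gamma) * (1.0 - H**2)).T @ X
        a -= lr * grad_a
        W -= lr * grad_W
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for gamma in (0.1, 1.0, 4.0):                    # lazier -> richer
    print(f"gamma = {gamma:4.1f}: relative feature movement "
          f"{feature_movement(gamma):.3f}")
```

Under this convention the loss dynamics stay comparable across \(\gamma\) while the weights move farther for larger \(\gamma\), which gives one concrete way to phrase the open question: does that extra movement away from initialization reliably buy better generalization?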

This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.

Discussion