Is wider better? - Learning Mechanics

Open Question 3.4: Is wider better? Can it be shown that, when all hyperparameters are optimally tuned, a wider MLP performs better on average on arbitrary tasks (perhaps under some reasonable assumptions on task structure)?

This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.

Discussion