Open Question 3.12
Why ReLU?
Why is ReLU close to the optimal activation function for most deep learning applications? A scientific answer to this question should include both calculations and convincing experiments that make the case.
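As one possible starting point for discussion (not an answer), a frequently cited property of ReLU is that its derivative is exactly 1 wherever the unit is active, so gradient factors do not shrink when multiplied across many layers, whereas the sigmoid's derivative is at most 0.25 and products of such factors vanish with depth. The sketch below illustrates this with a toy 20-layer chain of derivative factors; the depth and the evaluation point are arbitrary choices for illustration.

```python
import math

def relu_grad(x):
    # Derivative of ReLU: 1 on the active half-line, 0 otherwise
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid: s(x) * (1 - s(x)), at most 0.25
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

depth = 20  # arbitrary toy depth
relu_chain = 1.0
sig_chain = 1.0
for _ in range(depth):
    # Multiply the per-layer derivative factor at pre-activation x = 1.0
    relu_chain *= relu_grad(1.0)
    sig_chain *= sigmoid_grad(1.0)

print(relu_chain)  # stays exactly 1.0: the gradient signal is preserved
print(sig_chain)   # collapses toward zero: the gradient signal vanishes
```

This is only one facet of the question: it shows why ReLU avoids vanishing gradients in a toy setting, but it does not explain why ReLU should be close to *optimal* among the many activations sharing this property.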
This is a discussion page for the open question above. Feel free to share ideas, approaches, or relevant research in the comments below.
Discussion