ReLU has an information-blocking problem: for roughly zero-mean inputs it is off about 50% of the time (f(x) = 0).
Information about the input is lost before it can be fully used; information that could be used to construct the output is intercepted before it reaches the output layer.
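As a concrete reference point, here is the standard ReLU step as a minimal plain-Java sketch (illustrative, not from the linked thread):

```java
// Standard ReLU: a negative pre-activation is clipped to 0, so for that example
// the neuron's outgoing weights transmit no information to the next layer.
float relu(float x) {
    return (x >= 0f) ? x : 0f;
}
```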
ResNet, with its skip connections, is one solution to this information-flow problem.
Another solution is to go slightly beyond the activation function concept:
https://discourse.processing.org/t/relu-is-half-a-cookie/32134
The idea is to double the number of weights in the neural network, giving each neuron two forward-connected weight vectors rather than the traditional one.
The activation function is then only ever the identity, f(x) = x (do nothing). Instead, the sign of x is used to select between the forward-connected weight vectors:
if (x >= 0) {
    use forward-connected weight vector A
} else {
    use forward-connected weight vector B
}
When x >= 0 the behavior is the same as ReLU's; when x < 0, instead of blocking the signal, x is sent forward through the alternative weight vector B.
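A minimal sketch of that idea in plain Java (class and field names such as TwoVectorLayer, weightsA, weightsB are my own, not from the linked thread): each neuron owns two outgoing weight vectors, and the sign of its pre-activation decides which one carries its value to the next layer.

```java
// Sketch of a layer where each neuron has two forward-connected weight vectors
// and the activation is the identity; the sign of x only selects the vector.
public class TwoVectorLayer {
    final int inputs, outputs;
    final float[][] weightsA; // weightsA[i] = outgoing vector of neuron i when x[i] >= 0
    final float[][] weightsB; // weightsB[i] = outgoing vector of neuron i when x[i] <  0

    TwoVectorLayer(int inputs, int outputs) {
        this.inputs = inputs;
        this.outputs = outputs;
        this.weightsA = new float[inputs][outputs];
        this.weightsB = new float[inputs][outputs];
    }

    // x[i] is the pre-activation of neuron i; f(x) = x, so x is passed through
    // unchanged and only the choice of outgoing weight vector depends on its sign.
    float[] forward(float[] x) {
        float[] out = new float[outputs];
        for (int i = 0; i < inputs; i++) {
            float[] w = (x[i] >= 0f) ? weightsA[i] : weightsB[i];
            for (int j = 0; j < outputs; j++) {
                out[j] += x[i] * w[j];
            }
        }
        return out;
    }
}
```

With weightsB set to all zeros this reduces exactly to a ReLU layer, which is one way to see that the scheme generalizes ReLU rather than replacing it.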