When to use BatchNorm?

BatchNorm: before or after the activation function? Different papers use different setups. My feeling is that applying BN after the activation helps the next layer understand its input better, because the normalized output reaches that layer directly instead of being put through an additional function first.
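To make the two orderings concrete, here is a minimal sketch in plain Python. It assumes a single feature, training-mode batch statistics, and no learned scale/shift; the helper names (batch_norm, relu, mean) are made up for illustration and are not any library's API.

```python
import math
import random

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars to zero mean and (near) unit variance.
    Simplification: no learned gamma/beta, no running statistics."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

def relu(xs):
    return [max(0.0, x) for x in xs]

def mean(xs):
    return sum(xs) / len(xs)

random.seed(0)
# Hypothetical pre-activations of one unit over a batch of 1000 samples.
pre_activations = [random.gauss(2.0, 3.0) for _ in range(1000)]

# Ordering A: BN before the activation.
# What the next layer sees is non-negative, so its mean is above zero.
bn_then_relu = relu(batch_norm(pre_activations))

# Ordering B: BN after the activation.
# What the next layer sees is zero-mean by construction.
relu_then_bn = batch_norm(relu(pre_activations))

print("BN -> ReLU mean:", mean(bn_then_relu))
print("ReLU -> BN mean:", mean(relu_then_bn))
```

In this toy setting only the ReLU -> BN ordering hands the next layer a zero-mean input, which is the intuition above. It says nothing about which ordering trains better in a real network, and a learned shift in an actual BatchNorm layer would move the mean anyway.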
