We train neural networks via backpropagation. While the details are complex (and usually handled by our machine learning software), the key point is that backpropagation is what makes gradient descent possible on multi-layer neural networks.
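As a minimal sketch of what the software handles for us, the snippet below uses TensorFlow's GradientTape (a tooling assumption; the text above names no particular library) to build a tiny two-layer network and ask for the gradients that a gradient descent step needs:

```python
import tensorflow as tf

# A tiny two-layer network; the framework applies the chain rule
# (backpropagation) when we ask for gradients.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1),
])

x = tf.random.normal((8, 3))  # made-up batch of inputs
y = tf.random.normal((8, 1))  # made-up targets

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))  # mean squared error

# Gradients of the loss with respect to every weight in the network --
# exactly what a gradient descent step consumes.
grads = tape.gradient(loss, model.trainable_variables)
```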
Several problems can arise during backpropagation that you need to be aware of. Specifically:
Vanishing Gradients
If the gradients for the lower layers (those nearer the input) become very small, those layers train very slowly or not at all. Using the ReLU activation function can help prevent vanishing gradients.
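A back-of-the-envelope illustration of why ReLU helps (the layer count and values are illustrative, not from the text): the gradient reaching a lower layer is roughly a product of per-layer derivative factors, and the sigmoid's derivative never exceeds 0.25, while an active ReLU unit's derivative is 1.

```python
# The gradient signal reaching a lower layer is (roughly) a product of
# one derivative factor per layer above it.
n_layers = 10

sigmoid_factor = 0.25  # sigmoid's derivative is at most 0.25
relu_factor = 1.0      # ReLU's derivative is 1 wherever the unit is active

print(sigmoid_factor ** n_layers)  # ~9.5e-07 -- almost no signal left
print(relu_factor ** n_layers)     # 1.0 -- the signal survives
```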
Dead ReLU Units
Once the weighted sum for a ReLU unit falls below 0, the unit can get stuck at zero: it outputs 0, and gradients can no longer flow through it during backpropagation, so its weights stop updating. Lowering the learning rate can help keep ReLU units from dying.
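The sketch below shows the mechanics on a single weight (the values are made up): once the weighted sum is negative, the gradient flowing back to the weight is zero, and a smaller learning rate, as suggested above, makes the large update steps that push units into that region less likely.

```python
import tensorflow as tf

# A "dead" ReLU input: the weighted sum is negative, so the output is 0
# and the gradient back to the weight is 0 -- gradient descent can't revive it.
w = tf.Variable([-2.0])   # weight that drives the weighted sum below zero
x = tf.constant([1.0])

with tf.GradientTape() as tape:
    out = tf.nn.relu(w * x)      # relu(-2.0) == 0

print(tape.gradient(out, w))     # [0.] -- no learning signal for w

# Mitigation from the text: take smaller steps. The 0.001 value is illustrative.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
```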
Exploding Gradients
If the weights in a network are very large, the gradients for the lower layers can “explode”, becoming too large to converge. Batch normalization and lowering the learning rate can help prevent exploding gradients.
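A minimal Keras sketch of the two mitigations named above (the layer sizes and the 0.001 learning rate are illustrative assumptions): batch normalization keeps each layer's inputs in a stable range, and a smaller learning rate keeps individual updates from inflating the weights further.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),  # normalize activations between layers
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])

# A lower learning rate keeps individual weight updates small.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss='mse')
```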
Dropout Regularization
Dropout is a useful regularization method that randomly drops out unit activations in the network for a single gradient step. It can be particularly helpful with complex networks.
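A minimal Keras sketch of dropout regularization (the 0.2 rate and layer sizes are illustrative): each Dropout layer zeroes a random fraction of the preceding layer's activations on every training step and is inactive at inference time.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),  # randomly zero 20% of these activations per training step
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
```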