We train neural networks via backpropagation. While the details are complex (and usually handled by our machine learning software), the key point is that backpropagation is what makes gradient descent possible on multi-layer neural networks.
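As a minimal sketch of what the software handles for us, the snippet below uses TensorFlow's GradientTape (a tooling assumption; the text above names no particular library) to build a tiny two-layer network and ask for the gradients that a gradient descent step needs:

```python
import tensorflow as tf

# A tiny two-layer network; the framework applies the chain rule
# (backpropagation) when we ask for gradients.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1),
])

x = tf.random.normal((8, 3))  # made-up batch of inputs
y = tf.random.normal((8, 1))  # made-up targets

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))  # mean squared error

# Gradients of the loss with respect to every weight in the network --
# exactly what a gradient descent step consumes.
grads = tape.gradient(loss, model.trainable_variables)
```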
Several problems can arise during backpropagation that you need to be aware of. Specifically:
Vanishing Gradients
If the gradients for the lower layers (those nearer the input) become very small, those layers train very slowly or not at all. Using the ReLU activation function can help prevent vanishing gradients.
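A back-of-the-envelope illustration of why ReLU helps (the layer count and values are illustrative, not from the text): the gradient reaching a lower layer is roughly a product of per-layer derivative factors, and the sigmoid's derivative never exceeds 0.25, while an active ReLU unit's derivative is 1.

```python
# The gradient signal reaching a lower layer is (roughly) a product of
# one derivative factor per layer above it.
n_layers = 10

sigmoid_factor = 0.25  # sigmoid's derivative is at most 0.25
relu_factor = 1.0      # ReLU's derivative is 1 wherever the unit is active

print(sigmoid_factor ** n_layers)  # ~9.5e-07 -- almost no signal left
print(relu_factor ** n_layers)     # 1.0 -- the signal survives
```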
Dead ReLU Units
Once the weighted sum for a ReLU unit falls below 0, the unit can get stuck at zero: it outputs 0, and gradients can no longer flow through it during backpropagation, so its weights stop updating. Lowering the learning rate can help keep ReLU units from dying.
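The sketch below shows the mechanics on a single weight (the values are made up): once the weighted sum is negative, the gradient flowing back to the weight is zero, and a smaller learning rate, as suggested above, makes the large update steps that push units into that region less likely.

```python
import tensorflow as tf

# A "dead" ReLU input: the weighted sum is negative, so the output is 0
# and the gradient back to the weight is 0 -- gradient descent can't revive it.
w = tf.Variable([-2.0])   # weight that drives the weighted sum below zero
x = tf.constant([1.0])

with tf.GradientTape() as tape:
    out = tf.nn.relu(w * x)      # relu(-2.0) == 0

print(tape.gradient(out, w))     # [0.] -- no learning signal for w

# Mitigation from the text: take smaller steps. The 0.001 value is illustrative.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
```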
Exploding Gradients
If the weights in a network are very large, the gradients for the lower layers can “explode”, becoming too large to converge. Batch normalization and lowering the learning rate can help prevent exploding gradients.
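A minimal Keras sketch of the two mitigations named above (the layer sizes and the 0.001 learning rate are illustrative assumptions): batch normalization keeps each layer's inputs in a stable range, and a smaller learning rate keeps individual updates from inflating the weights further.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),  # normalize activations between layers
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])

# A lower learning rate keeps individual weight updates small.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss='mse')
```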
Dropout Regularization
Dropout is a useful regularization method that randomly drops out unit activations in the network for a single gradient step. It can be particularly helpful with complex networks.
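A minimal Keras sketch of dropout regularization (the 0.2 rate and layer sizes are illustrative): each Dropout layer zeroes a random fraction of the preceding layer's activations on every training step and is inactive at inference time.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),  # randomly zero 20% of these activations per training step
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
```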