Deep Learning with Python
Chollet (2021)
This book gives a clear and practical high-level overview of a range of deep learning topics, with particularly good coverage of how you actually build models in practice.
Topics covered include:
Basics of machine learning
Simple neural networks
Keras
Convolutional networks
Recurrent networks
Transformers
Generative deep learning
The last section is pretty shallow and dated in terms of the topics covered, so Generative Deep Learning was a good complement.
I've left the more extensive notes for Understanding Deep Learning, as that book goes into much greater depth about why things work.
Selected notes
Deep learning primarily took off due to better performance than classical approaches, but reducing the need for feature engineering is a big advantage too.
Optimizers that make use of momentum tend to converge faster and are less likely to get stuck in local minima.
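A minimal sketch of the momentum update on a toy quadratic loss (not from the book; the learning rate and momentum values are illustrative):

```python
# Toy quadratic loss f(w) = (w - 3)^2, so grad f(w) = 2(w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w, velocity = 0.0, 0.0
lr, momentum = 0.1, 0.9  # illustrative values
for _ in range(100):
    # the velocity term accumulates past gradients, smoothing the trajectory
    # and helping the update roll through shallow local minima
    velocity = momentum * velocity - lr * grad(w)
    w += velocity
print(w)  # converges near 3.0
```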
Backpropagation is just the chain rule (though some optimizations are applied in practice).
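A hand-worked example of that chain rule, for y = sigmoid(w * x); the names and values are just for illustration:

```python
import math

x, w = 2.0, 0.5
z = w * x                   # forward pass, step 1
y = 1 / (1 + math.exp(-z))  # forward pass, step 2 (sigmoid)

# backward pass: dy/dw = dy/dz * dz/dw (the chain rule)
dy_dz = y * (1 - y)         # derivative of sigmoid at z
dz_dw = x
dy_dw = dy_dz * dz_dw

# sanity check against a numerical gradient
eps = 1e-6
numeric = (1 / (1 + math.exp(-(w + eps) * x)) - y) / eps
print(dy_dw, numeric)       # the two values agree
```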
You should generally apply feature-wise normalization to your data.
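A sketch with made-up data; the key detail is that the statistics come from the training set only:

```python
import numpy as np

x_train = np.random.rand(100, 5) * 10  # dummy data, 5 features
x_test = np.random.rand(20, 5) * 10

mean = x_train.mean(axis=0)  # one mean/std per feature (column)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std  # reuse the training-set statistics
```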
The manifold hypothesis posits that all natural data lies on a low-dimensional manifold within the high-dimensional space where it is encoded.
It often makes sense to train your model until it overfits in order to determine the best number of epochs to train for. If you can't get your model to overfit, it probably needs more capacity.
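A Keras sketch of that workflow, assuming model, x_train, and y_train already exist:

```python
import numpy as np

# `model`, `x_train`, `y_train` are assumed to exist
history = model.fit(x_train, y_train,
                    epochs=100,  # deliberately enough to overfit
                    validation_split=0.2)
best_epoch = int(np.argmin(history.history["val_loss"])) + 1
# then retrain a fresh model on the full training set for best_epoch epochs
```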
In the context of neural networks, weight decay is another term for L2 regularization.
Dropout rates are usually set between 0.2 and 0.5.
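A sketch covering the two notes above, with L2 weight decay via kernel_regularizer plus a dropout layer; the 0.002 factor and layer sizes are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.002)),  # weight decay
    layers.Dropout(0.5),  # at the top of the usual 0.2-0.5 range
    layers.Dense(1, activation="sigmoid"),
])
```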
You can reduce the size of your model by pruning weights that don't have much impact on the output. Quantization is another option.
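A post-training quantization sketch using the TFLite converter, assuming a trained Keras model; pruning would instead go through the tensorflow_model_optimization package:

```python
import tensorflow as tf

# `model` is assumed to be a trained Keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_model = converter.convert()  # smaller serialized model
```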
Convolutional layers are useful for learning local patterns. Stacking them allows the model to learn spatial hierarchies of patterns.
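A minimal stacked convnet sketch; the input shape and filter counts are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(180, 180, 3))
# early layers see small local patterns (edges, textures)...
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
# ...deeper layers combine them into larger-scale structures
x = layers.Conv2D(128, 3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
```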
You can perform feature extraction on an image model by taking the convolutional base of the model and adding a new classifier/regressor on top.
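A feature-extraction sketch using VGG16 as the convolutional base; the input shape and classifier head are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

conv_base = keras.applications.VGG16(weights="imagenet",
                                     include_top=False,
                                     input_shape=(180, 180, 3))
conv_base.trainable = False  # freeze the pretrained weights

inputs = keras.Input(shape=(180, 180, 3))
x = conv_base(inputs)
x = layers.Flatten()(x)
x = layers.Dense(256, activation="relu")(x)  # new classifier head
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
```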
Depthwise separable convolutions allow you to train smaller models that often have better performance.
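A quick parameter-count comparison, with an illustrative input shape:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 64))
regular = layers.Conv2D(64, 3)             # 3*3*64*64 + 64 = 36,928 params
separable = layers.SeparableConv2D(64, 3)  # 3*3*64 + 64*64 + 64 = 4,736 params
_ = regular(inputs), separable(inputs)     # build the layers so weights are created
```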
The sparsity of activations in a convolutional model increases with the depth of the layer.
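A sketch for checking this yourself, assuming a trained model and a batch of inputs x:

```python
import numpy as np
from tensorflow import keras

# `model` and `x` are assumed to exist
conv_layers = [l for l in model.layers if "conv" in l.name]
activation_model = keras.Model(model.input, [l.output for l in conv_layers])
for layer, act in zip(conv_layers, activation_model.predict(x)):
    # fraction of exactly-zero (ReLU) activations per layer
    print(layer.name, float(np.mean(act == 0.0)))
```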
In order to see the pattern that a filter corresponds to, you can apply gradient ascent in input space.
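A gradient-ascent sketch with GradientTape; model, layer_name, and filter_index are assumed to exist, and the step size and iteration count are illustrative:

```python
import tensorflow as tf
from tensorflow import keras

# `model`, `layer_name`, `filter_index` are assumed to exist
feature_extractor = keras.Model(model.input,
                                model.get_layer(layer_name).output)
img = tf.Variable(tf.random.uniform((1, 180, 180, 3)))
for _ in range(30):
    with tf.GradientTape() as tape:
        activation = feature_extractor(img)
        loss = tf.reduce_mean(activation[..., filter_index])  # filter response
    grads = tape.gradient(loss, img)
    img.assign_add(10.0 * tf.math.l2_normalize(grads))  # ascend the gradient
# `img` now approximates the pattern the filter responds to most strongly
```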
Class Activation Maps highlight the discriminative regions in an image that influence the CNN's classification decision. They are generated by mapping the predicted class score back to the feature maps from the last convolutional layer of the network.
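A condensed Grad-CAM-style sketch of that idea; model, last_conv_name, and a preprocessed img_array are assumed to exist:

```python
import tensorflow as tf
from tensorflow import keras

# `model`, `last_conv_name`, `img_array` are assumed to exist
grad_model = keras.Model(model.input,
                         [model.get_layer(last_conv_name).output, model.output])
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(img_array)
    class_idx = int(tf.argmax(preds[0]))
    class_score = preds[:, class_idx]
grads = tape.gradient(class_score, conv_out)     # class score w.r.t. feature maps
weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # one importance weight per channel
heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)
heatmap = tf.nn.relu(heatmap) / (tf.reduce_max(heatmap) + 1e-8)  # normalize for display
```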
Mixed precision training trains deep neural networks more efficiently by running most computation in a lower-precision format such as float16 while keeping numerically sensitive parts, such as the weights, in float32.
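In Keras this is a one-line policy switch; keeping the output layer in float32 is the usual stability caveat (the layer sizes here are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

keras.mixed_precision.set_global_policy("mixed_float16")
# subsequent layers compute in float16 but keep float32 weights;
# the final layer is kept in float32 for numerical stability
inputs = keras.Input(shape=(28 * 28,))
x = layers.Dense(256, activation="relu")(inputs)
outputs = layers.Dense(10, activation="softmax", dtype="float32")(x)
model = keras.Model(inputs, outputs)
```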