Review: Deep learning: foundations and concepts
Bishop (2024)
I've read the first 11 chapters, but the rest of the book is on hold for now to allow time for some other books and courses. There is a lot of overlap with Understanding deep learning, which I read first, so while I did end up creating some additional cards in Anki when reading this, I've left the notes for that book rather than repeating content here.
Thoughts
A lot of the mathematical background seems pretty irrelevant as presented. For example, design matrices and the Moore-Penrose pseudoinverse are introduced on page 116, then never mentioned again (I've searched the whole book for both terms). Maybe they are still relevant, but if so, the book never explains why.
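(For context, and this is my gloss rather than anything the book says: design matrices and the pseudoinverse usually appear together because, when \Phi^\top\Phi is invertible, the least-squares solution for linear regression can be written

w^\star = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{t} = \Phi^\dagger \mathbf{t},

which is presumably the context they were introduced for, but that connection is never drawn.)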
At several points, terms are introduced in a way that is either unclear or inaccurate. For example, page 172 says that networks having more than one layer of learnable parameters are known as feed-forward networks or multilayer perceptrons, but "feed-forward" should be reserved for acyclic networks, and an MLP would additionally be fully connected. Page 347 says that "a conditional independence property that is helpful when discussing more complex directed graphs is called the Markov blanket or Markov boundary", but these aren't interchangeable terms, and neither is clearly defined in the text, which talks about "the" Markov blanket rather than "a" Markov blanket even though Markov blankets aren't usually unique. A Markov boundary specifically refers to a minimal Markov blanket. Without explicit and accurate definitions, the text is harder to follow than it needs to be.
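(For the record, since the book doesn't spell it out: a set S is a Markov blanket of a node x_i if x_i is conditionally independent of every remaining node given S, i.e.

x_i \perp\!\!\!\perp x_{V \setminus (S \cup \{i\})} \mid x_S,

and the Markov boundary is the minimal such S. In a directed graph, the Markov boundary of x_i consists of its parents, its children, and its children's other parents.)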
I'm a bit concerned about the attention to detail more generally. The entirety of page 7 is devoted to example output from GPT-4 for the prompt "Write a proof of the fact that there are infinitely many primes; do it in the style of a Shakespeare play through a dialogue between two parties arguing over the proof", but the book never points out that the proof as presented is wrong (Q is coprime to each of the listed primes, but not necessarily prime itself).
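(To make the gap concrete: taking the "known" primes to be 2, 3, 5, 7, 11 and 13 gives

Q = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 + 1 = 30031 = 59 \times 509,

which is coprime to each of them but not itself prime; the proof instead has to argue that Q has some prime factor outside the original list.)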
An additional, very minor quibble is the use of "error function" rather than "loss function". While not incorrect, this feels non-standard, and it results in E(x) denoting both loss functions and energy functions (and expectations, if you count \mathbb{E}). The book also discusses Erf(x), so "error function" means different things depending on which section you're in.
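(The latter being the Gauss error function \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt, which has nothing to do with training objectives.)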
On the plus side, the book gives an up-to-date treatment of a good range of topics. A lot of the illustrations are very helpful, and it doesn't shy away from mathematical detail. I'm likely to persevere with the rest of the book, which will hopefully stick to the point a lot more, but so far I much prefer Understanding deep learning.