Deep Learning and its Discontents
A few general musings inspired by a recent tweet
Yesterday I came across a tweet that praised “Deep Learning”, a classic ML/AI textbook.
The author of the tweet credited that textbook for his own career success. I retweeted it and commented, half tongue-in-cheek, that “I basically owe my career to ignoring this book.” Now, before we go any further, a disclaimer is in order: I have nothing against the author of the original tweet (I like him and consider him a friend), nor against the authors of the above textbook. They are all exceedingly respectable researchers in the field, and have been rightly credited for some of the most important breakthroughs of the past few decades.
I actually started reading “Deep Learning” before it was even officially published - the authors were making the PDF version of the work in progress freely available online. At the time I was still relatively new to the world of Machine Learning, and was trying, on the side, to read some more foundational material. But even then I was mostly uninterested in applying the same approach to learning the material that I had used in my academic career - poring over intricate and arcane deep knowledge in order to master it after painstakingly long study. These days (as confirmed by the reactions to my tweet), the main issue that most people have with “Deep Learning” is that it’s a purely theoretical book with no applications. My issue with it, however, is that it’s an absolutely terrible textbook. There are no worked-out examples of solving even the theoretical problems, and there are no exercises. You are just supposed to raw dog the material and absorb it as you go along. Its pedagogical quality is pretty abysmal.
The second, and rather minor in the grand scheme of things, reason why the textbook was not important to me was that at some point I decided that for what I really cared about - predictive modeling and machine learning for tabular data - deep learning was not that helpful or relevant. Right out of the box, neural networks don’t perform as well as the tree-based algorithms. Neural networks can add predictive power, but primarily through ensembling with other models. Furthermore, in that domain non-algorithmic considerations have even more of an impact on your model. That is to say, things like quality of data, feature selection, feature engineering, etc. are far more relevant. I’ve already written about this in previous posts on this blog, and will probably go into more detail in subsequent posts.
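To make the ensembling point concrete, here is a minimal sketch (my own illustration, not from this post) that blends a gradient-boosted tree model with a small neural network by averaging their predicted class probabilities on synthetic tabular data; the dataset, models, and parameters are all assumptions chosen just to demonstrate the technique.

```python
# Sketch: blending a tree-based model with a neural network on tabular data.
# All model choices and parameters here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a tabular dataset.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the two base models independently.
gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
mlp = MLPClassifier(max_iter=500, random_state=0).fit(X_train, y_train)

# Blend by averaging predicted probabilities, then take the argmax class.
blend_proba = (gbm.predict_proba(X_test) + mlp.predict_proba(X_test)) / 2
blend_pred = blend_proba.argmax(axis=1)

gbm_acc = accuracy_score(y_test, gbm.predict(X_test))
mlp_acc = accuracy_score(y_test, mlp.predict(X_test))
blend_acc = accuracy_score(y_test, blend_pred)
print(f"GBM: {gbm_acc:.3f}  MLP: {mlp_acc:.3f}  Blend: {blend_acc:.3f}")
```

In practice the blend weights would be tuned on a validation set, and whether the neural network adds anything at all depends heavily on the data quality and feature work mentioned above.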
In general, when it comes to any career in the tech industry, my biggest recommendation for getting good at technical skills is to practice those skills. You will gain much more from being able to solve immediate and relevant problems than you will from poring over theoretical knowledge. If you still have an itch to dig deeper into the theory later on, then by all means do so! But I would warn you not to do it as a way of “procrastinating” on acquiring relevant practical knowledge instead.


