I’ve been meaning to start this blog for a really long time. Before you ask (and I know this is the first thing everyone is thinking): no, this is NOT a blog about XGBoost. Oh, XGBoost will be featured here, and probably very prominently, but it will not be the sole focus of this blog.
So what is this blog about?
In simple terms, this is a technical blog about the more advanced aspects of Data Science and Machine Learning, with a primary focus on tabular data. Most of the topics covered here would be considered “trad” ML. There will also be a heavy emphasis on applied topics and research that are highly relevant to practitioners in the field.
Over the past 10-15 years, neural networks and deep learning have become the backbone of all the most exciting advancements in Machine Learning: from “simple” image recognition, text classification, and voice recognition to modern Generative AI and LLMs, these algorithms have gone from strength to strength. They richly deserve all the recognition and attention that has been heaped upon them. Unfortunately, this overwhelming concentration of attention, effort, and resources has come at the expense of all the other Machine Learning techniques and algorithms, and research and development in those other branches of ML has severely stagnated. One of the aims of this blog is to rekindle, however modestly, research efforts in those other fields.
Around the same time that Deep Learning was taking off, Data Science became a very trendy moniker for a new professional occupation, unironically dubbed “the sexiest job of the 21st century”. This was also around the time I was changing careers, and Data Science, on the surface, seemed like the perfect new profession for me. It seemed to combine analytical and technical skills in just the right proportion for my own capabilities and interests. I took its name at face value. For me, to this day, Data Science is about two things: 1. Data, and 2. Science. Fast forward a decade or so, through several jobs across the technology landscape, and I’ve learned the hard way that I was sold a bill of goods. At every single place I have worked, Data Scientists were expected to be just another kind of Software Engineer. The unique insights they could bring to the table were, in the best case, unappreciated. At worst, colleagues and management were actively hostile to them.
In these days of ever-approaching AGI, there is a question of whether Data Science has a future at all. It is a good question, and a version of it can be asked of almost any white-collar profession. If it does have a future, though, then we owe it to the field to make it live up to its true promise and potential. It is my hope that this blog will also help in that regard, making Data Science what it really should be about: inquiry, research, insights, understanding. I hope to provide practitioners with the best possible tools and insights so that they can do their jobs as well as possible. And maybe, just maybe, Data Science can gain some respect within the hallowed halls of the tech industry.
If it does end up being just a blog about XGBoost, I won’t be mad!