I’ve been meaning to write this post for a really, really long time. Yes, I’ve promised that this blog will not be just about XGBoost, nor even primarily about it. But both the blog’s name and my own approach to ML for tabular data have been heavily influenced by XGBoost, and I feel that I owe it to both myself and many others to explain what I had in mind, and why the title above is not nearly as crazy as you may think.
First things first: I am NOT the creator of XGBoost, nor have I ever directly worked on its development. XGBoost was developed by Tianqi Chen, and was first released in 2014. I will cover that history a bit more at some point in the future. You can get a quick intro to those early days in this article.
This and the subsequent few posts will be loosely based on the presentation that I gave at GTC 2024. I’ve been meaning to put it all into a blog post (or a series of blog posts), but the timing was never quite right. I am finally ready to do it.
My own background
In order to fully appreciate where I am coming from in my assessment of XGBoost (and Data Science in general), I believe it’s best to share a bit of my own background.
I am a Theoretical Physicist by training. I obtained undergraduate and master’s degrees from Stanford, and a PhD from the University of Illinois at Urbana-Champaign. Physics has always been my first intellectual love, but Physics doesn't pay the bills. The job market for someone with my particular set of skills is virtually nonexistent. So I had to switch careers. I stumbled upon Data Science and Machine Learning thanks to some cool MOOCs. I unsuccessfully tried to leverage the skills I learned in those courses for an opening in the professional DS/ML world. And then I discovered Kaggle. In my estimation, Kaggle to this day remains the best online learning and credentialing resource. I’ve used it to pick up many new DS, ML, and AI skills, and was fortunate enough to use those skills and credentials to transition into the tech industry. I’ve worked at several startups, and eventually found myself at NVIDIA.
My DS, ML, and AI skills are almost entirely self-taught. Nearly all of them have been honed in very practically-minded environments. This is where my own biases come from. I am not that interested in some new flashy tool or algorithm that is marginally - if at all! - better than something that has stood the test of time and serves as a reliable workhorse in a data practitioner’s daily professional work.
XGBoost is all you need
“XGBoost is all you need”—it’s a phrase that started off as a bit of a tongue-in-cheek remark but has since taken on a life of its own. To be clear, the point isn’t that XGBoost is the single solution to every machine learning problem you might encounter. Instead, it’s a nod to how robust, flexible, and widely adopted XGBoost has become in the landscape of Gradient Boosted Trees (GBTs). The phrase also riffs on the famous Transformer paper, “Attention Is All You Need.”
GBTs are known to excel on tabular data, making libraries like XGBoost, LightGBM, CatBoost, and even scikit-learn’s HistGradientBoosting top picks for many real-world machine learning tasks. In my own practice, I cycle through all of these, because each has its strengths and weaknesses. No single GBT library is universally better than the others; which one is best in a given situation can depend on factors like data size, hardware, and the complexity of the modeling task.
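For concreteness, here is a minimal sketch of what fitting XGBoost on a tabular problem looks like through its scikit-learn-style interface. It assumes a reasonably recent XGBoost release (1.6+, where early stopping is a constructor argument); the synthetic dataset and the hyperparameters are placeholders, not recommendations:

```python
# Minimal sketch: XGBoost's scikit-learn-style interface on a synthetic
# tabular dataset. Dataset and hyperparameters are placeholders only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
    early_stopping_rounds=20,  # stop once the validation metric stalls
)
model.fit(X_tr, y_tr, eval_set=[(X_va, y_va)], verbose=False)
print("validation AUC:", roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]))
```

The other GBT libraries expose very similar interfaces, which is part of why switching between them is so cheap in practice.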
Beyond raw predictive performance and efficiency, the maturity and robustness of each library matter a great deal. XGBoost’s wide community support, thorough documentation, and active development make it a reliable choice. Practical considerations—such as ease of installation, compatibility with your system’s hardware, and straightforward maintenance—are also crucial. This extends to GPU support, which can provide huge speed-ups when training on large datasets, and XGBoost has been a front-runner in this arena.
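To illustrate how little it takes to move training onto the GPU, here is a hedged sketch. It assumes XGBoost 2.0 or later, where the `device` parameter selects CUDA; older releases used `tree_method="gpu_hist"` for the same purpose:

```python
# Hedged sketch: GPU training with XGBoost >= 2.0, where device="cuda"
# selects the GPU (older releases used tree_method="gpu_hist" instead).
from xgboost import XGBClassifier

gpu_model = XGBClassifier(
    n_estimators=500,
    tree_method="hist",  # histogram-based split finding
    device="cuda",       # run training on the GPU
)
# gpu_model.fit(X_tr, y_tr) works exactly as in the CPU case above.
```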
Under the hood, NVIDIA plays a major role in maintaining and advancing XGBoost, with contributors like Rory Mitchell, Hyunsu Cho, and Jiaming Yuan (alongside many other dedicated developers) working to improve scalability, efficiency, and GPU integration. Their efforts ensure that XGBoost remains a powerful, flexible, and future-proof option for anyone working with tabular data and gradient-boosted models. NVIDIA’s support for XGBoost was one of the main factors behind my (sometimes heavy-handed) promotion of this library. It was partly my support of the “home team”, and partly my easy access to the maintainers, that made it particularly appealing to me. Being able to log into Slack and chat with them whenever I had a serious issue or question was priceless.
What I don’t mean by my tagline
When I say that XGBoost is “all you need,” I’m not suggesting that it is universally superior to every other machine learning algorithm. There are plenty of cases where linear models, random forests, or even specialized methods like support vector machines can be a better fit. It all comes down to understanding your data, the problem constraints, and the performance metrics that matter most for your application. XGBoost excels in many scenarios, but it’s not an automatic trump card in every contest.
I’m also not implying that XGBoost outperforms every other gradient boosting framework or library in existence. Different GBT implementations can shine in specific hardware or data conditions. LightGBM, CatBoost, and others each come with their own advantages, such as handling categorical variables differently or offering certain performance enhancements. The “all you need” phrase shouldn’t be interpreted as a universal decree that invalidates these alternatives.
It’s equally important to note that neural networks still have their place and are the go-to method for tasks involving text, images, or highly unstructured data. Just because XGBoost is a powerful general-purpose tool doesn’t mean deep learning methods are obsolete, even for tabular data. Likewise, good old-fashioned feature engineering is far from dead. While tree-based methods can reduce the need for manual feature construction compared to linear models, taking time to craft meaningful features can still provide significant performance gains.
Finally, the phrase “all you need” certainly doesn’t dismiss advanced techniques like ensembling. Sometimes stacking XGBoost with other models—or even multiple XGBoost configurations—can yield stronger results. And while XGBoost is powerful, you typically wouldn’t frame it as a standalone engine for building a complete ML system. It’s a high-performing machine learning library, not a plug-and-play, all-encompassing ML solution.
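To make the stacking point above a bit more concrete, here is one hedged sketch using scikit-learn’s StackingClassifier with an XGBoost base learner; the specific models and settings are purely illustrative, not a recipe:

```python
# Hedged sketch: stacking an XGBoost model with other learners via
# scikit-learn's StackingClassifier. Model choices are illustrative only.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

stack = StackingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=300, learning_rate=0.05)),
        ("rf", RandomForestClassifier(n_estimators=300)),
    ],
    final_estimator=LogisticRegression(),  # meta-model over base predictions
    cv=5,  # out-of-fold predictions feed the meta-model
)
# stack.fit(X_tr, y_tr) on your tabular features, as usual.
```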
Summary
In this post I’ve tried to provide some context and background for why I am such an unapologetic XGBoost enjoyer. I’ve also tried to make the case that I am not delusional, and that I am, in fact, appreciative of many other ML tools and algorithms for tabular data, and use them on a regular basis in my own work. In the next few posts I’ll try to elaborate on some of the above points, and provide you with some additional details and context.