Deep learning has revolutionized how we handle image, language, and audio data, but its success hasn't translated as seamlessly into the realm of tabular data. Here, traditional decision tree models like XGBoost continue to hold sway. Despite the advanced capabilities of deep learning, it struggles to adapt to the unique challenges posed by tabular data, which is often heterogeneous, small in scale, and subject to extreme values that can disrupt typical learning models.
The challenge primarily stems from the way deep learning models are constructed. These models excel with data that exhibits spatial and sequential invariances—characteristics inherent to images and audio. Tabular data, however, lacks these invariances, making it a tougher nut to crack for deep learning without significant customization.
This gap has spurred active research into developing deep learning architectures tailored for tabular data. However, many of these innovations falter when applied to new datasets. This is often due to a lack of robust benchmarks that accurately reflect real-world data challenges, resulting in models that perform well in controlled tests but underperform in practical applications.
Further complicating matters is the size of available tabular datasets. Unlike expansive image datasets like ImageNet, tabular datasets tend to be smaller and noisier, complicating the training process for deep learning models. This situation leads to issues with replicability and consistency in model evaluations, exacerbated by variable efforts in hyperparameter tuning and statistical uncertainties in benchmark results.
Recognizing these issues, researchers have developed a new benchmarking system for tabular data. Their methodology focuses on providing precise criteria for dataset inclusion and detailed hyperparameter tuning strategies. The goal is to establish a fair and comprehensive testing ground for both tree-based and deep learning models. This new benchmark not only tests these models across a variety of settings but also shares the raw results of extensive evaluations. By doing so, it opens the door for other researchers to refine and test their algorithms against a fixed standard, fostering greater innovation and effectiveness in the field.
The findings from these benchmarks are enlightening, showing why tree-based models often outperform deep learning on tabular data. Through empirical investigations and targeted dataset transformations, the researchers have begun to uncover the inherent biases and strengths of different modeling approaches. This insight is crucial for developing deep learning models that can match or even surpass the effectiveness of tree-based methods in handling the complex and varied nature of tabular data.
Research reviewed by Borisov et al. (2021) reveals extensive efforts to mold deep learning to better suit tabular data. These include data transformation methods that encode heterogeneous inputs into representations friendlier to neural networks, specialized architectures such as transformer-based and hybrid models, and regularization schemes designed for tabular learning.
Despite these advances, the field faces significant hurdles: a shortage of robust benchmarks that reflect real-world data challenges, datasets that are small and noisy compared to those in vision or language, inconsistent hyperparameter tuning effort across comparisons, and statistical uncertainty in reported results.
Addressing these challenges, the paper introduces a robust benchmark system using 45 diverse datasets to evaluate and compare the effectiveness of both tree-based and neural network models on tabular data. This benchmark is not only more comprehensive but also designed to provide a clearer and more consistent framework for assessing model performance across various settings.
The introduction of this benchmark is poised to advance our understanding of why tree-based models typically outperform NNs in handling tabular data. By exploring the underlying reasons through empirical studies, such as the specific regularization needs of MLPs, this work contributes significantly to the field. It highlights the critical need for bespoke solutions and methodological advancements to fully leverage deep learning's potential in processing tabular datasets.
This comprehensive approach offers a promising avenue for bridging the gap between deep learning innovations and the practical realities of tabular data. As the field grows, so too does the potential for deep learning to finally match and even exceed the benchmarks set by traditional methods in this crucial area.
This meticulous approach to dataset selection and preparation ensures that the benchmark provides a fair, rigorous, and meaningful evaluation, in particular for differentiating the capabilities of tree-based models from those of neural networks on tabular data. The benchmark addresses current gaps in tabular data analysis by establishing standard metrics and conditions that reflect real-world complexities.
The approach to data preparation aims to minimize manual intervention while ensuring that the data is optimally configured for the machine learning models being tested.
Gaussianization of Features: For neural network training, features are transformed using Scikit-learn’s QuantileTransformer to approximate a Gaussian distribution, thus normalizing data distribution and potentially enhancing model performance.
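As a concrete illustration, the snippet below sketches this transformation with Scikit-learn, using synthetic heavy-tailed data as a stand-in for real features; the exact preprocessing pipeline in the benchmark may differ in detail.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in for a heavy-tailed numerical feature matrix
rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(1000, 5))
X_test = rng.lognormal(size=(200, 5))

# Map each feature to an approximately Gaussian distribution
qt = QuantileTransformer(output_distribution="normal", random_state=0)
X_train_gauss = qt.fit_transform(X_train)  # fit quantiles on train only
X_test_gauss = qt.transform(X_test)        # apply train quantiles to test
```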
Target Variable Transformation: In regression settings, heavy-tailed target variables (e.g., house prices) are log-transformed to normalize their distribution. Additionally, there's an option to Gaussianize the target variable for the model fitting phase, which is then inversely transformed for performance evaluation, again using Scikit-learn’s tools.
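The pattern of fitting on a transformed target and inverting the transform for evaluation can be expressed with Scikit-learn's TransformedTargetRegressor. Below is a hedged sketch with Ridge as a placeholder regressor (the benchmark fits NNs and tree ensembles, not Ridge); passing a QuantileTransformer via the transformer argument would give the Gaussianization variant instead of the log transform.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.exp(X @ np.ones(4) + rng.normal(size=500))  # heavy-tailed target

# Fit on log-transformed targets; predictions are mapped back through
# the inverse transform before scoring on the original scale.
model = TransformedTargetRegressor(
    regressor=Ridge(), func=np.log1p, inverse_func=np.expm1
).fit(X, y)
print(model.predict(X[:3]))  # predictions on the original target scale
```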
Handling Categorical Features: For models that do not inherently process categorical data, features are encoded using Scikit-learn’s OneHotEncoder. This encoding transforms categorical variables into a format that neural networks and other models can process more effectively.
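A minimal example of this encoding, with invented category values for illustration:

```python
from sklearn.preprocessing import OneHotEncoder

cats = [["red", "small"], ["blue", "large"], ["red", "large"]]

# Expand each categorical column into indicator columns; categories
# unseen at test time are encoded as all zeros.
enc = OneHotEncoder(handle_unknown="ignore")
X_onehot = enc.fit_transform(cats).toarray()
print(enc.get_feature_names_out())  # one column per (feature, category)
```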
The benchmarking results provide valuable insights into the performance of tree-based models and neural networks across a diverse range of tabular datasets. The findings highlight the strengths and weaknesses of each model type, shedding light on the factors that contribute to their relative performance.
[Figure: Benchmark on medium-sized datasets with only numerical features. Dotted lines show the score of the default hyperparameters, which is also the first random-search iteration; each value is the test score of the best model (selected on the validation set) after a given number of random-search iterations, averaged over 15 shuffles of the search order, with a ribbon spanning the minimum and maximum scores across shuffles.]
[Figure: Benchmark on medium-sized datasets with both numerical and categorical features, plotted in the same way.]
Hyperparameter Tuning: Even with extensive tuning, NNs do not match the performance of tree-based models on tabular data. This underscores the possibility that NNs inherently lack certain advantages that tree-based models naturally possess, such as handling mixed data types and non-linear relationships without extensive feature engineering.
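To make this evaluation protocol concrete, the sketch below reconstructs the "best model so far" curve from per-iteration scores. Here val_scores and test_scores are hypothetical arrays holding the validation and test score of each sampled configuration; the benchmark's actual bookkeeping may differ.

```python
import numpy as np

def best_model_curve(val_scores, test_scores, n_shuffles=15, seed=0):
    """For each search budget t, report the test score of the model that
    is best on the validation set among the first t sampled configs,
    averaged over random shuffles of the search order."""
    rng = np.random.default_rng(seed)
    n = len(val_scores)
    curves = np.empty((n_shuffles, n))
    for s in range(n_shuffles):
        order = rng.permutation(n)
        val = np.asarray(val_scores)[order]
        test = np.asarray(test_scores)[order]
        # index of the best-on-validation config within the first t+1 draws
        best_idx = [int(np.argmax(val[:t + 1])) for t in range(n)]
        curves[s] = test[best_idx]
    return curves.mean(axis=0), curves.min(axis=0), curves.max(axis=0)

# Hypothetical per-iteration scores from a finished random search
rng = np.random.default_rng(1)
val_scores = rng.uniform(0.6, 0.9, size=50)
test_scores = val_scores + rng.normal(scale=0.02, size=50)
mean_curve, lo_curve, hi_curve = best_model_curve(val_scores, test_scores)
```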
Categorical Variables: The gap between NNs and tree-based models persists even on datasets containing only numerical features, suggesting that handling categorical variables is not the main weakness of NNs; their structural and algorithmic setup appears less suited to tabular data irrespective of data type. This is an important insight for data scientists and machine learning engineers: the choice of model type should be driven by the characteristics of the data.
The first finding is that neural networks (NNs) exhibit a bias towards generating smoother function approximations compared to tree-based models. This bias is observed through experiments involving Gaussian kernel smoothing of the target function, which intentionally smooths out irregularities to assess how models adapt to these changes.
Gaussian Kernel Smoothing: The researchers apply a Gaussian kernel smoother to the targets of each training set with varying length-scale values. Larger length scales smooth away more of the target function's irregularities, revealing how much NNs and tree-based models each rely on high-frequency structure in the data.
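One standard way to implement such a smoother is Nadaraya-Watson kernel regression; the minimal sketch below assumes this form, though the authors' exact estimator and length-scale grid may differ.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def smooth_targets(X, y, length_scale):
    """Gaussian (Nadaraya-Watson) smoothing: each smoothed target is a
    distance-weighted average of all training targets."""
    gamma = 1.0 / (2.0 * length_scale ** 2)
    K = rbf_kernel(X, X, gamma=gamma)  # exp(-gamma * ||xi - xj||^2)
    return (K @ y) / K.sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.sin(5 * X[:, 0]) + rng.normal(scale=0.1, size=300)  # irregular target
y_smooth = smooth_targets(X, y, length_scale=0.5)  # larger = smoother
```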
Impact on Model Performance: Tree-based models show a marked decrease in accuracy as the target function is increasingly smoothed, while neural networks are relatively unaffected. This indicates that NNs are naturally biased toward smooth function approximations and do not exploit the complex, high-frequency variations present in the actual data.
Comparison with Tree-Based Models: Unlike NNs, tree-based models, which learn piece-wise constant functions, do not exhibit this smoothing bias. This difference highlights the inherent design and functional divergences between these model types, where tree-based models excel at capturing more granular and discontinuous patterns in the data.
Literature Consistency: The findings align with existing literature (e.g., Rahaman et al., 2019) suggesting that NNs are predisposed towards low-frequency functions. They do not contradict studies advocating regularization and careful optimization for tabular data, which might help NNs approximate irregular patterns more effectively.
Potential Solutions: The authors hint that implementing specific mechanisms like the ExU activation (from the Neural-GAM paper by Agarwal et al., 2021) or using periodic embeddings (Gorishniy et al., 2022) might allow NNs to better capture the higher-frequency components of the target function, potentially overcoming their inherent smoothing bias.
The second key finding of the paper reveals that neural networks (especially MLP-like architectures) are more negatively impacted by uninformative features in tabular datasets compared to tree-based models like Random Forests or Gradient Boosting Trees. This finding highlights a significant challenge for deep learning when working with real-world tabular data, which often contains many irrelevant or redundant features.
Tabular Data and Uninformative Features:
Tabular datasets typically contain a mixture of important and uninformative (irrelevant or redundant) features. In many cases, a large portion of the features may not significantly contribute to the prediction task. The study conducted experiments by systematically removing features according to their importance (as ranked by Random Forests) to evaluate the impact of irrelevant features on model performance.
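A sketch of this ablation on a synthetic dataset where most columns are pure noise; the importance ranking and drop fraction mirror the description above, while the data itself is invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)  # 18 noise cols

# Rank features by Random Forest importance, then drop the least
# important fraction before refitting the models under comparison.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)  # least important first
keep = order[int(0.5 * X.shape[1]):]         # keep the top 50%
X_reduced = X[:, keep]
```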
Tree-Based Models' Robustness:
Tree-based models, particularly Gradient Boosting Trees (GBT), are relatively unaffected by the presence of uninformative features. These models can efficiently prioritize the most informative features during training and ignore irrelevant ones. The performance of GBTs remained stable even when up to 50% of the least important features were removed, showing a high level of robustness in handling noisy features.
Neural Networks' Sensitivity to Uninformative Features:
In contrast, neural networks demonstrated a higher sensitivity to uninformative features. The performance gap between neural networks and tree-based models widened when uninformative features were added to the dataset, indicating that neural networks struggle to filter out these irrelevant features effectively. When a significant portion of uninformative features was removed, the performance of MLPs and ResNet architectures improved, but they still lagged behind tree-based models. This suggests that neural networks are not as naturally equipped to deal with noisy feature spaces, which is common in tabular data.
Rotation Invariance and Feature Importance:
One possible explanation for this finding is the inherent rotation invariance of neural networks, meaning that the model treats all features equally at the start and cannot easily distinguish between important and unimportant features. As a result, neural networks require more training and tuning to identify the most relevant features. In contrast, decision trees and their ensembles (such as Random Forests and GBTs) can directly evaluate feature importance, making them more adept at handling uninformative or redundant data.
Experimental Validation:
The study validated these insights through multiple experiments. In one, they observed that the removal of uninformative features narrowed the performance gap between neural networks and tree-based models. Conversely, when additional uninformative features were added, the performance of MLPs deteriorated faster than that of tree-based models, further emphasizing the neural networks' susceptibility to noisy data.
The third key finding of the paper focuses on the concept of rotation invariance in learning algorithms, particularly in how neural networks handle tabular data. The authors found that data in tabular form typically carries intrinsic structure and meaning tied to specific features, and applying rotation to such data distorts this structure. Neural networks, particularly Multi-Layer Perceptrons (MLPs) and ResNets, exhibit rotation invariance, which negatively impacts their ability to effectively model tabular data. In contrast, tree-based models are not rotation-invariant, which gives them a significant advantage when working with tabular datasets.
The Problem of Rotation Invariance in Neural Networks:
Neural networks, especially those like MLPs and ResNets, are rotation invariant. This means that if the features of a dataset are rotated (i.e., transformed by a linear combination that mixes the features), the model's performance should not change. While this property might be beneficial in domains like image processing, where pixel values do not have specific, independent meanings, it is detrimental in tabular data.
Tabular data is often structured with specific feature columns like "age," "weight," or "income," where each column carries a distinct, independent meaning. When neural networks treat these features as if they can be mixed or rotated, they lose important information about the inherent relationships between features.
Tree-Based Models’ Advantage:
Tree-based models, such as Random Forests and Gradient Boosting Trees (GBTs), are not rotation invariant. These models treat each feature independently and make decisions based on the exact values of features without mixing them. This enables tree-based models to maintain the natural structure of the data and take advantage of the important information each feature carries without confusing it with others.
In the experiments, random rotations were applied to the datasets to see how different models reacted. The results showed that random rotations degraded tree-based models far more than neural networks, and in fact reversed the performance ranking, with neural networks slightly outperforming tree-based models on rotated data. This shows that tree-based models capitalize on the natural feature orientation that rotation destroys, a structure that rotation-invariant neural networks cannot exploit in the first place.
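A minimal sketch of applying such a rotation, drawing a random orthogonal matrix via QR decomposition; in the experiment, every model would then be refit on the rotated features.

```python
import numpy as np

def random_rotation(X, seed=0):
    """Multiply the features by a random orthogonal matrix, mixing every
    original column into every rotated one."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q *= np.sign(np.diag(R))  # make Q uniform over rotations (Haar measure)
    return X @ Q

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X_rot = random_rotation(X)  # refit models on X_rot and compare scores
```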
Impact of Rotation on Model Performance:
After applying random rotations to the data, the authors found that the performance of neural networks (especially ResNets) remained largely unchanged, reflecting their insensitivity to the original orientation of the data. However, tree-based models experienced a notable drop in performance after rotation, further proving that they rely heavily on the natural feature orientation of the data.
This experiment highlights that in domains where features are heterogeneous and carry distinct meanings, rotation invariance can be a major drawback for neural networks. It also suggests that architectures for tabular data should preserve the original structure of the features rather than treating them as equal and interchangeable directions in feature space.
Relation to Uninformative Features:
The study also links rotation invariance to the issue of uninformative features (explored in Finding 2). Because MLPs and ResNets cannot naturally differentiate informative from uninformative features, a rotation that mixes noise into every direction makes relevant patterns even harder to recover. Consistent with this, removing uninformative features before applying a rotation reduced the performance drop for all models.
Empirical Evidence:
When the authors removed 50% of the least important features from the datasets and then applied a rotation, the performance drop was smaller than when rotating the full feature set. Tree-based models still lost accuracy, since rotation mixes the features they would otherwise split on cleanly, but the reduced drop shows that uninformative features amplify the cost of feature mixing, and that the feature-by-feature design of tree-based models helps them cope with irrelevant data.
This study systematically benchmarks the performance of tree-based models and neural networks on tabular data, uncovering several critical insights that highlight why tree-based models continue to outperform deep learning approaches in this domain. Despite the immense progress made in deep learning for tasks such as image, text, and audio, tabular data presents unique challenges that deep learning models are still not adequately equipped to handle.
One of the major reasons behind the underperformance of deep learning models, such as MLPs and Transformers, on tabular data is the mismatch between inductive biases. Tabular datasets often contain heterogeneous features, uninformative attributes, and irregular target functions, none of which are common in the domains where deep learning has traditionally excelled. Neural networks tend to struggle with these irregular patterns because they are inherently biased toward learning smoother, more continuous functions, which makes them unsuitable for the complex, non-smooth relationships typically found in tabular data.
The paper identifies three main challenges for neural networks in tabular learning: a bias toward overly smooth solutions that miss irregular target functions, a lack of robustness to uninformative features, and a rotation-invariant learning procedure that discards the meaningful orientation of tabular features.
These challenges highlight why tree-based models are still the preferred choice for most practitioners working with tabular data, despite the excitement surrounding deep learning methods.
The findings also reveal that while deep learning models can sometimes approach the performance of tree-based models after extensive hyperparameter tuning, they require significantly more computation and training time, which is another practical drawback for their application in real-world tabular data scenarios.
In conclusion, the study provides compelling evidence that tree-based models remain state-of-the-art for tabular data, outperforming neural networks across a wide range of benchmarks. Despite numerous attempts to adapt deep learning architectures for tabular data, tree-based methods, particularly ensemble models like XGBoost and Random Forests, consistently show superior performance in terms of accuracy, robustness, and training efficiency. These findings suggest that while deep learning holds promise, current neural network architectures are not inherently suited to tabular data, and further research is required to address the key challenges identified here: learning irregular target functions, remaining robust to uninformative features, and breaking rotation invariance.
This paper’s contribution of a robust benchmarking methodology and the provision of open datasets and hyperparameter search results offers a solid foundation for future research aimed at improving deep learning architectures for tabular data. The hope is that by addressing the unique challenges of tabular data, researchers can close the performance gap between deep learning and tree-based models, leading to the development of more versatile and powerful machine learning models.