
Adversarial attacks and robustness for tabular data


By Joseph Nagel • 29/04/2025


In recent years, deep learning has become the predominant approach to learning predictive models from data. But as it turns out, neural networks are not safe: they are surprisingly fragile in situations where an adversary attempts to fool them into making false predictions. In this blog post, we discuss so-called adversarial attacks in the context of tabular data.

While most of the research in this area focuses on image classification problems, such attacks are of utmost relevance for tabular data as well. They allow for a systematic analysis of potential security vulnerabilities and provide the basis for a quantitative robustness assessment. This eventually supports the development of safe and trustworthy AI.


Adversarial attacks

Ten years ago, researchers identified adversarial attacks as an unexpected failure mode of neural networks [Szegedy et al., 2014]. Imperceptible perturbations of the input data can often be constructed in such a way that they fool the model into making incorrect predictions. This realization has spawned the field of adversarial machine learning, which investigates such vulnerabilities together with possible defense mechanisms.
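
To make this concrete, below is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest ways to construct such perturbations. It assumes a differentiable PyTorch classifier; the function name and the default budget are illustrative placeholders and not the exact attack used for the example that follows.

```python
# Minimal FGSM sketch, assuming a differentiable PyTorch classifier `model`
# and a correctly labeled input batch (x, y); illustrative only.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """Perturb x by one signed-gradient step of size epsilon (L-inf budget)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon per entry.
    # For image data one would typically also clip back to the valid pixel range.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```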

Just to give a visual example, there are two images of a volcano below. The original image on the left is correctly identified as a volcano by an image classification model (with 79% confidence). The image on the right has been adversarially attacked such that the introduced perturbations are too small to be visually detected (maximally 1% of the standard deviation for each pixel). Yet the model misclassifies the image as a goldfish (even with a higher confidence of 97%).

Adversarial attack example: the original volcano image (left) is correctly classified, while the imperceptibly perturbed version (right) is misclassified as a goldfish.

Tabular data

One should note that most of the research on adversarial attacks is based on computer vision applications. In the context of image data one can argue that "small enough" pixel modifications do not alter the semantics of an image. The imperceptibly perturbed image above clearly still shows a volcano. A robust classification model should therefore make the same prediction on the original and the modified image.

The notion of an "imperceptible" perturbation does not translate from the image domain to tabular data directly. Take for instance the following small excerpt from the classical Titanic dataset for survival prediction:

age     fare     class     survived
49.0    56.93    first     1
18.0    8.05     third     1
31.0    26.25    second    0

It is not directly clear how a row of such a structured dataset could be modified in an imperceptible manner. Another question that immediately arises is whether there are any manipulations at all that should not change the outcome, i.e. whether or not a passenger of the Titanic survived. After all, there seems to be a lot of randomness and chance involved. How should different scalings of the numerical features be handled? What about correlations or more complex dependencies between them? And how should categorical variables be treated?

All in all, in comparison to pixel-level image data, tabular data tends to be more heterogeneous and to involve complex feature relationships. While robustness and security considerations are important for all types of ML models and data modalities, the abovementioned issues complicate the analysis especially for tabular data. Meaningful adversarial attacks on tabular data therefore have to take sparsity and other feature constraints into account, as the sketch below illustrates.
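
As a rough illustration of what a constrained perturbation of a single table row could look like, consider the following sketch. The column names are taken from the Titanic excerpt above, while the feature scales and the budget are made-up assumptions.

```python
# Sketch of a feature-aware perturbation of a single tabular row (illustrative;
# the feature scales and the budget below are assumptions, not fitted values).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

row = pd.Series({"age": 49.0, "fare": 56.93, "class": "first"})
numeric_cols = ["age", "fare"]
# Per-feature scales, e.g. standard deviations estimated on the training set.
feature_scale = {"age": 14.5, "fare": 49.7}
epsilon = 0.01  # perturbation budget relative to each feature's scale

perturbed = row.copy()
for col in numeric_cols:
    delta = epsilon * feature_scale[col] * rng.uniform(-1.0, 1.0)
    # Respect simple feature constraints, here non-negativity.
    perturbed[col] = max(0.0, row[col] + delta)
# Categorical features such as "class" are left untouched: changing them would
# describe a semantically different passenger rather than a small perturbation.
print(perturbed)
```

A real attack would of course choose the perturbation direction adversarially, e.g. along the model's gradient, rather than at random, but the per-feature scaling and constraint handling remain the same.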


Example

Let us have a look at a small example based on the well-known forest cover type dataset for multi-class classification. The task is to predict the cover type of small forest patches given certain cartographic information such as elevation, slope and distance to the nearest water body. We will demonstrate how adversarial robustness establishes a relevant criterion for assessing a model's performance in addition to the classical metrics.

Only a subset of the available features is selected for conducting the experiment. Eventually the dataset consists of ten continuous and four categorical variables. A total of approximately 58,000 data points is split into training, validation and test sets. Multiple classifiers are then trained, including logistic regression, an SVM and various feed-forward neural networks (a rough code sketch of this setup is given after the table below). A naturally emerging question now is how to compare the different models and possibly select the "best" one. Often one relies only on a single metric such as the classification accuracy for these purposes. For three of the trained models, the validation accuracy is summarized in the following table:


model      accuracy
model 1    70%
model 2    86%
model 3    89%
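
For reference, a rough sketch of such an experimental setup is shown below. It uses scikit-learn's covertype loader; the feature selection, the sampling scheme and the model hyperparameters are illustrative assumptions (the categorical variables are omitted for brevity), so the resulting scores will not match the table above exactly.

```python
# Rough sketch of the experimental setup (feature subset, sampling and model
# hyperparameters are assumptions for illustration, not the original settings).
from sklearn.datasets import fetch_covtype
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = fetch_covtype(return_X_y=True, as_frame=True)
X = X.iloc[:, :10]  # keep the ten continuous cartographic features

# Subsample to roughly the dataset size mentioned above.
idx = X.sample(n=58_000, random_state=0).index
X, y = X.loc[idx], y.loc[idx]

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "linear SVM": make_pipeline(StandardScaler(), LinearSVC()),
    "neural network": make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes=(64, 64))),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name}: validation accuracy = {clf.score(X_val, y_val):.2%}")
```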

At this point, adversarial robustness opens a complementary perspective on the model performances. To gain additional insights, several attacks can be run. Each run has a different attack strength or budget ε (in technical terms, the radius of an Lp-ball that constrains the maximal perturbation). The attack success rate (ASR) quantifies the fraction of attacks that succeed in perturbing a correctly classified data point such that the modified point is incorrectly classified. This way the ASR establishes a measure of a model's adversarial vulnerability, or lack of robustness. Below the strength-dependent ASR is plotted for the same three models:

Attack success rate (ASR) as a function of the attack strength ε for the three models.

As we can see, model 1 has the lowest ASR across the full range of tested strengths: it is the most robust with respect to the selected attacks. At the same time, it has the lowest accuracy score. In comparison, model 2 has a higher accuracy but also a higher vulnerability. This reflects a commonly encountered tradeoff, where one has to decide between high accuracy and high robustness. But then again, model 3 is both more accurate and more robust than model 2. It is therefore clearly preferable to model 2.
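
To make the evaluation concrete, here is a minimal sketch of how such strength-dependent ASR curves could be computed. The `attack` argument stands for any perturbation routine bounded by a budget `eps` (for instance, a tabular variant of the FGSM sketch above); it is a placeholder rather than a specific library call.

```python
# Sketch of the attack success rate (ASR) computation; attack(model, X, y, eps)
# is a placeholder for any perturbation routine bounded by the budget eps.
import numpy as np

def attack_success_rate(model, attack, X, y, eps):
    """Fraction of correctly classified points that the attack manages to flip."""
    correct = model.predict(X) == y
    if not np.any(correct):
        return 0.0
    X_adv = attack(model, X[correct], y[correct], eps)
    return float(np.mean(model.predict(X_adv) != y[correct]))

# Evaluate each model over a range of attack budgets, as in the plot above.
epsilons = np.linspace(0.0, 0.5, 11)
# asr_curves = {name: [attack_success_rate(clf, attack, X_val, y_val, eps)
#                      for eps in epsilons] for name, clf in models.items()}
```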

The final choice between models 1 and 3 would then need to happen on the basis of further context-dependent requirements or considerations. For instance, for some business-related reason one may want to find the most robust model whose accuracy is guaranteed to exceed a certain minimal value. Conversely, one could try to identify the most accurate model whose ASR stays below a prescribed threshold. The role of robustness tests in such situations is to provide the quantitative basis for these model selection decisions.


Conclusion

In summary, deep learning suffers from the existence of adversarial attacks. One can craft imperceptible (or at least semantically irrelevant) input perturbations that cause false model predictions. This lack of robustness raises severe security concerns. For real-world deployment, in addition to other relevant criteria such as fairness, it is therefore important to test the robustness of ML models with respect to adversarial attacks. For that purpose, Validaitor is developing an easy-to-use toolbox for quantitative robustness and security analysis.
