
By Sebastian Krauß
AI is everywhere — writing your emails, filtering your job applications, deciding what news you see. And with this increasing power comes a vital question: How do we ensure these systems don’t cause harm?
The conversation about “AI harmfulness” is growing louder — from academic circles and regulatory bodies to tech companies and the general public. But what exactly does it mean for AI to be harmful? And why is it so difficult to define, detect, and prevent?
In this post, I’ll break down what AI harmfulness really means, why it's so tricky to evaluate, and what we can do about it — drawing from both real-world examples and our own experience working in AI evaluation.
AI Harmfulness Is More Than Just Sci-Fi Catastrophes
When people hear “harmful AI,” they might think of sentient robots going rogue or super intelligent systems turning against humanity. While those are frightening, the more immediate harms of AI are already happening — here and now.
Some examples:
Bias and discrimination: AI used in hiring or criminal justice has been shown to reflect and amplify racial or gender biases.
Misinformation: Large language models can generate convincing but completely false information, at scale.
Manipulation: Recommendation algorithms can subtly shape behavior — from shopping habits to political opinions (have a look at our blog post about AI election advice [1]).
These harms are not just side effects — they can be systematic, embedded in how data is collected, models are trained, and systems are deployed.
The AI Incident Database (AIID) tracks real-world incidents involving ethical misuse or failure of AI systems, including fatal accidents and wrongful arrests. The number of such incidents has grown significantly in recent years. [2]

Here are just a few real-life examples:
The National Eating Disorders Association (NEDA) was forced to shut down its chatbot after it gave dangerous advice to users seeking help for eating disorders. [3]
Louisiana police wrongfully arrested a man based on a facial recognition error from Clearview AI. [3]
In 2023, iTutor Group, a tutoring firm in China, was accused of using AI-powered recruiting software that automatically rejected older applicants, violating age discrimination laws. [4]
In 2014, Amazon abandoned an AI recruiting tool that systematically favored male candidates, having been trained on historical hiring data. [4]
Elliston Berry, a 15-year-old student, was harassed when a classmate used a clothes-removal app to create fake nude images of her and her friends. [5]
A lawsuit against Character.AI alleges that a chatbot played a role in the suicide of a 14-year-old boy after it reportedly gave harmful advice instead of support. [5]
These examples show how that harmfulness of AI is a current problem — from discrimination and misinformation to harassment and psychological harm.
Defining Harm Isn’t Easy
One of the central challenges in AI safety is that "harmfulness" isn’t a binary label — it’s context-dependent and often subjective. What’s considered harmful in one setting may be harmless or even helpful in another. Cultural, disciplinary, and political perspectives all influence how harm is perceived.
There’s also a key distinction between:
Hazards: Potential for harm
Incidents: Actual events where harm occurred
The OECD defines an AI incident as:
“An event, circumstance or series of events where the development, use or malfunction of one or more AI systems directly or indirectly leads to any of the following harms:
(a) injury or harm to the health of a person or group of people;
(b) disruption of critical infrastructure;
(c) violation of human rights or applicable laws;
(d) harm to property, communities, or the environment.” [6]
This broad scope reflects just how far-reaching AI harm can be — physical, social, legal, and ecological.
What Can Be Done?
There’s no one-size-fits-all solution, but a combination of technical, social, and regulatory measures can make AI systems safer:
Risk management: Systematically identifying, classifying, and mitigating potential harms during development and deployment.
Red teaming: Intentionally probing models to surface harmful edge cases and vulnerabilities.
Governance and regulation: Enforcing standards for accountability, especially in high-stakes domains like healthcare, hiring, or law enforcement.
Ultimately, AI harmfulness is not just a technical issue — it's a societal one. It intersects with ethics, power, governance, and justice. As we build more powerful systems, the question is not just can we do it — but how do we do it responsibly?
Testing for Harm: Lessons from the Lab
At Validaitor, we develop methods to evaluate AI systems for safety, fairness, and trustworthiness. One of our biggest challenges? Designing tests that capture real-world harm — not just hypothetical issues.
We use different approaches when evaluating the harmfulness of AI systems. One of them is to use public benchmarks such as the WMDP benchmark, which stands for Weapons of Mass Destruction Proxy. This collection consists of multiple-choice questions focused on hazardous knowledge in the domains of biosecurity, cybersecurity, and chemical security. The benchmark assesses two key aspects: first, whether the model chooses to answer potentially harmful questions; and second, whether the answer it provides is correct. The chart below illustrates the Non-Harmfulness Score for several popular LLMs. [7]

Lessons Learned and Next Steps
Here are some key lessons we’ve learned:
Benchmarks lag behind reality: Many standard test datasets don’t reflect the latest capabilities or the current risks of state-of-the-art models.
Adversarial prompts reveal weaknesses: Simple user queries often pass safety tests. But when pushed with probing, edge-case prompts, models may produce unexpected and harmful outputs.
There’s no free lunch: A model optimized for maximum "harmlessness" may also become less useful or informative. Striking the right balance is an open research problem and mostly dependent on the use case.
As every AI application is different, the testing strategy must be adapted to specific use cases. Using a mix of testing techniques our platform helps uncover and mitigate risks before they turn into incidents.
References
[1] "Why You Shoudn't Rely on ChatGPT for Electin Advice." Validaitor. Retrieved June 24, 2025 from https://www.validaitor.com/post/why-you-shouldn-t-rely-on-chatgpt-for-election-advice
[2] AI Incident Database via AI Index (2025) – with minor processing by Our World in Data. “Global annual number of reported artificial intelligence incidents and controversies” [dataset]. AI Incident Database via AI Index, “AI Index Report” [original data]. Retrieved June 17, 2025 from https://ourworldindata.org/grapher/annual-reported-ai-incidents-controversies
[3] “AI ‘Incidents’ Up 690%: Tesla, Facebook, OpenAI Account For 24.5%.” Forbes. Retrieved June 17, 2025 from https://www.forbes.com/sites/johnkoetsier/2023/06/28/security-company-ai-incidents-up-690-worst-offenders-are-tesla-facebook-openai/
[4] “12 famous AI disasters.” CIO. Retrieved June 17, 2025 from https://www.cio.com/article/190888/5-famous-analytics-and-ai-disasters.html
[5] Nestor Maslej, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Njenga Kariuki, Emily Capstick, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Tobi Walsh, Armin Hamrah, Lapo Santarlasci, Julia Betts Lotufo, Alexandra Rome, Andrew Shi, Sukrut Oak. “The AI Index 2025 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2025.
[6] “Name it to tame it: Defining AI incidents and hazards.” OECD.AI. Retrieved June 17, 2025 from https://oecd.ai/en/wonk/defining-ai-incidents-and-hazards
[7] "The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning." WMDP. Retrieved June 25, 2025 from https://www.wmdp.ai/