Fortifying AI: Addressing the Security Risk of Large Language Models

Jul 30

2 min read

In our recent webinar, we explored the security threats facing applications that use large language models (LLMs). LLMs are a double-edged sword in cybersecurity: while they can strengthen vulnerability detection, testing, and protection mechanisms, they can also be exploited for offensive purposes such as launching or automating attacks.

We highlighted several vulnerabilities, including data poisoning, backdoor attacks, and data extraction. A major focus was on prompt injection and jailbreaking attacks —currently considered the biggest security risk for large language models according to OWASP. The challenge lies in the fact that malicious input can be easily hidden within trusted user input, making it difficult for systems to reliably separate the two.

Jailbreaking attacks aim to bypass safety policies and force models to produce disallowed or harmful content. Tactics such as role-playing, output format manipulation, and persuasive adversarial prompts can all be used to circumvent safeguards.

As agentic systems continue to evolve, we also discussed new security challenges specific to AI agents — especially when they use tools, plan tasks, or interact with external environments, other agents, or long-term memory. These interactions introduce additional attack surfaces that must be secured.

In the second part of the webinar, we focused on defense and mitigation strategies. Security in AI systems operates at multiple levels — from careful prompt design and robust input validation to systemic security measures like permission management and tool verification. Techniques such as sandwich instructions can already improve prompt security. Equally important are safeguards at higher levels, including user validation, permission controls, and proper tool governance.

To illustrate these points, we conducted a live demo. We demonstrated how easy it can be to jailbreak two popular LLMs, prompting them to generate instructions for making a Molotov cocktail and to output discriminatory Python code. We then showed how effective defense mechanisms and best practices can help prevent such misuse and strengthen LLM security in practice.