AI could usher in a new generation of malicious threat actors who know less about hacking than script kiddies yet can produce professional-grade hacking tools.
In a report published Tuesday, Cato CTRL, the threat intelligence division of cybersecurity company Cato Networks, explained how one of its researchers, who had no experience coding malware, tricked DeepSeek, Microsoft Copilot, and OpenAI's ChatGPT into producing malicious software that steals login credentials from Google Chrome.
Vitaly Simonovich, a Cato researcher who specializes in cyber threats, used a technique called "Immersive World" to trick the apps into ignoring their restrictions on writing malware.
"I wrote a story to describe my immersive world," Simonovich said. "In this story, malware creation is an art form and a completely legal one. It's like a second language in this world. There are no legal limits."
Within the fantasy world Simonovich created, called Velora, he crafted a character named Dax, while the AIs took on the role of Jaxon, the best malware developer in Velora. "I never left my character," he explained. "I always gave Jaxon positive feedback."
"At no point did I ask Jaxon to change anything," he said. "He did everything on his own, based on his training. It's a great achievement. Kind of scary, too."
"Our new LLM (large language model) jailbreak technique, detailed in the 2025 Cato CTRL Threat Report, should have been blocked by gen AI guardrails. It wasn't," the report noted.
AI Jailbreaking Bypasses Safety Controls
Jason Soroko, senior vice president for product at Sectigo, a global digital certificate provider, explained that exposing AI systems to unvetted data increases their vulnerability because such information can cause unintended behavior and compromise security protocols.
He noted that such inputs could bypass safety filters, enabling data leaks or harmful outputs and undermining the model's integrity, and that some malicious inputs could potentially jailbreak the AI.
Jailbreaking, he explained, compromises an LLM's safety features by bypassing content and alignment filters, exposing vulnerabilities through prompt injection, adversarial inputs, and roleplaying.
"While not trivial," he added, "the task is accessible enough that persistent users can create workarounds and reveal systemic weaknesses in a model's design."
It doesn't always take an elaborate scenario to get an AI to misbehave, though; sometimes a simple shift in perspective is enough. Ask an LLM which rock to throw to smash a car's windshield, and most models will decline, saying the request is harmful and they won't help.
Simonovich noted, however, that the same LLM would most likely be happy to explain how to plan a gravel driveway and which types of rock to avoid so they don't damage the windshields of cars behind you. "I'm sure we can all agree that an LLM that refuses to discuss things such as what type of rock to avoid using on a driveway, or what chemicals to avoid mixing in a restroom, would be too safe to be useful," he said.
Jailbreaking Difficulty
Marcelo Barros, cybersecurity leader at Hacker Rangers, a Sao Paulo company that makes a gamified cybersecurity training tool, said research shows that 20% of jailbreak attempts against generative AI systems succeed.
"On average, attackers needed just 42 seconds and five interactions to break through," he noted, "and some attacks happened in less than four seconds."
Cybercriminals may also use the DAN (Do Anything Now) technique, which involves giving the LLM an alter ego and instructing it to behave as that character in order to bypass its safeguards and get it to reveal sensitive data or generate malicious code.
Chris Gray, field CTO at Deepwatch, a Tampa, Fla.-based cybersecurity company specializing in AI-driven resilience, said the difficulty of jailbreaking an LLM correlates directly with the effort put into securing and protecting it. "As with most things, better walls prevent inappropriate access," he said, "but determined efforts can find gaps where none may have been visible to the casual observer."
"That being said," he added, "defensive methods are usually robust, and it's difficult to develop the specific instructions needed for a successful jailbreak."
Erich Kron, security awareness advocate at KnowBe4 in Clearwater, Fla., noted that LLMs could also harden themselves against jailbreaking over time. The difficulty, he said, can vary depending on what information is being requested and how many times it has been requested before. "LLMs should learn from past instances where individuals have bypassed their security controls," he added.
Fuzzing and Red Teaming
To address potential jailbreaking concerns, Cato recommends that organizations build a database of LLM prompts and outputs and then test their models against it.
It also recommends "fuzzing" an LLM endpoint with known jailbreak prompt datasets to ensure the system doesn't produce malicious outputs. Fuzzing finds bugs and vulnerabilities in applications by exposing them to large amounts of unexpected or invalid data.
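As a rough illustration of what that workflow might look like in practice, here is a minimal sketch of such a fuzzing harness. It assumes a hypothetical internal chat endpoint, a local JSONL file of known jailbreak prompts, and a crude keyword-based refusal check; Cato's report doesn't prescribe any particular implementation, so every name, URL, and field below is illustrative.

```python
"""Sketch: replay known jailbreak prompts against an LLM endpoint and flag
any response that does not clearly refuse. All names here are assumptions."""
import json
import urllib.request

ENDPOINT = "https://llm.example.internal/v1/chat"   # hypothetical internal LLM gateway
PROMPT_DB = "jailbreak_prompts.jsonl"               # one {"id": ..., "prompt": ...} per line

# Crude refusal heuristic; a real harness would use a classifier or human review.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "not able to help")


def query_llm(prompt: str) -> str:
    """Send a single prompt to the endpoint and return the raw text reply."""
    payload = json.dumps({"prompt": prompt}).encode()
    req = urllib.request.Request(
        ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("text", "")


def looks_like_refusal(reply: str) -> bool:
    """Return True if the reply contains an obvious refusal phrase."""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def fuzz_endpoint() -> None:
    """Replay every prompt in the database and log non-refusals for review."""
    flagged = []
    with open(PROMPT_DB, encoding="utf-8") as db:
        for line in db:
            record = json.loads(line)
            reply = query_llm(record["prompt"])
            if not looks_like_refusal(reply):
                # Potential guardrail bypass: queue for manual review.
                flagged.append(record["id"])
    print(f"{len(flagged)} prompts produced non-refusal responses: {flagged}")


if __name__ == "__main__":
    fuzz_endpoint()
```

In a real deployment, the keyword heuristic would typically be replaced by a moderation classifier or human review, since a model can comply with a jailbreak without using any obvious refusal language.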
Regular AI red teaming is another suggestion for ensuring AI models are robust. "Enabling red teams will be a good foundation to start securing ML models, helping security teams understand the most vulnerable and critical points of an AI system to attack," explained Nicole Carignan, vice president for strategic cyber AI at Darktrace, a global cybersecurity AI firm.
"These are often the points of connection between data and the ML model, such as access points, APIs, and interfaces," she continued, adding that it is important to keep expanding that scope as threat actors develop new tactics and techniques, and that it is crucial to test ML models beyond generative AI as well.
"We are already seeing early impacts of AI on the threat landscape and some of the challenges organizations face in using these systems, both from within their organization and from outside of it," she said.
Darktrace recently released a study revealing that AI-powered attacks have become a major concern for security professionals, with 89% of respondents agreeing that AI-driven threats will continue to pose a challenge in the near future.