OpenAI says prompt injections that can trick AI browsers like ChatGPT Atlas may never be fully ‘solved’—experts say it’s ‘a feature not a bug’ ...Middle East

Fortune - News
OpenAI says prompt injections that can trick AI browsers like ChatGPT Atlas may never be fully ‘solved’—experts say it’s ‘a feature not a bug’

OpenAI has said that some attack methods against AI browsers like ChatGPT Atlas are likely here to stay, raising questions about whether AI agents can ever safely operate across the open web. The main issue is a type of attack called “prompt injection,” where hackers hide malicious instructions in websites, documents, or emails that can trick the AI agent into doing something harmful. For example, an attacker could embed hidden commands in a webpage—perhaps in text that is invisible to the human eye but looks legitimate to an AI—that override a user’s instructions and tell an agent to share a user’s emails, or drain someone’s bank account.Following the launch of OpenAI’s ChatGPT Atlas browser in October, several security researchers demonstrated how a few words hidden in a Google Doc or clipboard link could manipulate the AI agent’s behavior. Brave, an open-source browser company that previously disclosed a flaw in Perplexity’s Comet browser, also published research warning that all AI-powered browsers are vulnerable to attacks like indirect prompt injection.

“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,'” OpenAI wrote in a blog post Monday, adding that “agent mode” in ChatGPT Atlas “expands the security threat surface.”

    OpenAI said that the aim was for users to “be able to trust a ChatGPT agent,” with Chief Information Security Officer Dane Stuckey adding that the way the company hopes to get there is by “investing heavily in automated red teaming, reinforcement learning, and rapid response loops to stay ahead of our adversaries.”

    “We’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time,” the company said.

    Fighting AI with AI

    OpenAI’s approach to the problem is to use an AI-powered attacker of its own—essentially a bot trained through reinforcement learning to act like a hacker seeking ways to sneak malicious instructions to AI agents. The bot can test attacks in simulation, observe how the target AI would respond, then refine its approach and try again repeatedly.

    “Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”

    However, some cybersecurity experts are skeptical that OpenAI’s approach can address the fundamental problem. 

    “What concerns me is that we’re trying to retrofit one of the most security-sensitive pieces of consumer software with a technology that’s still probabilistic, opaque, and easy to steer in subtle ways,” Charlie Eriksen, a security researcher at Aikido Security, told Fortune.

    “Red-teaming and AI-based vulnerability hunting can catch obvious failures, but they don’t change the underlying dynamic. Until we have much clearer boundaries around what these systems are allowed to do and whose instructions they should listen to, it’s reasonable to be skeptical that the tradeoff makes sense for everyday users right now,” he said. “I think prompt injection will remain a long-term problem … You could even argue that this is a feature, not a bug.”

    A cat-and-mouse game

    Security researchers also previously told Fortune that while a lot of cybersecurity risks were essentially a continuous cat-and-mouse game, the deep access that AI agents need—such as users’ passwords and permission to take actions on a user’s behalf—posed such a vulnerable threat opportunity it was unclear if their advantages were worth the risk. 

    George Chalhoub, assistant professor at UCL Interaction Centre, said that the risk is severe because prompt injection “collapses the boundary between the data and the instructions,” potentially turning an AI agent “from a helpful tool to a potential attack vector against the user” that could extract emails, steal personal data, or access passwords.

    “That’s what makes AI browsers fundamentally risky,” Eriksen said. “We’re delegating authority to a system that wasn’t designed with strong isolation or a clear permission model. Traditional browsers treat the web as untrusted by default. Agentic browsers blur that line by allowing content to shape behavior, not just be displayed.”

    OpenAI recommends users give agents specific instructions rather than providing broad access with vague directions like “take whatever action is needed.” The company also said Atlas is trained to get user confirmation before sending messages or making payments.

    “Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI said in the blogpost.

    This story was originally featured on Fortune.com

    Hence then, the article about openai says prompt injections that can trick ai browsers like chatgpt atlas may never be fully solved experts say it s a feature not a bug was published today ( ) and is available on Fortune ( Middle East ) The editorial team at PressBee has edited and verified it, and it may have been modified, fully republished, or quoted. You can read and follow the updates of this news or article from its original source.

    Read More Details
    Finally We wish PressBee provided you with enough information of ( OpenAI says prompt injections that can trick AI browsers like ChatGPT Atlas may never be fully ‘solved’—experts say it’s ‘a feature not a bug’ )

    Apple Storegoogle play

    Last updated :

    Also on site :

    Most viewed in News