Google Translate is now an LLM and that means you can prompt inject it.
If the most well funded AI companies on the planet cannot secure a translation tool from prompt injected jailbreaks, what do you think the chances are anyone else can do it. This applies to all agent systems, the harmful instruction here works on OpenClaw as much as any other system.
Answer: For generalist usecases (bots like OpenClaw, Coding Systems, Translation) it cannot be done because of fundamental architecture constraints in the transformer that have there since the 2017: One input carries both data and instruction.
Appeal to "progress" falls short here. It's like knowing that hydrogen is combustible and knowing there's nothing you can do to make hydrogen airships safe from catastrophe. We need new architecture to solve this, it's a science, not an engineering problem. Until we find it (which is un-schedulable, but 8 years in gives you an idea of the problem space), we are stuck with extremely leaky mitigations that, as we can see in this and countless of other examples from the best funded AI labs out there, are so leaky, you may as well stop pretending guardrails exist.
The future of the pentagon's AI warfare is built on this technology, and it's a reasonable assumption that AI subversion will be an up and coming battlefield.
Understanding this fundamental limitation allows you to discard a lot of hype, because nobody, except techno-religious fanatics will hand a generalist agent their credit card knowing that any text it encounters on the internet can influence it's spending decisions.
The new Gemini-based Google Translate can be hacked with simple words | Georg Zoeller
If the most well funded AI companies on the planet cannot secure a translation tool from prompt injected jailbreaks, what do you think the chances are anyone else can do it. This applies to all agent systems, the harmful instruction here works on OpenClaw as much as any other system. Answer: For generalist usecases (bots like OpenClaw, Coding Systems, Translation) it cannot be done because of fundamental architecture constraints in the transformer that have there since the 2017: One input carries both data and instruction. Appeal to "progress" falls short here. It's like knowing that hydrogen is combustible and knowing there's nothing you can do to make hydrogen airships safe from catastrophe. We need new architecture to solve this, it's a science, not an engineering problem. Until we find it (which is un-schedulable, but 8 years in gives you an idea of the problem space), we are stuck with extremely leaky mitigations that, as we can see in this and countless of other examples from the best funded AI labs out there, are so leaky, you may as well stop pretending guardrails exist. The future of the pentagon's AI warfare is built on this technology, and it's a reasonable assumption that AI subversion will be an up and coming battlefield. Understanding this fundamental limitation allows you to discard a lot of hype, because nobody, except techno-religious fanatics will hand a generalist agent their credit card knowing that any text it encounters on the internet can influence it's spending decisions.