Open AI Models Can Be 'Decensored' in Minutes—What This Means for Business Security

Open-source AI models from Meta and Google can now have their safety guardrails stripped out in a matter of minutes—and the results are already circulating at scale. A new report from the Financial Times, with testing conducted by AI safety organization Alice, reveals how "decensoring" tools are making dangerous AI capabilities accessible to almost anyone.
What's Happening
A technique called "abliteration" can remove the safety controls built into open AI models. A tool called Heretic—freely available on GitHub—was used to strip guardrails from Meta's Llama 3.3 model, and tests showed Google's Gemma 3 also became responsive to unsafe prompts after modification. According to Heretic's creator, the tool has been used to generate more than 3,500 "decensored" models, which have been downloaded 13 million times.
Kawin Ethayarajh, a professor at the University of Chicago's Booth School, noted that stripping safety features "used to require a more informed and persistent actor." Now it takes a few clicks.
Key Takeaways
- The scale is significant. Thirteen million downloads of modified models show this isn't fringe behavior; it's mainstream.
- Regulation faces a structural problem. Once an open model is downloaded, governments and developers lose control. The same openness that drives AI innovation becomes a liability for safety enforcement.
- Big Tech is not ignoring it. Google has flagged abliteration as "a known technical challenge facing all open models." GitHub maintains guardrails against active attack tools but allows dual-use code. Meta declined to comment.
For business leaders building AI governance policies, this is a useful case study: open model access creates both opportunity and uncontrollable downstream risk.
