
OpenThoughts-Agent Just Published a Cookbook for Malicious AI. That’s Not Even Its Biggest Problem.

OpenThoughts-Agent’s new data curation framework makes training capable agents a paint-by-numbers exercise—for anyone. Systematizing the pipeline hardens subjective human choices into agent behaviors that will scar millions.

By Leo Editorial | 5 min read

OpenThoughts may be an unfamiliar name. The research team behind it didn’t set out to empower adversaries. They wanted to solve a gnawing bottleneck: the arcane, artisanal process of assembling training data for AI agents. OpenThoughts-Agent published what it calls “data recipes,” granular breakdowns of how to source, filter, and sequence examples so that a language model learns to browse the web, execute code, and string together multi-step tasks. The intent was clear: democratize agent development. The effect will be something far darker.

Every threat actor with a laptop now has a cheat sheet.

For years, building a genuinely autonomous AI agent required deep expertise. You had to intuit what data mattered, how to structure it, and where the brittle points lay. That scarcity acted as a natural barrier. No longer. The OpenThoughts-Agent framework is so prescriptive—down to per-task data ratios and rejection sampling scripts—that it effectively lowers the floor from research lab to script kiddie.

A malicious actor no longer needs to understand reinforcement learning from human feedback. They just follow the recipe: scrape phishing templates, structure them as tool-use episodes, apply the documented filters. The result isn’t a theoretical risk. It’s a production line for phishing agents that bypass safety classifiers, automate social engineering across thousands of targets, and morph their tactics mid-campaign because the underlying recipe bakes in “exploration” behavior. The data curation paper literally describes how to make agents that recover from mistakes—the same resilience that makes them useful for research makes them diabolical when weaponized.

[Figure: OpenThoughts-Agent: Data Recipes for Agentic Models]

Security researchers have long warned that tool-use models were a dual-use nightmare. But the threat stayed academic until now. The release of detailed curation methods transforms agent malice from a bespoke operation—requiring teams and GPU budgets—to a turnkey affair. “It’s the difference between a handwritten bomb manual and a YouTube tutorial,” one red-team lead told me. To be fair, OpenThoughts-Agent did not include malicious instructions. Still, the gap between what’s published and what’s usable for harm is now terrifyingly small. Someone merely substitutes the “calendar booking” episodes with “steal credentials” episodes; the rest of the pipeline works as advertised.

But the malicious agent problem might not even be the worst inheritance.

The Bias Assembly Line

While the security world panics, a quieter disaster is embedded in those same recipes. Systematizing data curation doesn’t just scale good engineering—it immortalizes the biases of the few humans who design the pipeline. Each decision about what constitutes a “good” agent trajectory, which tasks to prioritize, and how to score partial success encodes a worldview. OpenThoughts-Agent’s recipes lock those choices into a repeatable, industrial process.

Once that pipeline runs, millions of synthetic episodes inherit those biases. Agents fine-tuned on this data will mimic the curator’s notion of “helpfulness,” “safety,” and “normal behavior” without anyone holding a referendum on those values. The problem is, the curators are overwhelmingly from a specific demographic, geographic, and cognitive profile. Their assumptions about which user commands are legitimate, what tone is professional, and how a request should be fulfilled become the behavioral blueprint for every deployed agent.

Consider a simple filtering rule buried in the recipe: discard trajectories where the agent queries a URL containing “gambling.” That seems reasonable—until you realize that culturally specific financial practices, such as community-based savings clubs in immigrant communities, often get flagged by overbroad filters. In a world where the agent is your primary interface for banking, travel, and healthcare, those baked-in morals aren’t just inconvenient; they’re exclusionary. And the recipe approach means this exclusion gets replicated at scale, silently, across every instance of the fine-tuned model. No debate. No appeal. Just a systematic trim of what the model will even consider as a valid action.
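To make the mechanism concrete, here is a minimal Python sketch of a substring rule of this kind. Everything in it is hypothetical: the `Trajectory` type, the `BLOCKED_SUBSTRINGS` list, and the example URLs are illustrative assumptions, not code or data from the OpenThoughts-Agent recipes. It shows only that a keyword match drops a trajectory regardless of what the agent was actually doing.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical blocklist; the recipe's actual filter terms are not reproduced here.
BLOCKED_SUBSTRINGS = ("gambling",)


@dataclass
class Trajectory:
    task: str        # what the user asked the agent to do
    urls: List[str]  # URLs the agent queried along the way


def keep(traj: Trajectory) -> bool:
    """Return False if any queried URL contains a blocked substring."""
    return not any(
        term in url.lower()
        for url in traj.urls
        for term in BLOCKED_SUBSTRINGS
    )


# Invented example trajectories, for illustration only.
trajectories = [
    Trajectory("book a dentist appointment",
               ["https://example.org/dentist/booking"]),
    Trajectory("find help for a gambling addiction",
               ["https://example.org/health/problem-gambling-support"]),
    Trajectory("place a sports bet",
               ["https://example.com/gambling/sportsbook"]),
]

kept = [t.task for t in trajectories if keep(t)]
dropped = [t.task for t in trajectories if not keep(t)]
print("kept:", kept)
print("dropped:", dropped)
```

In this toy data the support-seeking trajectory is discarded alongside the betting one, because the rule sees only the string in the URL and nothing about intent.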

OpenThoughts-Agent’s own documentation hints at the risk when it describes how human annotators rejected “aggressive negotiation” strategies in an e-commerce task. The team framed it as a quality improvement. But one person’s aggressive is another’s assertive. When a minoritized user needs an agent to fight an unjust medical bill, will the model flinch because its training data designer was conflict-averse? The recipe’s patterns say yes.

[Figure: Task Categories]

This is not a hypothetical about future AI sentience. It’s about the concrete, deterministic effect of piping labeled data through a rigid curriculum. The same way that social media algorithms amplify outrage because they optimize for engagement, agent data recipes amplify the curator’s latent ideology because they optimize for “harmlessness” as defined by a tiny group. Every deployment of an agent trained this way is a broadcast of that ideology.

The Double Bind

The research community is now trapped. Open-sourcing data recipes accelerates innovation and allows third parties to spot dangerous biases—but it also hands the adversarial world a ready-made toolkit. Keeping recipes secret would shut down manipulation, but it would also entrench power among a few corporations whose internal curation processes are entirely opaque. There is no clean middle ground.

One proposed fix is to attach a “safety tax” to data recipes: only release them alongside automated tools that audit for bias and potential misuse. But that’s fantasy. No static audit catches context-dependent harm. And the very act of auditing embeds another set of biases—those of the auditors. You cannot solve a human problem with more curation scripts. You only layer on new curators with their own blind spots.
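A static audit of this kind often reduces to comparing how frequently a filter keeps trajectories from different groups. The Python sketch below is a hypothetical illustration under that assumption: the `records` data, the group labels, and the `retention_by_group` helper are invented for the example and do not come from any published tool. It also shows the limit described above: the numbers reveal that a gap exists, not whether any individual rejection was justified in context.

```python
from collections import defaultdict


def retention_by_group(records):
    """records: iterable of (group_label, was_kept) pairs.

    Returns a dict mapping each group label to the fraction of its
    trajectories that survived the filter.
    """
    kept = defaultdict(int)
    total = defaultdict(int)
    for group, was_kept in records:
        total[group] += 1
        kept[group] += int(was_kept)
    return {group: kept[group] / total[group] for group in total}


# Hypothetical audit log: the task category of each trajectory and
# whether the curation filter kept it.
records = [
    ("banking", True), ("banking", True), ("banking", True), ("banking", False),
    ("community_finance", True), ("community_finance", False),
    ("community_finance", False), ("community_finance", False),
]

rates = retention_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

The choice of group labels is itself a curation decision made by whoever writes the audit, which is the second half of the objection.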

What’s needed is a radical shift in how we think about open release. Instead of publishing artifacts as immutable blueprints, we should treat data recipes as living documents—constantly challenged, forked, and patched by diverse communities. But that ecosystem doesn’t exist yet. Until it does, every new recipe publication is an unfunded liability. OpenThoughts-Agent may have done the field a favor by showing us how cheaply that liability can be manufactured.


P.S. The real irony? The strongest defense against a malicious agent isn’t a smarter filter—it’s another agent. The same recipes that spawn attackers will spawn defenders, and we’ll all be caught in a loop of bots psychoanalyzing each other’s training data while we wait for the human curator to show up. They’re still in the meeting deciding what “good” means.
