Anthropic, the AI safety-focused laboratory behind the Claude family of models, has decided to withhold its latest system from public release. The model, internally designated Claude Mythos Preview, has demonstrated an unprecedented capacity to identify security flaws embedded in existing software infrastructure — including critical vulnerabilities that have persisted undetected in production code for decades. Rather than proceed with a conventional launch, the company is channeling Mythos into a controlled defensive program called Project Glasswing, a collaboration with twelve corporate partners aimed at patching the very weaknesses the model has surfaced.

The decision marks a concrete inflection point in the AI industry's relationship with its own output. Companies have long spoken about responsible deployment in abstract terms; Anthropic's move translates that rhetoric into a specific operational choice — forgoing revenue and competitive positioning in exchange for what it frames as systemic risk reduction.

A model that reads the cracks

The core capability at issue is Mythos's apparent proficiency in reasoning through complex, layered software systems. Modern digital infrastructure is built on decades of accumulated code, much of it written under constraints and assumptions that no longer hold. Legacy systems in banking, telecommunications, energy grids, and government services often contain vulnerabilities that persist not because they are trivial, but because the codebases are too vast and tangled for human auditors — or previous-generation automated tools — to surface them reliably.

A model capable of systematically identifying such flaws presents a dual-use problem that is familiar in security research but novel at this scale. For years, the practice of "responsible disclosure" has governed how individual researchers handle newly discovered software bugs: find a vulnerability, notify the vendor privately, and allow time for a patch before making the finding public. What Mythos introduces is not a single vulnerability but, reportedly, thousands — a volume that overwhelms conventional disclosure workflows and raises the stakes of any leak or unauthorized access.

Project Glasswing appears designed as an industrial-scale version of that responsible disclosure process. By limiting access to twelve vetted partners, Anthropic is attempting to create a controlled environment where the model's offensive insight is converted into defensive action before the knowledge can propagate. The logic is straightforward: if the vulnerabilities exist regardless, it is preferable that defenders find them first.

The containment precedent and its limits

The decision invites comparison with earlier episodes in which powerful capabilities were deliberately restricted. In the biological sciences, debates over gain-of-function research have produced similar containment frameworks — work proceeds under strict protocols, with publication delayed or redacted to prevent misuse. In AI specifically, OpenAI withheld the full version of GPT-2 in 2019 over concerns about text generation being used for disinformation, though it eventually released the model in stages after the anticipated harms did not materialize at the scale feared.

Anthropic's situation differs in a meaningful respect. The risk with Mythos is not speculative misuse of a general capability but the concrete existence of actionable exploit paths in live systems. Every day those vulnerabilities remain unpatched is a day they could be independently discovered — by state-sponsored hacking groups, criminal organizations, or rival AI systems trained on similar data. The containment strategy therefore operates under a ticking clock that purely theoretical risks do not impose.

The broader question the episode raises is structural. Anthropic can choose restraint; it cannot compel the same choice from every other laboratory pursuing frontier capabilities. If a model of comparable power emerges from an organization with different incentives — a state-backed project, a less safety-oriented competitor, or an open-source effort — the vulnerabilities Mythos has found will not become less exploitable. Project Glasswing is, in that sense, a race not only against the bugs themselves but against the diffusion curve of the capability that found them.

Whether twelve corporate partnerships can move fast enough to close thousands of critical flaws before the window narrows is an open question. So is the matter of what happens after the patching phase: whether Mythos eventually reaches a broader release, whether its architecture informs future models that will be harder to contain, and whether the precedent of voluntary withholding holds when the competitive pressure to ship grows sharper. The tension between capability and control is no longer a thought experiment. It is an operational constraint with a deadline that no one involved can precisely name.

With reporting from El País Tecnología.