Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits
Anthropic launched Claude Fable 5 on June 9, 2026, as the first publicly available model in its new Mythos class, its most capable AI to date, excelling in software engineering, knowledge work, and vision benchmarks.

Summary
Why it matters
- Researcher “Pliny the Liberator” defeats Claude Fable 5’s safety classifiers using multi-agent decomposition, Unicode tricks, and narrative framing, leaking the model’s 120,000-character system prompt along the way.
- The release came with an unusual design decision: Fable 5 and its restricted twin, Claude Mythos 5, share the same underlying model but are split by a layer of safety classifiers.
Editorial context
SvyTech files this report from Cyber Security News under the Technology topic and links to the original article so that readers can verify the facts, source, and context.
When a query trips a classifier in a high-risk category (cybersecurity, biology, chemistry, or model distillation), Fable 5 hands the request off to the weaker Claude Opus 4.8 and notifies the user of the fallback. Anthropic claimed that an external bug bounty produced no universal jailbreaks in more than 1,000 hours of pre-launch testing.
That claim was almost immediately tested.
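The fallback routing described above, where a tripped classifier sends the request to a weaker model instead of refusing, can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: the `route` function, the `Reply` type, and the toy classifier and model stand-ins are all hypothetical, and only the four category names come from the article.

```python
from dataclasses import dataclass
from typing import Callable

# Categories named in the article; every other name here is hypothetical.
HIGH_RISK = {"cybersecurity", "biology", "chemistry", "model_distillation"}


@dataclass
class Reply:
    model: str             # which model produced the answer
    text: str
    fallback_notice: bool  # True when the user is told a fallback happened


def route(prompt: str,
          classify: Callable[[str], set],
          primary: Callable[[str], str],
          fallback: Callable[[str], str]) -> Reply:
    """Answer with the primary model unless a high-risk label fires."""
    flagged = classify(prompt) & HIGH_RISK
    if flagged:
        # Flagged requests go to the weaker model, with a notice attached.
        return Reply("fallback", fallback(prompt), True)
    return Reply("primary", primary(prompt), False)


# Toy stand-ins so the sketch runs end to end (assumptions, not real APIs).
def toy_classifier(prompt: str) -> set:
    return {"cybersecurity"} if "exploit" in prompt.lower() else set()


def toy_primary(prompt: str) -> str:
    return "primary answer"


def toy_fallback(prompt: str) -> str:
    return "fallback answer"
```

In a design like this, everything depends on `classify`: a prompt that does not trip it reaches the stronger model unchanged.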
Multi-Agent Bypass Within Days
Within days of release, prolific AI red-teamer Pliny the Liberator publicly announced he had bypassed Fable 5’s safety layers using a coordinated multi-agent attack strategy he called “a pack hunt.” The shared screenshots included step-by-step stack buffer overflow exploitation guidance for x86 Linux systems (disabling ASLR, writing vulnerable C server code with strcpy overflows, and compiling without protections) as well as the Birch reduction mechanism, a classic meth synthesis pathway.
Security Posture and Risk
Pliny documented the attack vectors used to achieve these bypasses, including:
- Unicode, homoglyphs, and Cyrillic character substitution to evade keyword classifiers
- Long-context reference tracking to smuggle harmful intent across large conversations
- Taxonomy and document-structure framing: embedding harmful queries inside legitimate-looking study guides or academic references
- Fiction and narrative framing to mask offensive intent as creative content
- Decomposition and recomposition: extracting sensitive technical information in benign, isolated chunks, then reassembling them into actionable uplift
The last technique proved most effective.

As Pliny described it, “getting uplift on the process itself, like Birch reduction method or reductive amination, is much more doable” than requesting a named harmful compound directly. Using a jailbroken Opus instance to assist in the backend further lowered the difficulty.
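Of the vectors listed above, character substitution is the easiest to illustrate from the defender's side. The sketch below shows why a naive keyword check misses a look-alike character and how normalizing the input first can close that particular gap. It is an assumption-laden example: the `CONFUSABLES` table is a tiny hand-picked sample, `normalize` and `keyword_hit` are hypothetical helpers, and nothing here reflects how Fable 5's classifiers actually work.

```python
import unicodedata

# Small, hand-picked sample of Cyrillic letters that resemble Latin ones.
# A real filter would need a much fuller confusables table.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
})


def normalize(text: str) -> str:
    """Fold compatibility forms, drop invisible format characters,
    lowercase, then map known look-alikes to Latin letters."""
    folded = unicodedata.normalize("NFKC", text)
    # Unicode category "Cf" covers zero-width and similar format characters.
    visible = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    return visible.casefold().translate(CONFUSABLES)


def keyword_hit(text: str, blocklist: set) -> bool:
    """Naive substring check, run on the normalized text."""
    cleaned = normalize(text)
    return any(word in cleaned for word in blocklist)
```

Normalization of this kind only addresses the character-level trick. It does nothing against the framing and decomposition techniques, which do not rely on unusual characters at all.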
Beyond the technical bypasses, Pliny also leaked Fable 5’s ~120,000-character system prompt to GitHub, exposing the internal framing and safety instructions Anthropic uses to govern the model’s behavior at the base level. The incident reignites the longstanding tension between AI capability and safety containment.
Anthropic’s classifier architecture, which routes flagged requests to a weaker fallback model rather than refusing outright, was designed to reduce friction for legitimate users.
However, Pliny argued the approach creates a false sense of security while simultaneously frustrating legitimate security researchers who need access to offensive techniques for defensive work. Anthropic has not yet publicly responded to the jailbreak claims or the leaked system prompt at the time of writing.
The episode also draws attention to the broader challenge of securing agentic, multi-model pipelines: when one jailbroken model (Opus) can assist another (Fable 5) in evading controls, single-model safety evaluations may be fundamentally insufficient.
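The decomposition technique reported earlier suggests why checks that look at one message, or one model, at a time can fall short: each chunk may look benign on its own. The sketch below contrasts a per-message threshold with a running total for the conversation. It is a hypothetical illustration, not a method described in the source. The `SessionMonitor` class, both thresholds, and the idea of summing scores are assumptions, and the scores would have to come from some classifier not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMonitor:
    """Tracks risk per message and across the whole conversation."""
    per_message_limit: float = 0.8  # hypothetical threshold
    session_limit: float = 1.5      # hypothetical threshold
    scores: list = field(default_factory=list)

    def observe(self, score: float) -> str:
        """Record one message's risk score and return a decision."""
        self.scores.append(score)
        if score >= self.per_message_limit:
            return "flag_message"
        # No single message was risky enough, so check the running total.
        if sum(self.scores) >= self.session_limit:
            return "flag_session"
        return "allow"
```

A monitor like this still sees only one conversation. If the chunks are spread across several agents or models, as in the reported attack, the running total would have to be shared across all of them to be of any use.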
Source link
Original source: Cyber Security News
Source profile
- Source: Cyber Security News
- Original title: Anthropic’s Claude Fable 5 Jailbroken to Generate Stack Exploits
- Canonical and source URL: https://cybersecuritynews.com/anthropics-claude-fable-5-jailbroken/