Black Hat: prompt injection on multi-agent LLM systems bounded by agent permissions

Black Hat

Maps the attack surface across orchestration frameworks with five CVEs—VS Code Copilot, Outlook Copilot, Salesforce agents—showing kill chains from RAG poisoning to CSP-bypass exfiltration. Defenses focus on context firewalls, scoped per-session capabilities, and telemetry at LLM-to-code boundaries.
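
As a rough illustration of the "scoped per-session capabilities" and boundary telemetry ideas, here is a minimal sketch. It is not taken from the talk: the `Session` class, the `TOOLS` registry, the capability strings, and the `dispatch` function are all hypothetical names chosen for this example. The assumption is that every tool call requested by a model passes through a single gate that checks a capability set fixed at session creation and logs the decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


class CapabilityError(PermissionError):
    """Raised when a tool call falls outside the session's granted scope."""


@dataclass
class Session:
    # Hypothetical session object: capabilities are fixed when it is created,
    # so injected text cannot widen them later.
    session_id: str
    capabilities: FrozenSet[str]
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)


# Hypothetical registry: tool name -> (required capability, handler).
TOOLS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "read_doc": ("docs:read", lambda arg: f"contents of {arg}"),
    "send_email": ("mail:send", lambda arg: f"sent to {arg}"),
}


def dispatch(session: Session, tool: str, arg: str) -> str:
    """Run a model-requested tool call only if the session holds the capability."""
    if tool not in TOOLS:
        session.audit_log.append((tool, arg, "unknown"))
        raise CapabilityError(f"unknown tool: {tool}")
    required, handler = TOOLS[tool]
    allowed = required in session.capabilities
    # Telemetry at the LLM-to-code boundary: record every decision.
    session.audit_log.append((tool, arg, "allowed" if allowed else "denied"))
    if not allowed:
        raise CapabilityError(f"{tool} requires {required}")
    return handler(arg)


if __name__ == "__main__":
    s = Session("demo", frozenset({"docs:read"}))
    print(dispatch(s, "read_doc", "notes.txt"))
    try:
        # Simulates an injected instruction asking for exfiltration by email.
        dispatch(s, "send_email", "attacker@example.com")
    except CapabilityError as exc:
        print("blocked:", exc)
    print(s.audit_log)
```

In this sketch, a poisoned document can still ask the model to send email, but a read-only session has no way to honor the request, and the denied attempt remains in the audit log for review.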

Anthropic's Mythos generates working exploits from source code, but tempo beats magic as real threat

JFokus

Dan Bergh Johnsson dissects what Mythos actually demonstrates: exploit generation from identified vulnerabilities, not novel attack vectors. Mozilla's harness improvements mattered as much as model capability; curl's 176K audited lines yielded one non-critical find. The real risk is speed—AI compresses the window from vulnerability discovery to working exploit.

Agentic AI velocity gains vanish within 2 months without code health above 9.5

JFokus

Adam Tornhill presents research showing that 2-3x task speed gains evaporate within two months as AI-induced complexity accumulates. He covers three mitigations: MCP server health enforcement, mandatory 100% test coverage, and CLEAR architectural principles, along with evidence that healthy code cuts token consumption by 29-50%.

AI coding tools generate technical debt faster than orgs can measure it, Singh warns

DeepLearning.AI

Barun Singh argues current metrics—PRs shipped, features deployed—mask accumulating technical debt from unreviewed AI-generated code, predicting a forced rewrite reckoning within 12-24 months. Supervised agents (human-reviewed) currently outperform autonomous pipelines on complex codebases; QA and review processes, not generation speed, are the real bottleneck.

Sara Hooker: scaling is hitting limits, adaptation and post-training are the next frontier

Hugging Face

Hooker presents evidence that smaller models now outperform larger ones, that model weights carry severe redundancy, and that recent releases like GPT-4.5 and Llama 4 showed returns too poor to justify serving costs. The talk covers three vectors: post-training optimization, test-time compute on high-uncertainty examples, and continuous learning. Auto Scientist, which outperformed human researchers on fine-tuning configuration search, serves as the illustration.