Black Hat: prompt injection impact on multi-agent LLM systems is bounded by agent permissions

Black Hat

Maps the attack surface across orchestration frameworks with five CVEs (VS Code Copilot, Outlook Copilot, Salesforce agents), showing kill chains that run from RAG poisoning to CSP-bypass exfiltration. Defenses focus on context firewalls, scoped per-session capabilities, and telemetry at LLM-to-code boundaries.

Agentic AI velocity gains vanish within 2 months without code health above 9.5

JFokus

Adam Tornhill presents research showing that 2-3x task speed gains evaporate within weeks as AI-induced complexity accumulates. Covers three mitigations: MCP server health enforcement, mandatory 100% test coverage, and CLEAR architectural principles, plus evidence that healthy code cuts token consumption by 29-50%.

In case you missed them

Sara Hooker: scaling is hitting limits, adaptation and post-training are the next frontier

Hugging Face

Hooker presents evidence that smaller models now outperform larger ones, model weights carry severe redundancy, and recent releases like GPT-4.5 and Llama 4 showed returns too poor to justify serving costs. The talk covers three vectors: post-training optimization, test-time compute on high-uncertainty examples, and continuous learning — illustrated by Auto Scientist, which outperformed human researchers on fine-tuning configuration search.

Stanford: 75% of AI revenue flows to chips, leaving applications structurally unprofitable

Stanford Online

Maps the generative AI value chain across semiconductors, infrastructure, and applications, showing $350B in new revenue concentrated at Nvidia despite 10x application growth over two years. Covers why near-zero marginal cost breaks down when serving users burns GPU compute, and what conditions—custom ASICs, inference dominance, hyperscaler integration—could reprice the stack.

Stanford study finds LLMs reverse and compose facts given in context but fail on facts learned through fine-tuning

Stanford Online

Controlled experiments on facts, syllogisms, and encodings show fine-tuned models fail to reverse relations or compose logical chains, while the same models nearly ace both tasks when given the data in context. Three mitigations are tested: offline data augmentation, episodic retrieval at inference time, and RL-driven regeneration, each trading off training cost against inference cost.