Saturday, May 30, 2026 — front page

AI commoditizes vulnerability markets

NCSC chief warns AI commoditizes premium vulns, threatens offensive talent pipelines OffensiveCon
TL;DW
Defense requires high-caliber offensive capability—sophisticated red teams and threat actors drive multi-year, multi-billion-dollar defensive improvements in organizations.
Everything is vulnerable and always will be; zero-vulnerability systems are economically unviable and technically impossible at scale.
AI is rapidly commoditizing premium vulnerability classes: memory safety issues that once commanded premium prices have come to be broadly viewed as commodity within a matter of months.
Over 80 countries have purchased offensive cyber capabilities on the open market, shifting from R&D investment to buying the best capability with a checkbook.
Artificial intelligence has compressed exploit development from hours, days, or weeks into seconds or minutes; context windows and token costs are only temporary constraints.
Open-source AI models are lagging frontier models by only 6-8 months—a gap that may narrow further, making exclusive capability access increasingly untenable.
The UK and France, via the Pall Mall Process, are establishing norms for the offensive ecosystem: transparency, customer accountability, and disruption of irresponsible actors.
Talent pipeline risk: if AI disrupts early-stage learning grounds before practitioners build fundamental understanding of architecture and OS-level mechanics, knowledge will hollow out.
Memory safety technologies like CHERI CPU architecture and non-phishable multi-factor authentication (passkeys) represent architectural wins over whack-a-mole patching approaches.
State adversaries are stealing offensive security tools (Carina, Dark Souls breaches); researchers and companies must understand supply chain risks and know their customers.

Ollie Whitehouse argues defense requires a functioning offensive ecosystem, then details how LLMs are eroding the premium vulnerability market and gutting entry-level researcher incentives. He outlines the bifurcation strategy the NCSC is pursuing through the Pall Mall Process (supporting responsible vendors while easing disruption of irresponsible ones) and calls for evidence-based metrics on defensive ROI.

Supply chain phishing at 30M-download scale

Typo-squatted domain + MitM proxy nets four PyPI accounts, injects malware into 30M-download package OpenSSF
TL;DW
Attackers registered a domain one letter off from pypi.org and built a man-in-the-middle proxy to phish PyPI maintainers; only four clicked through, but those four had their credentials compromised.
Phishing-resistant WebAuthn (passkeys/hardware keys) defeats proxy attacks because the browser cryptographically binds the login to the correct domain before prompting; TOTP codes, by contrast, can be captured by the proxy and relayed to the real site.
Attackers targeted num2words, a transitive dependency of Hugging Face Transformers (30M daily downloads), banking on unpinned dependencies to distribute Scavenger Loader malware at scale.
PyPI processes 13 billion requests per day with 900+ new packages daily; only one full-time security engineer handles incident response because volunteer staff cannot provide 24/7 coverage.
Trusted publishing eliminates long-lived API tokens by cryptographically linking package uploads to CI/CD platforms (GitHub, GitLab, Google Cloud Build, CircleCI), which removes the primary attack vector.
Domain registrars and abuse services notify attackers when reports are filed, defeating rapid response; legal cease-and-desist letters are now required to effectively block malicious domains.
PyPI added mandatory email reconfirmation for TOTP logins from new devices/IPs (November 2025) as friction to slow phishing success while promoting WebAuthn as the frictionless alternative.
Attackers returned in September 2025 with the same attack pattern, proving persistence; domain registration remains cheap and threat actors are learning PyPI patterns and targeting popular transitive dependencies.
Dependency cool-down periods (3–7 days in pip, uv, Dependabot) let security researchers catch malicious packages before widespread installation; median detection time is ~5 hours during working hours.
WebAuthn adoption requires significant UX/cultural change and education; PyPI cannot mandate it without breaking existing workflows, but must nudge users toward phishing-resistant authentication over time.
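
The dependency cool-down idea in the list above can be sketched in a few lines. This is a minimal illustration, not how pip, uv, or Dependabot implement it: the function name `pick_release`, the version strings, and the timestamps are all hypothetical, and real tools expose the behavior through their own settings.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def pick_release(
    releases: Dict[str, datetime], now: datetime, cooldown_days: int = 7
) -> Optional[str]:
    """Return the most recently uploaded version that has aged past the cool-down."""
    cutoff = now - timedelta(days=cooldown_days)
    # Only releases uploaded on or before the cutoff are eligible.
    eligible = {version: ts for version, ts in releases.items() if ts <= cutoff}
    if not eligible:
        return None
    # Choose by upload time, not by version string ordering.
    return max(eligible, key=eligible.get)


# Hypothetical upload history: 1.0.1 was pushed five hours ago.
now = datetime(2025, 7, 28, 12, 0, tzinfo=timezone.utc)
history = {
    "1.0.0": datetime(2025, 5, 1, 9, 0, tzinfo=timezone.utc),
    "1.0.1": now - timedelta(hours=5),
}
chosen = pick_release(history, now, cooldown_days=3)
print(chosen)
```

With a three-day window the five-hour-old upload is skipped, which gives reviewers time to flag and remove a malicious release before an unpinned install picks it up.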

Attackers registered a one-character lookalike of pypi.org, ran a MitM proxy to capture TOTP sessions and mint API tokens, then published a Scavenger Loader variant via num2words, a transitive dependency of Hugging Face Transformers. WebAuthn would have blocked the attack; response took 40 volunteer hours across registrars and maintainers.
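
Why a proxy can relay a TOTP code but not a WebAuthn assertion can be shown with a small simulation. The `totp` function follows RFC 6238; everything else is a simplified stand-in. Real WebAuthn uses public-key signatures and the browser also scopes credentials to the relying party, whereas this sketch uses an HMAC and a single origin check. The lookalike domain `pypi.example` and all keys are hypothetical.

```python
import hashlib
import hmac
import json
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1: the code depends only on secret and time."""
    counter = struct.pack(">Q", unix_time // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def sign_assertion(key: bytes, challenge: str, browser_origin: str) -> dict:
    """Stand-in authenticator response. The origin is supplied by the browser
    from the page actually loaded, not by the page or the user."""
    client_data = json.dumps(
        {"challenge": challenge, "origin": browser_origin}, sort_keys=True
    )
    signature = hmac.new(key, client_data.encode(), hashlib.sha256).hexdigest()
    return {"client_data": client_data, "signature": signature}


def verify_assertion(key: bytes, assertion: dict, challenge: str, expected_origin: str) -> bool:
    """Server-side check: valid signature, matching challenge, matching origin."""
    expected = hmac.new(key, assertion["client_data"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, assertion["signature"]):
        return False
    data = json.loads(assertion["client_data"])
    return data["challenge"] == challenge and data["origin"] == expected_origin


REAL = "https://pypi.org"
FAKE = "https://pypi.example"  # hypothetical lookalike behind the proxy

# TOTP: the victim types the code into the fake site; the proxy forwards it.
secret, now = b"shared-totp-secret", 1_700_000_000
relayed_code = totp(secret, now)
totp_relay_succeeds = hmac.compare_digest(relayed_code, totp(secret, now))

# WebAuthn-style: the browser stamps the fake origin into the signed data.
key = b"credential-key"
relayed = sign_assertion(key, "abc123", FAKE)
webauthn_relay_succeeds = verify_assertion(key, relayed, "abc123", REAL)
direct_login_succeeds = verify_assertion(
    key, sign_assertion(key, "abc123", REAL), "abc123", REAL
)
```

The TOTP code carries no information about where it was typed, so the relay passes. The signed assertion names the origin the browser saw, so the server rejects it when it arrives via the lookalike domain.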

Signals standardization across JS frameworks

TC39 moves to standardize signals as JS frameworks converge on fine-grained reactivity NDC Conferences
TL;DW
Signals are reactive variables that automatically recalculate dependent values when their dependencies change, eliminating manual recalculation work in application state management.
JavaScript frameworks shifted from pull-based rendering (server-side templates) to push-based DOM updates (jQuery era) to gain performance, losing predictability in the process.
Knockout introduced observables and data binding to regain the predictability of pull-based approaches while maintaining push performance—a pattern Vue, Svelte, Solid, and Angular have adopted.
React deliberately uses the pull approach (whole-app re-renders on state change) with memoization and virtual DOM instead of signals, prioritizing consistency and predictability over fine-grained reactivity.
Signal implementations must handle order-of-recalculation (topological sorting), batching updates to prevent UI glitches, and dirty-state tracking to skip unnecessary recalculations.
The TC39 proposal aims to standardize signals in the JavaScript core language rather than having each framework reinvent the wheel with custom implementations.
Solid pioneered the 'push and pull' hybrid approach for signals: pushing state changes but only recalculating derived values when actually read from the UI.
React's new compiler automatically memoizes functions and state, achieving similar efficiency gains to signal-based frameworks without changing React's fundamental pull-based architecture.

Traces the shift from React's pull model (re-render + memoize) to signals' push model (dependency tracking, surgical DOM updates), with a live implementation covering subscriptions, dirty-state tracking, and batching. Closes with the TC39 signals proposal and what native browser support eliminates for framework authors.
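
The mechanics summarized above (subscriptions, dirty-state tracking, and recalculation only on read) can be sketched compactly. This is an illustration written in Python rather than JavaScript, and the class and method names are hypothetical; it is not the API of the TC39 proposal or of any framework, and it omits dependency cleanup and side-effecting observers.

```python
from typing import Any, Callable, List, Set

# Stack of Computed nodes currently evaluating, used to record who reads what.
_active: List["Computed"] = []


class Signal:
    """A writable reactive value."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._subscribers: Set["Computed"] = set()

    def get(self) -> Any:
        if _active:  # subscribe whichever computed is reading us
            self._subscribers.add(_active[-1])
        return self._value

    def set(self, value: Any) -> None:
        if value == self._value:
            return  # unchanged value: nothing to invalidate
        self._value = value
        for sub in list(self._subscribers):
            sub.mark_dirty()  # push: only a dirty flag travels downstream


class Computed:
    """A derived value, recalculated lazily when read while dirty."""

    def __init__(self, fn: Callable[[], Any]) -> None:
        self._fn = fn
        self._value: Any = None
        self._dirty = True
        self._subscribers: Set["Computed"] = set()
        self.runs = 0  # counts recalculations, for the demo below

    def mark_dirty(self) -> None:
        if self._dirty:
            return  # already invalidated, so dependents are too
        self._dirty = True
        for sub in list(self._subscribers):
            sub.mark_dirty()

    def get(self) -> Any:
        if _active:
            self._subscribers.add(_active[-1])
        if self._dirty:  # pull: recompute only when someone reads
            _active.append(self)
            try:
                self._value = self._fn()
            finally:
                _active.pop()
            self._dirty = False
            self.runs += 1
        return self._value


# Diamond dependency: total depends on a through two paths.
a = Signal(1)
double = Computed(lambda: a.get() + 1)
triple = Computed(lambda: a.get() * 3)
total = Computed(lambda: double.get() + triple.get())

first = total.get()
a.set(2)  # two writes in a row...
a.set(3)
second = total.get()  # ...cause a single recalculation per node
```

Because writes only flip dirty flags, the two consecutive `set` calls are batched for free, and reading `total` pulls its dependencies in dependency order, so it never observes a half-updated mix of old and new values.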

LLMs lack persistent state tracking

Kleinberg finds LLMs miscount objects in generated stories at 15-40% error rates Simons Institute for the Theory of Computing
TL;DW
LLMs fail basic world model tasks like counting people in narratives (~15% error rate), yet solve identical arithmetic instantly when framed as math problems—suggesting errors stem from attention allocation, not capability.
Order dependence in state tracking: describing budget categories from high-to-low causes systematic inflation across all categories, violating consistency expected from systems with genuine world models.
Repeated revision converges to something like a chemical equilibrium rather than zero errors: models fix some errors but introduce new ones at a matching rate, creating a stable error floor that humans cannot eliminate through iteration alone.
Framing dramatically affects numerical accuracy: stories generate 15% errors, blog posts 9.5%, news articles 2.5%, and math problems ~0.2%—the same underlying capability behaves radically differently based on genre framing.
Myhill-Nerode theorem applied to sequence-generating systems: states can be extracted as equivalence classes of sequences, enabling principled probing for world models in game-playing, navigation, and constraint-satisfaction tasks.
Models maintain state propagation consistency (e.g., sports scores) even when starting from corrupted states (~80-94% transition accuracy), suggesting they represent implicit dynamics rather than absolute facts.
Compass direction tracking shows models confabulate details (shadows always point toward sunset regardless of direction traveled), indicating they optimize for narrative plausibility over geometric consistency.
Multi-model revision scheduling is solvable via Bellman equations: using cheap models early to reduce errors, then expensive models to grind out remaining errors, yields minimum-cost error-reduction strategies.
Navigation descriptions show a 20% error rate per location even with tool use enabled; the model catches some errors while generating others, so the same model identifies failures it cannot prevent.
World models in LLMs may be fundamentally about our explanation of what's happening inside rather than the model's understanding of the world—making definition and measurement inherently observer-dependent.
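
The revision equilibrium in the list above can be illustrated with a toy model. The assumptions here are not from the talk: each pass fixes an existing error with probability `FIX` and breaks a correct item with probability `BREAK`, and both values are made up for illustration.

```python
from typing import List


def revise(error_rate: float, fix_prob: float, break_prob: float, passes: int) -> List[float]:
    """Track the expected error fraction across repeated revision passes."""
    trace = [error_rate]
    for _ in range(passes):
        # Errors that survive this pass, plus correct items newly broken.
        error_rate = error_rate * (1 - fix_prob) + (1 - error_rate) * break_prob
        trace.append(error_rate)
    return trace


FIX, BREAK = 0.5, 0.05  # hypothetical per-pass probabilities

# Fixed point of the update rule: fixes out equal new errors in.
floor = BREAK / (FIX + BREAK)

from_high = revise(0.15, FIX, BREAK, passes=50)
from_zero = revise(0.0, FIX, BREAK, passes=50)
print(round(floor, 4), round(from_high[-1], 4), round(from_zero[-1], 4))
```

Under these assumptions the error fraction settles at `BREAK / (FIX + BREAK)`, about 9.1% here, whether the text starts at 15% errors or at none. More passes do not lower the floor; only a better fix-to-break ratio does.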

Using Myhill-Nerode theorem analysis and navigation tasks, Cornell's Kleinberg shows LLMs lack persistent state maintenance during generation—models fail to track people and objects across narratives but catch the same errors when explicitly prompted, revealing a gap between language fluency and world-model coherence.
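
The Myhill-Nerode idea of recovering states as equivalence classes of sequences can be sketched on a toy language. Two prefixes are treated as the same state when every continuation is accepted or rejected identically after both. The bounded suffix length is a simplification (it can merge classes that only longer suffixes would separate), and the `accepts` predicate is a hypothetical stand-in for whatever validity check a real probe would apply to a model's outputs.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple


def strings_up_to(alphabet: str, max_len: int) -> Iterator[str]:
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def nerode_classes(
    accepts: Callable[[str], bool],
    alphabet: str,
    max_prefix: int = 4,
    max_suffix: int = 4,
) -> List[List[str]]:
    """Group prefixes whose accepted continuations match on all bounded suffixes."""
    suffixes = list(strings_up_to(alphabet, max_suffix))
    classes: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
    for prefix in strings_up_to(alphabet, max_prefix):
        # The signature records which continuations remain valid after this prefix.
        signature = tuple(accepts(prefix + s) for s in suffixes)
        classes[signature].append(prefix)
    return list(classes.values())


def even_a(s: str) -> bool:
    """Toy language: strings containing an even number of 'a' characters."""
    return s.count("a") % 2 == 0


states = nerode_classes(even_a, "ab")
print(len(states))
```

For this toy language the prefixes collapse into two classes, even and odd counts of `a`, which matches the two states of the minimal automaton. A system that tracked state consistently would show the same collapse; prefixes that should be equivalent but admit different continuations are the kind of inconsistency such a probe looks for.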