OpenSSF
Attackers registered a one-character lookalike of pypi.org, ran a man-in-the-middle (MitM) proxy to capture TOTP-authenticated sessions and mint API tokens, then published a Scavenger Loader variant via num-to-words, a transitive dependency of Hugging Face Transformers. WebAuthn would have blocked the attack because its credentials are bound to the genuine origin; the response took 40 volunteer hours across registrars and maintainers.
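Why WebAuthn resists this kind of proxy can be sketched in a few lines. The browser writes the page's origin into `clientDataJSON`, the authenticator signs over a hash of it, and the relying party rejects any assertion whose origin is not its own. The fragment below shows only the type, challenge, and origin checks; signature verification is omitted, and the helper names and the lookalike domain `pyp1.org` are hypothetical, chosen for illustration only.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used for the WebAuthn challenge."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def browser_client_data(origin: str, challenge: bytes) -> bytes:
    """Simulate the clientDataJSON a browser builds for a login ceremony.

    The origin is filled in by the browser from the page actually loaded,
    so a phishing page cannot choose it.
    """
    return json.dumps(
        {"type": "webauthn.get", "challenge": b64url(challenge), "origin": origin}
    ).encode("utf-8")


def check_client_data(
    client_data_json: bytes, expected_origin: str, expected_challenge: bytes
) -> bool:
    """Partial relying-party check: type, challenge, and origin only.

    A real verifier must also validate the authenticator signature,
    which covers a hash of this JSON.
    """
    data = json.loads(client_data_json)
    return (
        data.get("type") == "webauthn.get"
        and data.get("challenge") == b64url(expected_challenge)
        and data.get("origin") == expected_origin
    )


challenge = b"server-nonce"
genuine = browser_client_data("https://pypi.org", challenge)
phished = browser_client_data("https://pyp1.org", challenge)  # hypothetical lookalike

print(check_client_data(genuine, "https://pypi.org", challenge))
print(check_client_data(phished, "https://pypi.org", challenge))
```

A TOTP code carries no such binding: it is a number the user can type into any page, which is what lets a proxy relay it.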
aiware
A five-layer model (code, prompts/workflows, governance, organizational logic, societal coordination) frames where real value emerges. Volvo case study: agents cut test-case creation from 2.5 days to minutes. The core argument is that prompts, policies, and workflows must be treated as first-class engineering objects, and that teams need organizational-change skills alongside them.
In case you missed them
Simons Institute for the Theory of Computing
AlphaEvolve mutates programs that generate candidate proof objects (gadgets and graphs), which are scored by fast heuristic verifiers and then exhaustively verified. It tightened the TSP inapproximability bound to 111/110, matched analytical max-cut bounds, and pushed Ramsey lower bounds 1 to 4 nodes past the prior state of the art, where SAT/SMT solvers stalled.
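The mutate, score cheaply, then verify exhaustively loop can be illustrated on a toy Ramsey instance. This is a sketch under loud assumptions, not AlphaEvolve itself: it mutates the candidate object directly with random edge flips rather than mutating a generating program with a language model, and it targets the classical fact that a 2-colouring of the edges of K5 without a monochromatic triangle exists (so R(3,3) > 5). All function names are hypothetical.

```python
import itertools
import random

N = 5  # a valid colouring of K5 witnesses R(3,3) > 5
EDGES = list(itertools.combinations(range(N), 2))
TRIANGLES = list(itertools.combinations(range(N), 3))


def mono_triangles(colouring, triangles):
    """Count triangles whose three edges all share one colour."""
    return sum(
        1
        for a, b, c in triangles
        if colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
    )


def heuristic_score(colouring, rng, sample_size=6):
    """Fast, noisy verifier: inspect only a random sample of triangles."""
    return mono_triangles(colouring, rng.sample(TRIANGLES, sample_size))


def exhaustive_verify(colouring):
    """Slow, exact verifier: inspect every triangle."""
    return mono_triangles(colouring, TRIANGLES) == 0


def mutate(colouring, rng):
    """Flip the colour of one randomly chosen edge."""
    child = dict(colouring)
    edge = rng.choice(EDGES)
    child[edge] ^= 1
    return child


def search(seed=0, max_steps=200_000):
    rng = random.Random(seed)
    current = {edge: rng.randint(0, 1) for edge in EDGES}
    for _ in range(max_steps):
        # Only candidates that pass the cheap check reach the exact one.
        if heuristic_score(current, rng) == 0 and exhaustive_verify(current):
            return current
        candidate = mutate(current, rng)
        if heuristic_score(candidate, rng) <= heuristic_score(current, rng):
            current = candidate
    return None


witness = search()
print(witness is not None and exhaustive_verify(witness))
```

The design point carried over from the summary is the split between verifiers: the cheap score steers the search, while only the exhaustive check is trusted to certify a result.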
Stanford Online
A multi-agent Gemini system uses Elo-ranked debate and self-play to generate and refine hypotheses over hours or days. Validated outputs include AML drug candidates, liver fibrosis epigenomic targets in Stanford organoids, and a novel plant immune protein; human experts remain essential for evaluation.
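For readers unfamiliar with Elo ranking, here is a minimal sketch of how pairwise debate outcomes could be turned into a ranking of hypotheses. It uses the standard Elo update; the K-factor of 32, the starting rating of 1200, the hypothesis labels, and the debate results are all assumptions for illustration, and the system's actual rating scheme is not described in the summary.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from loser to winner after one debate."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank(ratings: dict) -> list:
    """Hypotheses ordered from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical hypotheses, all starting at the same assumed rating.
ratings = {"H1": 1200.0, "H2": 1200.0, "H3": 1200.0}

# Hypothetical debate outcomes as (winner, loser) pairs.
for winner, loser in [("H1", "H2"), ("H1", "H3"), ("H3", "H2")]:
    update(ratings, winner, loser)

print(rank(ratings))
```

An upset win against a higher-rated hypothesis moves more points than an expected win, so a strong late entrant can climb quickly.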
Simons Institute for the Theory of Computing
Hallucinations persist because accuracy-only metrics give models no reward for admitting uncertainty. Stating grading rules in prompts—open rubrics—shifts model behavior: when "I don't know" earns partial credit, models become calibrated and outperform baselines on both accuracy and hallucination rate.
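The incentive argument comes down to a small piece of arithmetic. The sketch below compares the expected score of answering against the credit for abstaining; the specific numbers (1 point for a correct answer, 0.25 for "I don't know", no penalty for a wrong answer) are assumptions for illustration, not values from the talk.

```python
def expected_answer_score(
    p_correct: float, reward_correct: float = 1.0, penalty_wrong: float = 0.0
) -> float:
    """Expected score of answering with confidence p_correct."""
    return p_correct * reward_correct - (1.0 - p_correct) * penalty_wrong


def best_action(
    p_correct: float,
    idk_credit: float = 0.25,
    reward_correct: float = 1.0,
    penalty_wrong: float = 0.0,
) -> str:
    """Answer only when doing so beats the credit for abstaining."""
    if expected_answer_score(p_correct, reward_correct, penalty_wrong) > idk_credit:
        return "answer"
    return "abstain"


# Accuracy-only grading: abstaining earns nothing, so even a 10% guess pays.
print(best_action(0.10, idk_credit=0.0))

# Open rubric with partial credit: the same 10% guess is no longer worth it.
print(best_action(0.10, idk_credit=0.25))
```

Under these assumed values the break-even confidence is 0.25; stating the rubric in the prompt is what lets a model act on that threshold.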
Simons Institute for the Theory of Computing