Typo-squatted domain + MitM proxy nets four PyPI accounts, injects malware into 30M-download package

OpenSSF

Attackers registered a one-character lookalike of pypi.org, ran a MitM proxy to capture TOTP-authenticated sessions and mint API tokens, then published a Scavenger Loader variant via num-to-words, a transitive dependency of Hugging Face Transformers. WebAuthn would have blocked the attack because its credentials are bound to the genuine origin; the response took 40 volunteer hours across registrars and maintainers.
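
For a sense of how a one-character lookalike might be flagged, here is a minimal sketch that checks whether a domain is within one edit (substitution, insertion, or deletion) of a trusted name. The function names and the example domain `pypj.org` are hypothetical illustrations, not the domain used in the attack and not a description of PyPI's or OpenSSF's actual tooling.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion, or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Make `a` the shorter (or equal-length) string.
    if len(a) > len(b):
        a, b = b, a
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: one substitution, so the tails after position i must match.
        return a[i + 1:] == b[i + 1:]
    # b is one character longer: skipping b[i] must make the tails match.
    return a[i:] == b[i + 1:]


def is_lookalike(candidate: str, trusted: str = "pypi.org") -> bool:
    """Flag domains that are one edit away from the trusted domain (hypothetical helper)."""
    candidate = candidate.lower().rstrip(".")
    return candidate != trusted and within_one_edit(candidate, trusted)


# Hypothetical examples only.
print(is_lookalike("pypj.org"))     # one substituted character
print(is_lookalike("pypi.org"))     # the trusted domain itself
print(is_lookalike("example.org"))  # unrelated domain
```

A check like this only catches near-miss spellings; it says nothing about homoglyphs or different TLDs, which would need separate handling.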

Agents reshape software engineering by mediating organizational workflows, not just automating code

aiware

A five-layer model (code, prompts/workflows, governance, organizational logic, societal coordination) frames where real value emerges. Volvo case study: agents cut test-case creation from 2.5 days to minutes. The core argument is that prompts, policies, and workflows must be treated as first-class engineering objects, and that engineers need org-change skills to go with them.

In case you missed them

DeepMind's AlphaEvolve improves TSP hardness ratio and Ramsey bounds unsolved for decades

Simons Institute for the Theory of Computing

AlphaEvolve mutates programs that generate candidate proof objects—gadgets and graphs—scored by fast heuristic verifiers, then exhaustively verified. It tightened TSP inapproximability to 111/110, matched analytical max-cut bounds, and pushed Ramsey lower bounds 1-4 nodes past prior state-of-the-art where SAT/SMT solvers stalled.

DeepMind Co-Scientist agents produce experimentally validated hypotheses in medicine and biology

Stanford Online

A multi-agent Gemini system uses Elo-ranked debate and self-play to generate and refine hypotheses over hours or days. Validated outputs include AML drug candidates, liver fibrosis epigenomic targets in Stanford organoids, and a novel plant immune protein; human experts remain essential for evaluation.
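
To make the ranking idea concrete, here is a minimal sketch of the standard Elo update applied to two competing hypotheses after one pairwise debate. The K-factor of 32, the starting rating of 1200, and the function names are assumptions for illustration; the summary does not say how Co-Scientist parameterizes its ratings.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one debate. k=32 is an assumed K-factor."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_wins else 0.0
    delta = k * (s_a - e_a)
    # Rating points are transferred from the loser to the winner.
    return r_a + delta, r_b - delta


# Two hypotheses start level (assumed 1200 each); hypothesis A wins the debate.
print(update_elo(1200.0, 1200.0, a_wins=True))  # → (1216.0, 1184.0)
```

Repeated over many pairwise comparisons, updates like this yield a ranking in which hypotheses that keep winning debates rise to the top.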

OpenAI finds evaluation rubrics, not training, drive LLM hallucinations

Simons Institute for the Theory of Computing

Hallucinations persist because accuracy-only metrics give models no reward for admitting uncertainty. Stating grading rules in prompts—open rubrics—shifts model behavior: when "I don't know" earns partial credit, models become calibrated and outperform baselines on both accuracy and hallucination rate.
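
The incentive argument can be sketched with a little expected-value arithmetic. Assume a correct answer scores 1, a wrong answer scores `wrong_score`, and "I don't know" scores `idk_credit`; all numbers and function names below are illustrative assumptions, not figures from the talk.

```python
def expected_answer_score(p_correct: float, wrong_score: float = 0.0) -> float:
    """Expected score of answering when the model is right with probability p_correct."""
    return p_correct * 1.0 + (1.0 - p_correct) * wrong_score


def should_answer(p_correct: float, idk_credit: float, wrong_score: float = 0.0) -> bool:
    """Answer only if guessing beats the credit for abstaining."""
    return expected_answer_score(p_correct, wrong_score) > idk_credit


# Accuracy-only grading: abstaining earns 0, so even a 20%-confident guess pays off.
print(should_answer(0.2, idk_credit=0.0))  # → True

# Rubric with partial credit (assumed 0.3) for "I don't know": the same guess no longer pays.
print(should_answer(0.2, idk_credit=0.3))  # → False
print(should_answer(0.5, idk_credit=0.3))  # → True
```

Under these assumptions the partial credit acts as a confidence threshold: a score-maximizing model answers only when its probability of being right exceeds that credit.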

Tom Mitchell on how LLMs reshape the learning theory framework for modern ML

Simons Institute for the Theory of Computing