Sara Hooker: scaling is hitting limits, adaptation and post-training are the next frontier

Hugging Face

Hooker presents evidence that smaller models now outperform larger ones, model weights carry severe redundancy, and recent releases like GPT-4.5 and Llama 4 showed returns too poor to justify serving costs. The talk covers three vectors: post-training optimization, test-time compute on high-uncertainty examples, and continuous learning — illustrated by Auto Scientist, which outperformed human researchers on fine-tuning configuration search.

In case you missed them

Stanford: 75% of AI revenue flows to chips, leaving applications structurally unprofitable

Stanford Online

Maps the generative AI value chain across semiconductors, infrastructure, and applications, showing $350B in new revenue concentrated at Nvidia despite 10x application growth over two years. Covers why near-zero marginal cost breaks down when serving users burns GPU compute, and what conditions—custom ASICs, inference dominance, hyperscaler integration—could reprice the stack.

Chroma's Context One hits SOTA retrieval F1 at 75x the speed of Claude Opus

DeepLearning.AI

Jeff Huber argues context windows suffer 'context rot' beyond ~40K tokens, making naive stuffing ineffective. Chroma's 20B-parameter Context One model uses agentic search loops—hybrid search, regex, document fetching—to hit state-of-the-art retrieval F1 at 3,000 tokens/sec versus Opus's 40, at 1/25th the cost.

Stanford study finds LLMs reverse and compose facts in-context but fail when fine-tuned

Stanford Online

Controlled experiments on facts, syllogisms, and encodings show fine-tuned models fail to reverse relations or compose logical chains, while the same models nearly ace both tasks given the data in context. Three mitigations are tested: offline data augmentation, episodic retrieval at inference time, and RL-driven regeneration, each trading off training cost against inference cost.
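
As a rough illustration of the offline data augmentation idea, the sketch below adds a reversed statement for each fact before fine-tuning, so that both directions of a relation appear in the training text. It is not the study's method: the `Fact` type, the `INVERSE` table, and the example relations are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical inverse-relation table (an assumption, not from the study).
INVERSE = {
    "is the parent of": "is the child of",
    "is the author of": "was written by",
}


def augment_with_reversals(facts: List[Fact]) -> List[str]:
    """Return training sentences with a reversed copy of each fact where possible."""
    sentences = []
    for fact in facts:
        # Always keep the fact in its original direction.
        sentences.append(f"{fact.subject} {fact.relation} {fact.obj}.")
        inverse = INVERSE.get(fact.relation)
        if inverse is not None:
            # Add the reversed direction so fine-tuning sees both orderings.
            sentences.append(f"{fact.obj} {inverse} {fact.subject}.")
    return sentences


examples = augment_with_reversals([Fact("Ada", "is the parent of", "Ben")])
```

The cost of this approach is paid up front in a larger training set, whereas the retrieval-based mitigation defers the cost to inference time.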

AWS bets enterprise agentic AI is gated on defect rates, not model capability

DeepLearning.AI

Marc Brooker maps agent failures into four quadrants by frequency and severity and argues that only the low-frequency, low-consequence quadrant has real enterprise TAM. He outlines the AWS investments meant to move agents into that quadrant: correct-by-construction frameworks (Hydro, Cedar), automated reasoning, and deterministic agent steering.
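
The four-quadrant framing can be sketched as a simple triage function. This is a minimal illustration only: the function name, the 0-to-1 severity scale, and both thresholds are assumptions, not values from the talk.

```python
def failure_quadrant(
    error_rate: float,
    severity: float,
    rate_threshold: float = 0.01,      # assumed cutoff for "frequent"
    severity_threshold: float = 0.5,   # assumed cutoff on a 0-to-1 severity scale
) -> str:
    """Place an agent failure mode in one of four frequency/severity quadrants."""
    frequency = "high-frequency" if error_rate >= rate_threshold else "low-frequency"
    consequence = (
        "high-consequence" if severity >= severity_threshold else "low-consequence"
    )
    return f"{frequency}, {consequence}"


# A rare, cheap-to-recover failure lands in the quadrant the talk treats as viable.
label = failure_quadrant(error_rate=0.001, severity=0.1)
```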

AMD uses AI agents to ship a GPU instruction translator in 48 hours instead of years

DeepLearning.AI

Anush Elangovan details four AMD projects where agentic AI collapsed multi-year cycles: a Rosetta-like GPU instruction translator built in 48 hours, an autonomous performance optimizer, seamless CPU-GPU-NPU tensor movement, and a high-speed tokenizer. The competitive edge shifts from syntax knowledge to systems thinking and intent velocity.