Luma unifies text, image, video, and audio in one transformer backbone to add reasoning to generation

Stanford Online

Amit Jain outlines why video encodes 3D geometry through time, making it richer training data than images. He then explains how Luma's single shared-latent-space transformer enables multi-turn dialogue and iterative refinement, capabilities that diffusion-only or modality-siloed architectures lack.

Factory runs software projects for 16 days autonomously via serial agents and validation contracts

AI Engineer

Factory's Missions system chains planner, worker, and validator agents serially, which avoids the conflicts that parallelization creates, and defines a correctness contract before coding begins. Workers inherit clean state from their predecessors; validators span linting, type-checking, and live user testing. The longest production run lasted 16 days.
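The serial, contract-first flow described above can be illustrated with a toy loop. This is only a sketch under stated assumptions: `Contract`, `run_serial`, `toy_worker`, and the `no_todo` check are hypothetical names invented for illustration and are not Factory's API; real validators (linters, type checkers, user tests) are stood in for by simple predicate functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not Factory's actual interfaces.
State = Dict[str, str]
Check = Callable[[State], bool]
Worker = Callable[[str, State], State]


@dataclass
class Contract:
    """Correctness checks that are fixed before any work starts."""
    checks: Dict[str, Check]

    def failures(self, state: State) -> List[str]:
        # Return the names of every check the given state does not satisfy.
        return [name for name, check in self.checks.items() if not check(state)]


def run_serial(tasks: List[str], worker: Worker, contract: Contract, state: State) -> State:
    """Run tasks one at a time; each worker starts from the last validated state."""
    for task in tasks:
        # Hand the worker a copy so a rejected attempt cannot dirty the clean state.
        candidate = worker(task, dict(state))
        failed = contract.failures(candidate)
        if failed:
            raise RuntimeError(f"task {task!r} failed checks: {failed}")
        state = candidate  # only validated output is passed to the next worker
    return state


def toy_worker(task: str, state: State) -> State:
    # Stand-in for a coding agent: records that the task was completed.
    state[task] = "done"
    return state


# The contract is defined up front, before any worker runs.
contract = Contract(checks={"no_todo": lambda s: "TODO" not in s.values()})
final_state = run_serial(["parse", "render"], toy_worker, contract, {})
```

Because only one worker runs at a time and only validated state moves forward, there is nothing to merge and no concurrent edit to reconcile, which is the trade-off the talk summary attributes to serial execution.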

In case you missed them

GPT-2 hallucinates speaker switches in dialogue, mirroring the human Moses illusion

PyData

Julia Mertens fine-tunes GPT-2 on dialogue data and measures surprisal against human reading times on controlled stimuli. The model treats natural same-speaker continuations as more surprising than incongruent ones. This reversal persists regardless of scale and mirrors the Moses illusion, in which prior representations override bottom-up input processing.
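For readers unfamiliar with the metric: surprisal is the negative log probability a model assigns to a token given its context, so less expected tokens score higher. The sketch below uses the standard definition with base-2 logs (bits). The function names and the probabilities are made up for illustration; the talk summary does not specify the log base, tokenization, or how per-token values were aggregated.

```python
import math
from typing import Iterable


def surprisal_bits(prob: float) -> float:
    """Surprisal of one token: -log2 of the probability the model gave it."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def total_surprisal(token_probs: Iterable[float]) -> float:
    """Sum per-token surprisal over a continuation (an assumed aggregation)."""
    return sum(surprisal_bits(p) for p in token_probs)


# Made-up probabilities for two hypothetical continuations of the same context.
expected_continuation = [0.5, 0.25]
unexpected_continuation = [0.125, 0.125]

# The less probable continuation carries more surprisal.
more_surprising = total_surprisal(unexpected_continuation) > total_surprisal(expected_continuation)
```

The reported reversal means the fine-tuned model's ordering came out opposite to what this comparison would predict for natural same-speaker continuations.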

NDC: Data lake exposure eclipses prompt injection as critical risk in agentic systems

NDC Conferences

Jon McCoy argues that concern over prompt injection reflects survivorship bias, while the real threat is multi-agent, multi-team data lake access without workload isolation. He recommends treating data lake connections as internet-exposed endpoints, decomposing lakes by workload, and having security teams own every data pull.