Meta runs three-agent pipeline to detect video theft and misalignment across 1B videos

MLOps Community

A perceiver agent extracts frames, OCR text, and embeddings; a retriever agent finds semantically similar videos via KNN; a reasoning agent chains the evidence to reach a verdict. Each agent runs a fine-tuned 3B–11B model, and frame sampling plus semantic hash caching cut costs relative to a monolithic 100B+ LLM.
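To make the retriever and reasoning stages concrete, here is a minimal sketch, not Meta's implementation. It assumes the perceiver has already produced one embedding per video, uses brute-force cosine KNN in place of a production vector index, and replaces the fine-tuned reasoning model with a simple similarity threshold. The names `Evidence`, `retrieve`, `reason`, and the 0.95 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Evidence:
    """Assumed perceiver output: one embedding per video (toy 3-d vectors here)."""
    video_id: str
    embedding: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, index, k=2):
    """Brute-force KNN: return the k most similar (video_id, score) pairs."""
    scored = [(e.video_id, cosine(query.embedding, e.embedding)) for e in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def reason(neighbors, threshold=0.95):
    """Stand-in for the reasoning agent: flag neighbors above a hypothetical threshold."""
    matches = [vid for vid, score in neighbors if score >= threshold]
    return {"verdict": "possible_copy" if matches else "no_match", "matches": matches}


index = [
    Evidence("a", [1.0, 0.0, 0.0]),
    Evidence("b", [0.0, 1.0, 0.0]),
    Evidence("c", [0.9, 0.1, 0.0]),
]
query = Evidence("q", [1.0, 0.0, 0.0])
result = reason(retrieve(query, index))
print(result)
```

At the scale mentioned in the headline, the brute-force scan would presumably be replaced by an approximate nearest-neighbor index; the sketch only shows how evidence flows from retrieval into a verdict.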

In case you missed them

Callosum beats GPT-4 vision benchmarks by 18-25% with heterogeneous agents at 18x lower cost

AI Engineer

Adrian Bertagnoli demos two systems: heterogeneous recursion maps LLM calls to different models and chips for 7-12x cost reduction on long-context tasks; visual web navigation mixes video-action-language models to outperform GPT-4 by 18% and Gemini 2.5 by 25%, routing simpler subtasks like zooming to smaller models for an 11x speedup.
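The routing idea can be illustrated with a small sketch. Everything here is an assumption for illustration: the tier names, the relative costs, and the routing table are hypothetical and are not figures or components from the talk.

```python
# Hypothetical model tiers with relative per-call costs (illustrative numbers only).
MODEL_COST = {"small": 1.0, "large": 10.0}

# Hypothetical routing table: simple subtasks such as zooming go to the small model.
ROUTES = {"zoom": "small", "scroll": "small", "plan": "large"}


def route(subtask):
    """Pick a model tier for a subtask; unknown subtasks fall back to the large model."""
    return ROUTES.get(subtask, "large")


def total_cost(subtasks):
    """Sum the relative cost of running each subtask on its routed tier."""
    return sum(MODEL_COST[route(s)] for s in subtasks)


trace = ["plan", "zoom", "zoom", "scroll", "plan"]
routed = total_cost(trace)                    # 10 + 1 + 1 + 1 + 10
baseline = MODEL_COST["large"] * len(trace)   # every call on the large model
print(routed, baseline)
```

Defaulting unknown subtasks to the larger model is a conservative design choice in this sketch: a misrouted hard subtask is assumed to cost more in quality than the small model saves.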

Google's on-call LLM agents optimize for precision over coverage to earn operator trust

DevOpsDays Zurich

Maria Henrika Peetz details how Google automated repetitive ticket triage by targeting only well-understood ticket types where high precision is achievable, covering steps such as fetching logs and checking monitoring, while ignoring the rest. Dry-run periods showed that premature agent actions eroded trust, which made precision the primary metric over speed or coverage.
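One way to picture the precision-over-coverage policy is a gate that lets the agent act only on ticket types whose dry-run precision clears a bar. This is a sketch under stated assumptions, not Google's system: the ticket types, the tallies, and the 0.95 threshold are hypothetical.

```python
def precision(correct, incorrect):
    """Fraction of agent actions that were correct; 0.0 when there is no data."""
    total = correct + incorrect
    return correct / total if total else 0.0


# Hypothetical dry-run tallies per ticket type: (correct actions, incorrect actions).
DRY_RUN = {
    "disk_full": (98, 2),
    "cert_expiry": (50, 0),
    "latency_spike": (30, 20),
}


def allowed_types(stats, min_precision=0.95):
    """Ticket types the agent may act on: only those meeting the precision bar."""
    return {t for t, (ok, bad) in stats.items() if precision(ok, bad) >= min_precision}


def handle(ticket_type, allowed):
    """Act only on allowed types; everything else is left for a human operator."""
    return "auto_triage" if ticket_type in allowed else "leave_for_human"


allowed = allowed_types(DRY_RUN)
print(sorted(allowed), handle("latency_spike", allowed))
```

Coverage is deliberately sacrificed here: the `latency_spike` tickets stay with humans until their measured precision improves.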