MLOps Community
A perceiver agent extracts frames, OCR, and embeddings; a retriever agent finds semantically similar videos via KNN; a reasoning agent chains evidence to reach a verdict. Each agent runs a fine-tuned 3B–11B model, and frame sampling plus semantic hash caching cut costs relative to a monolithic 100B+ LLM.
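For readers unfamiliar with the retriever step, here is a minimal sketch of KNN lookup over embeddings. It is an illustration only, not the speakers' implementation: the function names, the toy 3-dimensional vectors, the video IDs, and the choice of cosine similarity are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def knn_retrieve(query, index, k=2):
    """Return the IDs of the k stored embeddings most similar to the query.

    `index` maps a (hypothetical) video ID to its embedding vector.
    This is a brute-force scan; a real system would likely use an
    approximate nearest-neighbour index instead.
    """
    scored = [(cosine_similarity(query, emb), vid) for vid, emb in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vid for _, vid in scored[:k]]

# Toy embeddings, assumed for illustration only.
index = {
    "video_a": [1.0, 0.0, 0.0],
    "video_b": [0.9, 0.1, 0.0],
    "video_c": [0.0, 1.0, 0.0],
}
nearest = knn_retrieve([1.0, 0.05, 0.0], index, k=2)
print(nearest)
```

In a pipeline like the one described, the retrieved IDs would be handed to the reasoning agent as supporting evidence.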
In case you missed them
AI Engineer
Adrian Bertagnoli demos two systems: heterogeneous recursion maps LLM calls to different models and chips for 7-12x cost reduction on long-context tasks; visual web navigation mixes video-action-language models to outperform GPT-4 by 18% and Gemini 2.5 by 25%, routing simpler subtasks like zooming to smaller models for an 11x speedup.
DevOpsDays Zurich
Maria Henrika Peetz details how Google automated repetitive ticket triage by targeting only well-understood ticket types where high precision is achievable (fetching logs, checking monitoring) while ignoring the rest. Dry-run periods showed that premature agent actions eroded trust, making precision the primary metric over speed or coverage.