Building Multimodal AI Agents: From Concept to Code [RSVP Required]
Monday, August 11, 2025 10:30 AM to 12:30 PM · 2 hr. (US/Pacific)
104 (Level 1)
Workshop
Information
RSVP REQUIRED DUE TO LIMITED CAPACITY!
Transform your AI applications by building agents that see, think, and reason across modalities. In today's AI landscape, single-modality systems fall short of real-world needs. This hands-on workshop equips you with the skills to create intelligent agents that seamlessly process both visual and textual information, a critical capability for next-generation applications.
Discover why multimodal agents represent a quantum leap beyond simple prompting and RAG systems, with concrete examples of how they solve previously intractable problems in document processing and visual reasoning. Through practical demonstration, we deconstruct the four pillars of effective agent architecture: perception, planning/reasoning, tools, and memory.
The workshop reveals breakthrough approaches to mixed-modality document handling, showcasing how modern VLM-based embedding models overcome traditional limitations while dramatically simplifying implementation pipelines.
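As a rough sketch of that kind of pipeline (not the workshop's actual code), the snippet below indexes page images with a placeholder VLM-based embedding function and retrieves them with MongoDB Atlas Vector Search. The embed_page function is hypothetical and stands in for whichever multimodal embedding model you choose, and the connection string, collection, and index names are illustrative.

```python
from pymongo import MongoClient

def embed_page(page):
    """Hypothetical stand-in for a VLM-based embedding model that maps a
    page image (or a text query) to a single dense vector."""
    raise NotImplementedError("plug in your multimodal embedding model here")

# Illustrative connection string, database, collection, and index names.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
pages = client["workshop"]["pages"]

# Index each page of a mixed text-and-image document as one vector, so charts
# and the surrounding prose live in the same embedding space.
with open("report_page_7.png", "rb") as f:   # illustrative page image
    page_image = f.read()

pages.insert_one({
    "doc_id": "quarterly-report",
    "page": 7,
    "embedding": embed_page(page_image),
})

# Retrieve the pages most relevant to a question via Atlas Vector Search.
hits = pages.aggregate([
    {"$vectorSearch": {
        "index": "page_embeddings",   # illustrative vector index name
        "path": "embedding",
        "queryVector": embed_page("What drove the Q3 revenue spike?"),
        "numCandidates": 100,
        "limit": 5,
    }},
    {"$project": {"doc_id": 1, "page": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for hit in hits:
    print(hit)
```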
You'll build a multimodal agent using Python, Gemini 2.5, and MongoDB (see the sketch after this list) that can:
(1) Answer complex questions about documents containing interleaved text and visuals,
(2) Explain charts and diagrams with human-like reasoning,
(3) Maintain conversational context across multiple interactions.
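As a taste of what that looks like in code, here is a minimal sketch using the google-genai Python SDK; the model name, file path, and prompts are assumptions, and a full agent would add the retrieval, tool, and memory layers covered in the workshop.

```python
from google import genai
from google.genai import types

# Assumes an API key is configured in the environment (e.g. GOOGLE_API_KEY);
# the model name is illustrative.
client = genai.Client()
MODEL = "gemini-2.5-flash"

# A chat session keeps conversational context across turns (capability 3).
chat = client.chats.create(model=MODEL)

# Ask about a chart image (capabilities 1 and 2): the model reasons over the
# pixels and the accompanying question together.
with open("q3_revenue_chart.png", "rb") as f:   # illustrative file name
    chart = types.Part.from_bytes(data=f.read(), mime_type="image/png")

answer = chat.send_message(["What does this chart say about Q3 revenue?", chart])
print(answer.text)

# A follow-up question relies on the context held in the chat session.
follow_up = chat.send_message("How does that compare to the previous quarter?")
print(follow_up.text)
```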
This isn't theoretical: you'll leave with working code you can immediately adapt to your projects, saving weeks of development time. Whether you're building customer-facing applications or internal tools, these techniques provide a competitive edge in the rapidly evolving AI landscape.
