Innovation in Model Compression for AI Efficiency

Wednesday, August 13, 2025 11:55 AM to 12:15 PM · 20 min. (US/Pacific)
155 (Level 1)
Solo Talk
Generative AI: Deployment at Scale

Information

In 2025, Deloitte, Multiverse Computing, and Intel launched a collaborative study to assess the viability of CPU-based IT architectures for generative AI inferencing, focusing on cost reduction and efficiency improvements. Combining Intel’s advanced processors and OpenVINO inference toolkit, Multiverse Computing’s AI model compression methodologies, and Deloitte’s AI engineering experience, the collaboration rigorously evaluated CPU-based AI performance deployed in the AWS cloud. The study demonstrated that CPU-based inferencing can match the accuracy and performance of traditional GPU-based approaches while delivering notable efficiency gains as user demand scales. These findings have potentially meaningful implications for organizations facing cost or resource constraints, and open new opportunities for deploying generative AI in edge computing and on-premise environments.
