Innovation in Model Compression for AI Efficiency

Wednesday, August 13, 2025 11:55 AM to 12:15 PM · 20 min. (US/Pacific)
155 (Level 1)
Solo Talk
Generative AI: Deployment at Scale

Information

In 2025, Deloitte, Multiverse Computing, and Intel launched a collaborative study to assess the viability of CPU-based IT architectures for generative AI inferencing, focusing on cost reduction and efficiency improvements. Combining Intel’s advanced processors and OpenVINO inference toolkit, Multiverse Computing’s AI model compression methodologies, and Deloitte’s AI engineering experience, the collaboration rigorously evaluated CPU-based AI performance deployed in the AWS cloud. The study demonstrated that CPU-based inferencing can match the accuracy and performance of traditional GPU-based approaches while delivering notable efficiency gains as user demand scales. These findings have potentially meaningful implications for organizations facing cost or resource constraints, and open new opportunities for deploying generative AI in edge computing and on-premise environments.
