Google AI Edge Reports 19.6 ms On-Device Inference Latency with LiteRT GPU Accelerator on Samsung Galaxy S24

June 6, 2026

Google AI Edge USA

Overview

Google AI Edge announced advancements in its LiteRT GPU Accelerator. The accelerator is not yet open-sourced, but it is available as prebuilt binaries for Kotlin and C++ SDK users. Benchmark results on a Samsung Galaxy S24 show GPU acceleration with full delegation for various on-device AI models, with low latency (for example, 19.6 ms for the hf_mms_300m model).

In Depth

Key Findings

The LiteRT GPU Accelerator is aimed at faster on-device AI inference. It has not been open-sourced yet; prebuilt versions are available for Kotlin and C++ SDK users. In benchmarks on a Samsung Galaxy S24, the hf_mms_300m model ran with an inference latency of 19.6 milliseconds (ms). The results also show full delegation to the GPU across various AI models, meaning the benchmarked graphs ran on the GPU without falling back to the CPU.
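To put the 19.6 ms figure in perspective, the latency can be converted into a rough throughput ceiling. The sketch below assumes inferences run strictly back to back with no pre- or post-processing cost, which the benchmark report does not state. The function names and the 30 Hz and 60 Hz budgets are illustrative choices, not part of the announcement.

```python
# Rough arithmetic on the reported latency. All names here are hypothetical
# helpers for illustration; only the 19.6 ms value comes from the article.

REPORTED_LATENCY_MS = 19.6  # hf_mms_300m on Samsung Galaxy S24 (from the article)


def inferences_per_second(latency_ms: float) -> float:
    """Upper bound on sequential inferences per second for a given latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def fits_budget(latency_ms: float, rate_hz: float) -> bool:
    """True if one inference fits inside a single period of a rate_hz loop."""
    budget_ms = 1000.0 / rate_hz
    return latency_ms <= budget_ms


if __name__ == "__main__":
    print(round(inferences_per_second(REPORTED_LATENCY_MS), 1))
    print(fits_budget(REPORTED_LATENCY_MS, 30.0))  # 33.3 ms budget
    print(fits_budget(REPORTED_LATENCY_MS, 60.0))  # 16.7 ms budget
```

Under these assumptions, 19.6 ms corresponds to roughly 51 inferences per second. A single inference fits within a 30 Hz loop's 33.3 ms period but not within a 60 Hz loop's 16.7 ms period.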

Technical Details

The LiteRT GPU Accelerator is designed to make fuller use of a mobile device's GPU for AI model inference. Running the model directly on the device removes the dependency on the cloud and avoids network round-trip latency. As a result, workloads such as real-time image recognition, natural language processing, and voice assistants can respond without waiting on a network connection. The 19.6 ms latency matters because lower inference latency translates directly into more responsive applications. Full delegation means the entire computational graph of a model is processed on the GPU. When only part of a graph is delegated, intermediate tensors have to move between the CPU and the GPU every time execution switches device; full delegation eliminates that transfer overhead.
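The cost that full delegation avoids can be illustrated with a toy model. The following is a minimal sketch, not LiteRT's actual partitioning logic: the `Op` type, the op names, and the `gpu_supported` flag are all hypothetical, and the sketch only counts how many times execution would switch device in the middle of a graph.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical graph node; gpu_supported is an assumed capability flag."""
    name: str
    gpu_supported: bool


def partition(ops: List[Op]) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops into segments by the device that would run them."""
    segments: List[Tuple[str, List[str]]] = []
    for op in ops:
        device = "GPU" if op.gpu_supported else "CPU"
        if segments and segments[-1][0] == device:
            segments[-1][1].append(op.name)  # extend the current segment
        else:
            segments.append((device, [op.name]))  # device switch: new segment
    return segments


def device_handoffs(segments: List[Tuple[str, List[str]]]) -> int:
    """Number of mid-graph CPU<->GPU switches (model input/output not counted)."""
    return max(len(segments) - 1, 0)


def is_fully_delegated(segments: List[Tuple[str, List[str]]]) -> bool:
    """True when the whole graph forms a single GPU segment."""
    return len(segments) == 1 and segments[0][0] == "GPU"


# Illustrative graphs: one with an op the GPU cannot run, one without.
partial_graph = [Op("conv", True), Op("custom_norm", False),
                 Op("matmul", True), Op("softmax", True)]
full_graph = [Op("conv", True), Op("layer_norm", True),
              Op("matmul", True), Op("softmax", True)]

if __name__ == "__main__":
    for graph in (partial_graph, full_graph):
        segs = partition(graph)
        print(segs, device_handoffs(segs), is_fully_delegated(segs))
```

In this toy model the graph with one unsupported op splits into three segments and incurs two mid-graph handoffs, while the fully supported graph stays in a single GPU segment with none. Real runtimes weigh additional factors, such as tensor sizes and synchronization cost, that the sketch ignores.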

Background & Context

Interest in “on-device AI” (executing AI workloads directly on local hardware) has surged in recent years, driven by demand for privacy protection, reduced reliance on network connectivity, and real-time processing. Running complex AI models efficiently within the limited power and thermal budgets of mobile devices, however, has been a significant challenge. The LiteRT GPU Accelerator is part of Google's effort to address that challenge and to expand what mobile AI can do. The technology is expected to enable more sophisticated AI features on a wide range of edge devices, including smartphones, wearables, and IoT devices.

Strategic Significance & Outlook

The LiteRT GPU Accelerator gives developers a way to build on-device AI applications that are fast, responsive, and privacy-preserving. Possible uses include on-device personalized learning, offline translation, and more immersive augmented reality (AR) experiences. Google anticipates that eventually open-sourcing the accelerator will let a broader developer community benefit and will speed up development of the edge AI ecosystem. Over the longer term, the expectation is that AI will become more deeply integrated into everyday devices, making them smarter, more proactive, and more personal.

Source: https://developers.google.com/edge/litert/next/gpu
