Date: 2023-07-19
By Ziad Asghar, Qualcomm Technologies, Inc.

If we compare generative AI to personal computing, we’re at the dawn of the personal computer age. And just as computers on our desks changed our lives, we’re about to put generative AI into our pockets, on our desks, and into our cars, with a similarly massive effect.

It’s understandable that the AI discussion up until now has been cloud centered. Popular large language models like GPT use huge amounts of storage and memory, and much of the focus so far has been on training and refining new models, which generally takes place in the cloud. But as we move from the era of training to the era of inference – actually using the models – more of the focus will move to devices.

That’s going to become the true democratization moment for generative AI, when these tools become available to everyone with a smartphone, not just to people with fast connections and cloud service subscriptions. Bringing AI onto devices is the only way to handle the exponential demand that’s in our future.

On-device AI has unique strengths for application developers and for end users, both consumers and enterprises. AI running on-device keeps data on the device, which protects privacy, and it can use device sensor data to personalize the AI experience without sharing personal information. That is critical if we want to push education and healthcare forward with AI. On-device AI brings the benefits of generative AI to people and places with limited connectivity, which is crucial to helping close the global digital divide. And in a future where we may struggle with authenticating what was created with AI, the on-device approach brings confidence.

Without an on-device focus for AI, cloud-based AI will consume massive amounts of energy and drive a one-way flow of cost up to cloud service providers. That’s unsustainable.

As we move AI onto the device, priorities change. The focus is no longer on server racks with hundreds of watts of electricity available per chip. Rather, it’s on which chips can run AI hyper-efficiently, without denting handheld batteries. Chips with dedicated AI components can process AI workloads quickly, whereas those same workloads can bring standard CPUs to their knees. And we’re no longer dealing with tens of gigabytes of memory, so we need to compile and quantize AI models to work in less space.

Moving AI inference to the billions of devices around us moderates the cost, distributes the power demands, and puts AI tools in every pocket. So investors must ask not only who is creating the most powerful server racks or offering the most online services, but also who is best adapted to executing these queries on the devices around you.

At Qualcomm Technologies we have been working on all of these AI capabilities for a decade now, pushing AI through camera, voice recognition and modem stacks, and now into general-purpose, generative AI functions.

We have already showcased generative AI models with over 1 billion parameters running on phones, and we are set to support models with 10 billion parameters or more in the coming months. Most generative AI use cases, including large language models and multimodal applications, can be covered with around 10 billion parameters.
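To make the memory constraint concrete, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need at different precisions. The helper name `model_size_gb` is hypothetical, and the figures count only weights (no activations, caches, or runtime overhead), so treat them as lower bounds rather than measured numbers.

```python
def model_size_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else (scales, activations, caches) is counted.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 1 billion and 10 billion parameters, the two sizes mentioned above.
    for num_params in (1_000_000_000, 10_000_000_000):
        for bits in (32, 16, 8, 4):
            size = model_size_gb(num_params, bits)
            print(f"{num_params:>14,} params at {bits:>2}-bit: {size:5.1f} GB")
```

Under these assumptions, a 10-billion-parameter model drops from roughly 40 GB at 32-bit precision to roughly 5 GB at 4-bit precision, which is why lower-precision weights matter so much for phone-class memory.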

Our Qualcomm® AI Engine hardware, our AI Model Efficiency Toolkit (AIMET) and Qualcomm® AI software stack are designed to run models with the utmost efficiency. We're not the only ones working on making on-device AI more efficient; if you go out into the open-source world, you'll find tactics like four-bit quantization are becoming steadily more popular as developers seek to make these models work better across all devices.
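As an illustration of the general idea behind four-bit quantization, the following is a minimal sketch of symmetric, per-tensor quantization in plain Python. It is not how AIMET or any particular open-source project implements it; the function names and the sample weights are made up for the example, and real toolchains typically use per-channel or per-group scales and pack two 4-bit codes into each byte.

```python
def quantize_4bit(weights):
    """Map floats to signed integer codes in [-7, 7] using one shared scale.

    Assumption: symmetric per-tensor quantization, the simplest variant.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp to the 4-bit symmetric range.
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

if __name__ == "__main__":
    print("codes:   ", codes)
    print("scale:   ", scale)
    print("restored:", [round(r, 3) for r in restored])
    print("max abs error:", round(max_err, 3))
```

Each weight is stored as a small integer plus one shared scale, so the reconstruction error per weight is bounded by half the scale. That trade of a little accuracy for a large reduction in size is the one the paragraph above describes.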

This is also where Qualcomm Technologies is unique in its footprint and scale, allowing generative AI applications to go global quickly. There are billions of user devices powered by Snapdragon® and Qualcomm® platforms today – and many hundreds of millions of devices powered by our platforms enter the market each year. Our AI capabilities span a wide range of product categories, including mobile, vehicles, XR, PC, and IoT.

Just as traditional computing evolved from mainframes and thin clients to today’s mix of cloud and edge devices, AI processing must be distributed between the cloud and devices for AI to scale and reach its full potential. Such a hybrid AI architecture offers benefits with regard to cost, energy, performance, privacy, security, and personalization.

Many AI models, especially personalized ones, will then run on-device, while larger, more generic models can use the cloud as needed. The experience will be seamless to the user. All they’ll see is the best AI experience, tailored to their own device and level of connectivity. No one will be left out.

AI investments can also benefit from a balanced approach with positions in both cloud and devices, as both will work together to satisfy the upcoming demand for AI. Whether we are talking about distributing compute workloads or shaping portfolios – the future of AI is hybrid.

