Report: You can now run local AI models on your phone, but that doesn't mean you should

Image: Hugging Models on x.com | Screenshot

The tech industry has spent years bragging about whose cloud-based AI model has the most trillions of parameters and who poured more billions of dollars into data centers. However, the open-source AI scene seems to be going in a different direction. While it’s true that there are some open-source models that basically require data-center-level hardware to run, we’re also constantly getting smaller models that are increasingly capable and can run on consumer hardware. And by consumer hardware, I mean mobile phones.

Alibaba recently dropped its new Qwen 3.5 Small Model Series, featuring highly optimized models ranging from 0.8 billion to 9 billion parameters. Despite their tiny size and modest (by AI standards) computing requirements, these models are already closing the gap with closed-source models from tech giants. They can process both text and images, and even outperformed some smaller models from OpenAI and Google, like GPT-5 nano and Gemini 2.5 Flash-Lite, in certain benchmarks.

The most fascinating part of this release is the 2B (two billion parameter) model. While models like Qwen 3.5-9B still require at least 5GB of VRAM to run smoothly on your PC, the 2B variant is compact enough to run on a smartphone. In fact, people are already downloading it and running it completely locally on their Android phones or iPhones.
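A rough back-of-the-envelope calculation shows why the size difference matters so much. The sketch below estimates the memory needed for the weights alone as parameters times bits per weight, divided by eight. It assumes nominal parameter counts (exactly 2 and 9 billion) and ignores activations, context cache, and runtime overhead, so real usage will be higher. The 6-bit width matches the iPhone demo quoted in this article; the 4-bit and 16-bit widths are illustrative assumptions, not figures from Alibaba.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, and
    nothing else (activations, cache, runtime) is counted.
    """
    return params_billions * bits_per_weight / 8


# Hypothetical configurations for illustration; only the 2B 6-bit
# combination is taken from the demo quoted in the article.
configs = [
    ("2B at 6-bit", 2, 6),
    ("2B at 16-bit", 2, 16),
    ("9B at 4-bit", 9, 4),
    ("9B at 16-bit", 9, 16),
]

for label, params, bits in configs:
    print(f"{label}: about {weight_memory_gb(params, bits):.1f} GB of weights")
```

Under these assumptions, the 2B model at 6 bits needs roughly 1.5 GB for its weights, which is within reach of a modern phone, while a 9B model at 4 bits needs roughly 4.5 GB, which is in line with the 5GB figure above once some overhead is added.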

Think about what that actually means for a second. We now have a natively multimodal AI that can process text and images simultaneously, running entirely on a smartphone processor. Because all the computing happens on the device, your data never goes to a server. Once the model is downloaded, it needs no internet connection and costs nothing per month.

Alibaba’s Qwen 3.5 is now running fully on-device on the iPhone 17 Pro.

It outperforms models 4x its size, delivers strong visual understanding, and lets you switch reasoning on or off.

This demo uses the 2B 6-bit version, optimized with MLX for Apple Silicon. pic.twitter.com/IXfulitef4
— Hugging Models (@HuggingModels) March 3, 2026

You can install and run Qwen 3.5-2B on your phone right now, completely free of charge.

Still, Qwen 3.5-2B is a tiny language model with very limited capabilities. For reference, frontier AI models feature hundreds of billions and even trillions of parameters.

Qwen 3.5-2B can't match its cloud-based counterparts for heavy reasoning, complex coding, or writing. The best you can get out of it is low-complexity chat sessions. Plus, setting it up with local inference apps isn't straightforward, and running an AI model locally on your phone will absolutely chew through your battery life.

That's why experiments like this are more suitable for enthusiasts, who may want to try this for the sake of it, rather than for regular users. However, with the rate at which the technology is advancing, it would be reasonable to expect that even AI models that can run on smartphone hardware will become more capable and useful for complex tasks.
