AION
An AI agent that lives on your Android device. Sees your screen, reads notifications, sends SMS, places calls, runs automations. All without a host PC.
01 Problem
Mobile AI assistants today (Siri, Google Assistant, Bixby) are constrained to narrow, pre-defined intents. They cannot adapt to novel tasks, execute multi-step workflows, or integrate deeply with the operating system. Meanwhile, powerful LLM-based agents require cloud infrastructure and a host PC, negating the mobility advantage of a smartphone.
AION was conceived to bridge this gap: an autonomous, persistent AI agent that lives on-device, perceives its environment through screen content and notifications, and acts through system APIs (SMS, calls, timers, automations), all without tethering to a desktop.
02 Architecture
AION runs as an Android Foreground Service with a persistent agent loop. User input enters through a Jetpack Compose chat UI, flows through the agent loop (context management and intent classification), reaches the dual-engine LLM layer, and is then routed to the appropriate skill by BM25 matching, a lexical relevance ranking over the skill descriptions.

    ┌───────────────────────────────────────┐
    │            AION Agent App             │
    │  ┌─────────────────────────────────┐  │
    │  │    Chat UI (Jetpack Compose)    │  │
    │  └────────────────┬────────────────┘  │
    │                   ▼                   │
    │  ┌─────────────────────────────────┐  │
    │  │           Agent Loop            │  │
    │  │  ┌─────────┐  ┌─────────────┐   │  │
    │  │  │Context  │  │Intent       │   │  │
    │  │  │Manager  │  │Classifier   │   │  │
    │  │  └────┬────┘  └──────┬──────┘   │  │
    │  │       ▼              ▼          │  │
    │  │  ┌─────────────────────────┐    │  │
    │  │  │ LLM Engine              │    │  │
    │  │  │ Cloud (OpenRouter)      │    │  │
    │  │  │ Local (llama.cpp)       │    │  │
    │  │  └─────────────────────────┘    │  │
    │  └────────────────┬────────────────┘  │
    │                   ▼                   │
    │  ┌─────────────────────────────────┐  │
    │  │       Skill Router (BM25)       │  │
    │  │  ┌──────┐ ┌──────┐ ┌────────┐   │  │
    │  │  │SMS   │ │Call  │ │Notif.  │   │  │
    │  │  │Tool  │ │Tool  │ │Reader  │   │  │
    │  │  └──────┘ └──────┘ └────────┘   │  │
    │  │  ┌────────┐ ┌────────┐          │  │
    │  │  │Screen  │ │Timer   │          │  │
    │  │  │Skill   │ │Skill   │          │  │
    │  │  └────────┘ └────────┘          │  │
    │  └────────────────┬────────────────┘  │
    │                   ▼                   │
    │  ┌─────────────────────────────────┐  │
    │  │      Persistence (Room DB)      │  │
    │  │   Memory · Settings · Skills    │  │
    │  └─────────────────────────────────┘  │
    └───────────────────────────────────────┘

The architecture is modular by design: the Agent Loop orchestrates context and intent, the LLM Engine supports both cloud (OpenRouter) and local (llama.cpp) inference, and the Skill Router uses BM25 ranking to dispatch each intent to the correct tool. All state persists via Room DB, and the MCP layer enables structured tool calling without a host PC.
03 Tech Stack
| Layer | Technology |
|---|---|
| Language | Kotlin |
| UI Framework | Jetpack Compose (state management, navigation) |
| Cloud LLM | OpenRouter API (GPT-4o, Claude, or any other model it exposes) |
| Local LLM | llama.cpp (on-device inference) |
| Persistence | Room DB (conversations, encrypted settings) |
| Skill Routing | BM25 ranking (lexical relevance scoring) |
| Tool Protocol | MCP (Model Context Protocol, server/client) |
| Background Service | Android Foreground Service (persistent agent loop) |
04 Key Challenges
Android Lifecycle Management
Keeping the agent alive through Doze mode, background restrictions, and aggressive OEM battery optimisation required careful use of Foreground Services, wake locks, and periodic alarms. I tested on a Nothing Phone 2 and on an Oppo device running ColorOS to validate cross-vendor reliability.
On-Device LLM Latency
Running llama.cpp on a phone with acceptable response times demanded aggressive model quantisation (Q4_K_M), GPU offload through Vulkan, and prompt caching. Balancing model size against output quality remains an ongoing optimisation.
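To make the size side of that trade-off concrete, here is a back-of-the-envelope sketch. The function name is hypothetical, and the figure of roughly 4.85 bits per weight for Q4_K_M is an approximation I am assuming for illustration (the real value varies by model); it is not a measurement from AION.

```kotlin
// Rough size of a model's weights in decimal gigabytes:
// (parameters in billions) x (bits per weight) / 8 bits per byte.
// Ignores the KV cache, activations, and file metadata.
fun estimateWeightsGb(paramsBillions: Double, bitsPerWeight: Double): Double =
    paramsBillions * bitsPerWeight / 8.0

// Assumed, approximate bits per weight for illustration only.
const val FP16_BITS = 16.0
const val Q4_K_M_BITS_APPROX = 4.85
```

Under these assumptions, `estimateWeightsGb(7.0, FP16_BITS)` gives 14 GB for a 7B-parameter model, while `estimateWeightsGb(7.0, Q4_K_M_BITS_APPROX)` gives a little over 4 GB, which is the difference between not fitting in a phone's memory and fitting with room to spare.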
Skill Routing Accuracy vs. Speed
BM25-based routing had to balance classification accuracy against near-real-time response latency. Tuning the tokeniser, stop-word lists, and score thresholds was essential to prevent misrouted intents without introducing delay.
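The three tuning knobs mentioned above (tokeniser, stop words, threshold) can be seen in a minimal BM25 router sketch. Everything here is illustrative: `SkillDoc`, `Bm25SkillRouter`, the stop-word list, and the `minScore` default are assumptions, not AION's actual classes or tuned values. The scoring uses the common BM25 formulation with `k1 = 1.2` and `b = 0.75`.

```kotlin
import kotlin.math.ln

// Hypothetical skill descriptor: an id plus the text the router indexes.
data class SkillDoc(val id: String, val description: String)

class Bm25SkillRouter(
    skills: List<SkillDoc>,
    private val stopWords: Set<String> = setOf("a", "an", "the", "to", "my", "me", "please"),
    private val k1: Double = 1.2,
    private val b: Double = 0.75,
    private val minScore: Double = 0.5, // assumed threshold; tune per skill set
) {
    private val ids = skills.map { it.id }
    private val docs: List<List<String>> = skills.map { tokenize(it.description) }
    private val avgLen: Double = docs.map { it.size }.average()

    // Document frequency: how many skill descriptions contain each term.
    private val docFreq: Map<String, Int> =
        docs.flatMap { it.distinct() }.groupingBy { it }.eachCount()

    // Lowercase, split on non-alphanumerics, drop stop words.
    private fun tokenize(text: String): List<String> =
        text.lowercase().split(Regex("[^a-z0-9]+"))
            .filter { it.isNotEmpty() && it !in stopWords }

    private fun idf(term: String): Double {
        val n = docFreq[term] ?: 0
        return ln(1.0 + (docs.size - n + 0.5) / (n + 0.5))
    }

    // BM25 score of one skill description against the query.
    fun score(query: String, docIndex: Int): Double {
        val doc = docs[docIndex]
        val tf = doc.groupingBy { it }.eachCount()
        return tokenize(query).distinct().sumOf { term ->
            val f = (tf[term] ?: 0).toDouble()
            if (f == 0.0) 0.0
            else idf(term) * f * (k1 + 1.0) / (f + k1 * (1.0 - b + b * doc.size / avgLen))
        }
    }

    // Best-scoring skill id, or null when nothing clears the threshold.
    fun route(query: String): String? {
        val best = docs.indices.maxByOrNull { score(query, it) } ?: return null
        return if (score(query, best) >= minScore) ids[best] else null
    }
}
```

Returning `null` below the threshold is the part that guards against misrouting: a query with no lexical overlap falls through to plain chat rather than being forced onto the least-bad tool.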
MCP on Android
Building an MCP server that runs directly on Android, without a host PC, required a custom transport layer. The standard HTTP+SSE transport was replaced with an in-process bridge to keep the tool-calling loop local and fast.
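The idea of an in-process bridge can be sketched as follows. This is a simplified illustration, not AION's transport: the `RpcRequest`/`RpcResponse` shapes only loosely mirror the JSON-RPC 2.0 messages MCP uses, parameters are flattened into a string map, and `Transport`, `ToolServer`, and `InProcessTransport` are hypothetical names. The `tools/list` and `tools/call` method names come from the MCP specification.

```kotlin
// Simplified message shapes, loosely modelled on JSON-RPC 2.0 (assumption:
// real MCP messages carry structured JSON, not a flat string map).
data class RpcRequest(
    val id: Int,
    val method: String,
    val params: Map<String, String> = emptyMap(),
)
data class RpcResponse(val id: Int, val result: String? = null, val error: String? = null)

// A transport only needs to deliver a request and return the response.
fun interface Transport {
    fun send(request: RpcRequest): RpcResponse
}

// Hypothetical tool server: tools are plain functions keyed by name.
class ToolServer(private val tools: Map<String, (Map<String, String>) -> String>) {
    fun handle(request: RpcRequest): RpcResponse {
        return when (request.method) {
            "tools/list" ->
                RpcResponse(request.id, result = tools.keys.sorted().joinToString(","))
            "tools/call" -> {
                val name = request.params["name"].orEmpty()
                val tool = tools[name]
                if (tool == null) RpcResponse(request.id, error = "Unknown tool: $name")
                else RpcResponse(request.id, result = tool(request.params))
            }
            else -> RpcResponse(request.id, error = "Method not found: ${request.method}")
        }
    }
}

// In-process bridge: the client calls the server object directly,
// with no sockets, HTTP, or SSE stream in between.
class InProcessTransport(private val server: ToolServer) : Transport {
    override fun send(request: RpcRequest): RpcResponse = server.handle(request)
}
```

Because the client depends only on `Transport`, the same agent code could in principle talk to a remote server by swapping in a network-backed implementation.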
Code Highlight
The agent loop uses Kotlin Flow to stream events through a clean intent-routing pipeline: classify, route, emit.

    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.flow.Flow
    import kotlinx.coroutines.flow.emitAll
    import kotlinx.coroutines.flow.flow
    import kotlinx.coroutines.flow.flowOn

    fun processUserMessage(
        conversationId: String,
        userText: String,
    ): Flow<AgentEvent> = flow {
        emit(AgentEvent.AssistantStarted)

        // Persist user input immediately
        conversationRepository.appendMessage(
            conversationId,
            role = "user",
            content = userText,
        )

        // Classify intent, then route to the matching handler
        when (val intent = intentClassifier.classify(userText)) {
            is AgentIntent.Empty -> {
                // Stop here so the flow ends with a single Done event
                emit(AgentEvent.Done(DoneReason.EmptyInput))
                return@flow
            }
            is AgentIntent.Chat -> emitAll(streamLlmReply(conversationId, userText))
            is AgentIntent.ToolCall -> emitAll(handleToolCall(conversationId, intent))
        }

        emit(AgentEvent.Done(reason = DoneReason.Normal))
    }.flowOn(Dispatchers.Default)

05 Results
- Phase 1 stable on Nothing Phone 2 with production-grade reliability
- Cloud LLM response time consistently under 2 seconds
- SMS tool with full send / receive capability via Android Telephony APIs
- Encrypted API key storage using Android Keystore + Room DB encryption
- 40+ Kotlin source files organised into a modular, testable architecture
The dual-LLM architecture gives users the best of both worlds: cloud models (GPT-4o, Claude) for complex reasoning, and local inference for privacy-sensitive or offline tasks. The BM25 skill router maps each intent to the tool that executes it, so the user never has to think about which model or skill is handling a request.
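One way to express that cloud/local split is a small selection function. This is a sketch of a plausible policy, not AION's actual logic: `EngineKind`, `RequestContext`, and `selectEngine` are hypothetical names, and defaulting to the cloud engine when the device is online and the request is not sensitive is my assumption.

```kotlin
enum class EngineKind { CLOUD, LOCAL }

// Hypothetical per-request signals the agent loop might supply.
data class RequestContext(
    val online: Boolean,
    val privacySensitive: Boolean,
)

// Assumed policy: stay on-device when offline or when the request is
// privacy-sensitive; otherwise use the cloud model.
fun selectEngine(ctx: RequestContext): EngineKind = when {
    !ctx.online -> EngineKind.LOCAL
    ctx.privacySensitive -> EngineKind.LOCAL
    else -> EngineKind.CLOUD
}
```

Keeping the policy in one pure function makes it easy to unit-test and to extend later, for example with a battery-level or latency signal.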
06 What I Learned
Building a persistent agent on Android is fundamentally a battle against the OS's battery-saving mechanisms. Foreground Service + proper lifecycle-aware architecture is non-negotiable. Every OEM has quirks: test early, test often.
Running LLMs on-device is feasible today with the right quantisation strategy and GPU acceleration. The latency gap between cloud and local is shrinking, and for many tasks (classification, simple Q&A), local models are already good enough, with the added benefit of zero data leaving the device.
The skill-router pattern (BM25 + MCP) proved extremely flexible. Adding a new tool is as simple as writing a Kotlin class, registering it in the skill index, and letting the router handle dispatch. This pattern should generalise well to any on-device agent framework.
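The "write a class, register it" workflow might look like the sketch below. The `Skill` interface, `SkillRegistry`, and `TimerSkill` are hypothetical stand-ins for illustration; AION's real contracts may differ, and the timer here only returns a confirmation string rather than touching any Android API.

```kotlin
// Hypothetical skill contract; names are illustrative, not AION's actual API.
interface Skill {
    val id: String
    val description: String // text a router would index for matching
    fun execute(args: Map<String, String>): String
}

class SkillRegistry {
    private val skills = LinkedHashMap<String, Skill>()

    fun register(skill: Skill) {
        require(skill.id !in skills) { "Duplicate skill id: ${skill.id}" }
        skills[skill.id] = skill
    }

    // What a router would consume to build its index.
    fun descriptions(): Map<String, String> = skills.mapValues { it.value.description }

    fun dispatch(id: String, args: Map<String, String>): String =
        skills[id]?.execute(args) ?: error("Unknown skill: $id")
}

// Adding a tool: one class implementing the contract.
class TimerSkill : Skill {
    override val id = "timer"
    override val description = "set a countdown timer or alarm"

    override fun execute(args: Map<String, String>): String {
        val seconds = args["seconds"]?.toIntOrNull()
            ?: return "Missing or invalid 'seconds'"
        return "Timer set for $seconds s"
    }
}
```

With this shape, `SkillRegistry().apply { register(TimerSkill()) }` is the only wiring a new tool needs; the router sees it through `descriptions()` and the agent loop reaches it through `dispatch("timer", mapOf("seconds" to "90"))`.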