Mobile AI Agent

AION

An AI agent that lives on your Android device. Sees your screen, reads notifications, sends SMS, places calls, runs automations. All without a host PC.

📱 Kotlin · Android SDK · OpenRouter · llama.cpp · Room DB · MCP · BM25
📅 2025–2026
⚑ Phase 1 complete · Phases 2–6 in development

01 Problem

Mobile AI assistants today (Siri, Google Assistant, Bixby) are constrained to narrow, predefined intents. They cannot adapt to novel tasks, execute multi-step workflows, or integrate deeply with the operating system. Meanwhile, more capable LLM-based agents typically depend on cloud infrastructure and a host PC, which negates the mobility advantage of a smartphone.

AION was conceived to bridge this gap: an autonomous, persistent AI agent that lives on-device, perceives its environment through screen content and notifications, and acts through system APIs (SMS, calls, timers, automations) without being tethered to a desktop.

02 Architecture

AION runs as an Android Foreground Service with a persistent agent loop. User input enters through a Jetpack Compose chat UI, passes through the agent loop (context management and intent classification), reaches the dual-engine LLM layer, and is then routed to the appropriate skill by BM25, a keyword-based relevance ranking function.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         AION Agent App              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Chat UI (Jetpack Compose)    β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚              β–Ό                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Agent Loop                   β”‚  β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”‚
β”‚  β”‚  β”‚Context  β”‚ β”‚Intent       β”‚  β”‚  β”‚
β”‚  β”‚  β”‚Manager  β”‚ β”‚Classifier   β”‚  β”‚  β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”‚
β”‚  β”‚         β–Ό           β–Ό         β”‚  β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”‚
β”‚  β”‚  β”‚  LLM Engine             β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  Cloud (OpenRouter)     β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  Local (llama.cpp)      β”‚  β”‚  β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚              β–Ό                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Skill Router (BM25)          β”‚  β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”‚
β”‚  β”‚  β”‚SMS   β”‚β”‚Call  β”‚β”‚Notif.  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚Tool  β”‚β”‚Tool  β”‚β”‚Reader  β”‚  β”‚  β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚  β”‚
β”‚  β”‚  β”‚Screen  β”‚β”‚Timer   β”‚       β”‚  β”‚
β”‚  β”‚  β”‚Skill   β”‚β”‚Skill   β”‚       β”‚  β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚              β–Ό                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Persistence (Room DB)         β”‚  β”‚
β”‚  β”‚  Memory Β· Settings Β· Skills    β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The architecture is modular by design: the Agent Loop orchestrates context and intent, the LLM Engine supports both cloud (OpenRouter) and local (llama.cpp) inference, and the Skill Router uses BM25 relevance ranking to dispatch each intent to the correct tool. All state persists in Room DB, and the MCP layer provides structured tool calling without a host PC.
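The Context Manager's job can be pictured as fitting conversation history into a model's context window. The following is one plausible approach, not AION's actual code: `ChatMessage`, `buildContextWindow`, and the four-characters-per-token estimate are all assumptions made for illustration.

```kotlin
// Hypothetical message record; in AION, conversations are persisted via Room DB.
data class ChatMessage(val role: String, val content: String)

// Rough token estimate (about 4 characters per token). This heuristic is an
// assumption for illustration; a real implementation would use the model's tokeniser.
fun estimateTokens(text: String): Int = (text.length + 3) / 4

// Keeps the system prompt plus as many of the most recent messages as fit
// within `budget` estimated tokens, preserving chronological order.
fun buildContextWindow(
    systemPrompt: ChatMessage,
    history: List<ChatMessage>,
    budget: Int,
): List<ChatMessage> {
    var remaining = budget - estimateTokens(systemPrompt.content)
    val kept = ArrayDeque<ChatMessage>()
    // Walk from newest to oldest; stop at the first message that no longer fits.
    for (message in history.asReversed()) {
        val cost = estimateTokens(message.content)
        if (cost > remaining) break
        kept.addFirst(message)
        remaining -= cost
    }
    return listOf(systemPrompt) + kept
}
```

Dropping the oldest turns first keeps the latest exchange intact, which is usually what intent classification and the LLM need most.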

03 Tech Stack

Layer                Technology
Language             Kotlin
UI framework         Jetpack Compose (state management, navigation)
Cloud LLM            OpenRouter API (GPT-4o, Claude, or any other model OpenRouter serves)
Local LLM            llama.cpp (on-device inference)
Persistence          Room DB (conversations, encrypted settings)
Skill routing        BM25 ranking (keyword-based relevance scoring)
Tool protocol        MCP (Model Context Protocol), server and client
Background service   Android Foreground Service (persistent agent loop)

04 Key Challenges

Android Lifecycle Management

Keeping the agent alive through Doze mode, background-execution limits, and aggressive OEM battery optimisation required careful use of Foreground Services, wake locks, and periodic alarms. Cross-vendor reliability was validated on a Nothing Phone 2 and an Oppo device running ColorOS.
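The periodic alarms can double as a watchdog. A minimal sketch, assuming the agent loop records a heartbeat on every iteration; `HeartbeatWatchdog` and its staleness threshold are hypothetical, and on Android the timestamps might come from `SystemClock.elapsedRealtime()`, which keeps counting while the device sleeps.

```kotlin
// Hypothetical watchdog: the agent loop calls beat() each iteration and a
// periodic alarm calls needsRestart() to detect a stalled or killed loop.
class HeartbeatWatchdog(private val staleAfterMs: Long) {
    @Volatile
    private var lastBeatMs: Long? = null

    // Record that the loop is alive at time nowMs (monotonic milliseconds).
    fun beat(nowMs: Long) {
        lastBeatMs = nowMs
    }

    // True if the loop never reported in, or has been silent for longer than
    // staleAfterMs; the caller would then restart the foreground service.
    fun needsRestart(nowMs: Long): Boolean {
        val last = lastBeatMs ?: return true
        return nowMs - last > staleAfterMs
    }
}
```

Passing the clock in as a parameter keeps the check deterministic and easy to unit-test.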

On-Device LLM Latency

Running llama.cpp on a phone with acceptable response times demanded aggressive model quantisation (Q4_K_M), GPU offload via Vulkan, and prompt caching. Balancing model size against capability remains an ongoing optimisation.
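Prompt caching pays off because consecutive chat turns share a long prefix (system prompt plus earlier messages). The idea can be sketched without touching llama.cpp's actual API: compare the newly tokenised prompt with the one whose KV cache is still loaded, and evaluate only the tokens after the first mismatch. Representing token IDs as plain `Int`s and both function names are assumptions for illustration.

```kotlin
// Number of leading tokens shared by the cached prompt and the new prompt;
// their KV-cache entries can be reused instead of recomputed.
fun reusablePrefixLength(cached: IntArray, next: IntArray): Int {
    val limit = minOf(cached.size, next.size)
    var i = 0
    while (i < limit && cached[i] == next[i]) i++
    return i
}

// Tokens that still need a forward pass. If the whole new prompt is cached,
// the last token is re-evaluated anyway so the model yields fresh logits.
fun tokensToEvaluate(cached: IntArray, next: IntArray): IntArray {
    var reuse = reusablePrefixLength(cached, next)
    if (reuse == next.size && reuse > 0) reuse -= 1
    return next.copyOfRange(reuse, next.size)
}
```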

Skill Routing Accuracy vs. Speed

BM25 skill routing had to balance classification accuracy against near-real-time latency. Tuning the tokeniser, the stop-word list, and the score threshold was essential to prevent misrouted intents without adding delay.
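A compact BM25 router shows where those knobs live. This is a sketch under stated assumptions, not AION's implementation: skills are indexed by short descriptions, `k1 = 1.2` and `b = 0.75` are common defaults, IDF uses the Lucene-style `ln(1 + (N - n + 0.5) / (n + 0.5))` variant, and the stop-word list, the `SkillDoc` type, and the threshold-based fallback to chat are hypothetical.

```kotlin
import kotlin.math.ln

// Hypothetical index entry: a skill id plus the text BM25 matches against.
data class SkillDoc(val id: String, val description: String)

class Bm25Router(
    skills: List<SkillDoc>,
    private val threshold: Double, // minimum score to dispatch; below it, fall back to chat
    private val k1: Double = 1.2,  // term-frequency saturation
    private val b: Double = 0.75,  // document-length normalisation
) {
    // Illustrative stop-word list; tuning it is part of the accuracy work.
    private val stopWords = setOf(
        "a", "an", "the", "to", "in", "or", "and", "of", "for", "my", "me", "please",
    )
    private val ids = skills.map { it.id }
    private val docs = skills.map { tokenise(it.description) }
    private val avgLen = docs.map { it.size }.average()

    // Document frequency: how many skill descriptions contain each term.
    private val df: Map<String, Int> = docs.flatMap { it.toSet() }.groupingBy { it }.eachCount()

    // Lowercase, split on anything that is not a letter or digit, drop stop words.
    private fun tokenise(text: String): List<String> =
        text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() && it !in stopWords }

    // Lucene-style IDF, which stays positive even for terms found in every skill.
    private fun idf(term: String): Double {
        val n = df[term] ?: 0
        return ln(1.0 + (docs.size - n + 0.5) / (n + 0.5))
    }

    private fun score(query: List<String>, doc: List<String>): Double {
        val tf = doc.groupingBy { it }.eachCount()
        val lengthNorm = 1 - b + b * doc.size / avgLen
        return query.sumOf { term ->
            val f = tf[term] ?: 0
            if (f == 0) 0.0 else idf(term) * f * (k1 + 1) / (f + k1 * lengthNorm)
        }
    }

    // Best-scoring skill id, or null when nothing clears the threshold.
    fun route(userText: String): String? {
        val query = tokenise(userText)
        val scores = docs.map { score(query, it) }
        val best = scores.indices.maxByOrNull { scores[it] } ?: return null
        return if (scores[best] >= threshold) ids[best] else null
    }
}
```

Because BM25 only rewards overlapping terms, the threshold is what keeps small talk out of the tool path: a query that shares no vocabulary with any skill scores zero and falls back to the LLM.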

MCP on Android

Building an MCP server that runs directly on Android, with no host PC, required a custom transport layer. The standard HTTP+SSE transport was replaced with an in-process bridge that keeps the tool-calling loop local and fast.
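An in-process bridge can keep MCP's message shapes while skipping the network. A minimal sketch, assuming simplified stand-ins for the JSON-RPC `tools/list` and `tools/call` messages (real MCP messages are JSON, with `name`/`arguments` parameters and a `content` array in the result); `InProcessMcpBridge`, `ToolHandler`, and the data classes are hypothetical.

```kotlin
// Simplified, hypothetical stand-ins for MCP's JSON-RPC "tools/call" messages.
// Client and server share one process, so objects are passed directly
// instead of being serialised to JSON and sent over HTTP+SSE.
data class ToolCallRequest(val id: Int, val name: String, val arguments: Map<String, String>)
data class TextContent(val text: String, val type: String = "text")
data class ToolCallResult(val id: Int, val content: List<TextContent>, val isError: Boolean = false)

fun interface ToolHandler {
    fun handle(arguments: Map<String, String>): String
}

class InProcessMcpBridge {
    private val tools = linkedMapOf<String, ToolHandler>()

    fun register(name: String, handler: ToolHandler) {
        tools[name] = handler
    }

    // Counterpart of "tools/list": the names of all registered tools.
    fun listTools(): List<String> = tools.keys.toList()

    // Counterpart of "tools/call". Simplification: MCP treats an unknown tool as a
    // protocol error, but this sketch reports it, like handler failures, as isError.
    fun call(request: ToolCallRequest): ToolCallResult {
        val handler = tools[request.name]
            ?: return errorResult(request.id, "Unknown tool: ${request.name}")
        return try {
            ToolCallResult(request.id, listOf(TextContent(handler.handle(request.arguments))))
        } catch (e: Exception) {
            errorResult(request.id, e.message ?: "Tool failed")
        }
    }

    private fun errorResult(id: Int, message: String) =
        ToolCallResult(id, listOf(TextContent(message)), isError = true)
}
```

Because dispatch is a direct method call, a tool round trip pays no socket or serialisation cost, while the request and result types still mirror MCP's structure.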

⚑ Code Highlight

The agent loop uses Kotlin Flow to stream each turn's events through an intent-routing pipeline: classify, route, emit.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.emitAll
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn

// Handles one user turn as a stream of events: persist, classify, route, emit.
fun processUserMessage(
    conversationId: String,
    userText: String,
): Flow<AgentEvent> = flow {
    emit(AgentEvent.AssistantStarted)

    // Persist user input immediately
    conversationRepository.appendMessage(
        conversationId, role = "user", content = userText,
    )

    // Classify intent, then route to the matching handler
    when (val intent = intentClassifier.classify(userText)) {
        is AgentIntent.Empty -> {
            // Return early so the collector sees exactly one Done event
            emit(AgentEvent.Done(reason = DoneReason.EmptyInput))
            return@flow
        }
        is AgentIntent.Chat -> emitAll(streamLlmReply(conversationId, userText))
        is AgentIntent.ToolCall -> emitAll(handleToolCall(conversationId, intent))
    }
    emit(AgentEvent.Done(reason = DoneReason.Normal))
}.flowOn(Dispatchers.Default)
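The snippet relies on types it does not show. One hypothetical way to declare them, together with a toy classifier, so the `when` branches in the agent loop have something concrete to match; every name beyond those used in the snippet (`Token`, `skillId`, the `routeSkill` callback) is an assumption.

```kotlin
// Hypothetical declarations for the types processUserMessage references.
enum class DoneReason { Normal, EmptyInput }

sealed interface AgentEvent {
    object AssistantStarted : AgentEvent
    data class Token(val text: String) : AgentEvent // streamed reply fragment (assumed)
    data class Done(val reason: DoneReason) : AgentEvent
}

sealed interface AgentIntent {
    object Empty : AgentIntent
    object Chat : AgentIntent
    data class ToolCall(val skillId: String, val arguments: Map<String, String>) : AgentIntent
}

// Toy classifier: blank input is Empty, a skill match becomes a ToolCall,
// everything else is plain Chat. `routeSkill` could be backed by the BM25 index.
class IntentClassifier(private val routeSkill: (String) -> String?) {
    fun classify(userText: String): AgentIntent {
        if (userText.isBlank()) return AgentIntent.Empty
        val skillId = routeSkill(userText) ?: return AgentIntent.Chat
        return AgentIntent.ToolCall(skillId, arguments = emptyMap())
    }
}
```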

05 Results

6+ · Phases planned
Cloud + Local · Dual LLM engine
BM25 · Skill routing
Screenshot · Screen awareness (Phase 3)
Key Insight

The dual-LLM architecture gives users the best of both worlds: cloud models (GPT-4o, Claude) for complex reasoning and local inference for privacy-sensitive or offline tasks. The BM25 skill router then maps each intent to the right tool, so the user never has to think about which model or tool is handling a request.
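The engine choice itself can be a small, explicit policy. A sketch, assuming each request has already been profiled; `RequestProfile`, the 0 to 10 complexity scale, and the cutoff of 6 are illustrative assumptions rather than AION's actual rules.

```kotlin
enum class Engine { CLOUD, LOCAL }

// Hypothetical request features; how they are detected is out of scope here.
data class RequestProfile(
    val privacySensitive: Boolean, // e.g. touches SMS or notification content
    val networkAvailable: Boolean,
    val estimatedComplexity: Int,  // assumed scale: 0 (trivial) to 10 (multi-step reasoning)
)

// Local for privacy-sensitive or offline work, cloud for complex reasoning,
// and local by default for simple tasks such as classification or short Q&A.
fun chooseEngine(p: RequestProfile, complexityCutoff: Int = 6): Engine = when {
    p.privacySensitive || !p.networkAvailable -> Engine.LOCAL
    p.estimatedComplexity >= complexityCutoff -> Engine.CLOUD
    else -> Engine.LOCAL
}
```

Keeping the rule explicit makes it easy to test and to audit which requests may leave the device.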

06 What I Learned

Lifecycle Mastery

Building a persistent agent on Android is fundamentally a battle against the OS's battery-saving mechanisms. A Foreground Service combined with a lifecycle-aware architecture is non-negotiable. Every OEM has its own quirks, so test early and often.

Local LLM Practicality

Running LLMs on-device is feasible today with the right quantisation strategy and GPU acceleration. The latency gap between cloud and local inference is shrinking, and for many tasks (classification, simple Q&A) local models are already good enough, with the added benefit that no data leaves the device.

Modularity Matters

The skill-router pattern (BM25 + MCP) proved very flexible. Adding a new tool is as simple as writing a Kotlin class, registering it in the skill index, and letting the router handle dispatch. The pattern should generalise well to other on-device agent frameworks.
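A sketch of what "write a class, register it" might look like, assuming a `Skill` interface and a `SkillRegistry` that feeds descriptions into the router's index. Both are hypothetical, as is `TimerSkill`, which on Android might fire an `AlarmClock.ACTION_SET_TIMER` intent instead of returning a string.

```kotlin
// Hypothetical skill contract: `description` is what the BM25 index matches
// against, `execute` is what the router calls on dispatch.
interface Skill {
    val id: String
    val description: String
    fun execute(arguments: Map<String, String>): String
}

// Hypothetical stand-in for the skill index.
class SkillRegistry {
    private val skills = linkedMapOf<String, Skill>()

    fun register(skill: Skill) {
        require(skill.id !in skills) { "Duplicate skill id: ${skill.id}" }
        skills[skill.id] = skill
    }

    // (id, description) pairs to feed the router's index, in registration order.
    fun indexDocuments(): List<Pair<String, String>> =
        skills.values.map { it.id to it.description }

    fun dispatch(id: String, arguments: Map<String, String>): String =
        skills[id]?.execute(arguments) ?: error("No skill registered for '$id'")
}

// Adding a tool: one class plus one register() call.
class TimerSkill : Skill {
    override val id = "timer"
    override val description = "Set a countdown timer for a number of minutes"
    override fun execute(arguments: Map<String, String>): String =
        "Timer set for ${arguments["minutes"] ?: "?"} minutes"
}
```

After registration, the router's index would be rebuilt from `indexDocuments()` so the new description becomes matchable.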