name	runanywhere-ai
description	Integrate RunAnywhere on-device AI (LLMs, STT, TTS, voice agents, VLM) into applications. Use when implementing offline AI features, local LLM inference, on-device speech processing, privacy-first AI, vision language models, or when mentions of "RunanywhereAI", "on-device AI", "local inference", "offline AI", "GGUF models", "llama.cpp", "VLM", or "on-device vision" appear. Supports Swift (iOS/macOS), Kotlin (Android), Web (WebAssembly), React Native, and Flutter platforms.

RunAnywhere AI Integration

Comprehensive guide for integrating RunAnywhere on-device AI into your applications across all supported platforms.

What is RunAnywhere?

RunAnywhere enables privacy-first, on-device AI inference for:

LLM Text Generation - Run LFM2, Llama, Mistral, Qwen, SmolLM locally via llama.cpp
Vision Language Models (VLM) - On-device visual understanding with camera/image input (iOS/Web)
Speech-to-Text - Whisper-based transcription
Text-to-Speech - Neural voice synthesis via Piper
Voice Agent Pipeline - Complete VAD → STT → LLM → TTS orchestration
Tool Calling & Structured Output - Function calling and JSON schema-guided generation

All processing happens locally: no cloud round trips, no network latency, and no data leaves the device.

Workflow

1. Choose Your Platform

Select the platform-specific guide:

Swift (iOS/macOS) → Read swift.md
Kotlin (Android) → Read kotlin.md
Web (Browser) → Read web.md
React Native → Read react-native.md
Flutter → Read flutter.md

Each guide contains complete installation, setup, and usage instructions.

2. Select Models

Choose appropriate models for your use case:

Read models.md for:

Model size vs quality tradeoffs
Device-specific recommendations
Quantization level guidance
Download URLs

Quick recommendations:

Lightweight (< 1GB RAM): LFM2 350M, SmolLM2 360M, Qwen 0.5B
Balanced (1-4GB RAM): LFM2 1.2B Tool, Llama 3.2 1B
High Quality (4GB+ RAM): Llama 3.2 3B, Mistral 7B (plan for 8GB+ RAM; see Memory Requirements)
Vision (VLM): LFM2-VL 450M (iOS/Web only)

3. Core Integration Pattern

All platforms follow the same three-step pattern:

1. Initialize SDK
   ↓
2. Download & Load Model
   ↓
3. Generate / Transcribe / Synthesize

Platform-specific implementation details are in each reference guide.
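The three steps can be sketched in TypeScript against a stand-in interface. The `OnDeviceSdk` interface and the `FakeSdk` stub are hypothetical, written only so the sketch runs on its own. Real SDK calls are typically asynchronous, and their names and signatures differ per platform, so treat this as an illustration of call order rather than the actual API.

```typescript
// Hypothetical interface: a stand-in for a platform SDK, not the real API.
interface OnDeviceSdk {
  initialize(): void;
  downloadModel(id: string): void;
  loadModel(id: string): void;
  generate(prompt: string): string;
}

// Stub that records call order so the flow can run without a real SDK.
class FakeSdk implements OnDeviceSdk {
  calls: string[] = [];
  private downloaded: string[] = [];
  private loaded: string | null = null;

  initialize(): void {
    this.calls.push("initialize");
  }

  downloadModel(id: string): void {
    this.downloaded.push(id);
    this.calls.push("download:" + id);
  }

  loadModel(id: string): void {
    // Mirrors the "Model Not Found" case: loading requires a prior download.
    if (this.downloaded.indexOf(id) === -1) {
      throw new Error("model not downloaded: " + id);
    }
    this.loaded = id;
    this.calls.push("load:" + id);
  }

  generate(prompt: string): string {
    if (this.loaded === null) {
      throw new Error("no model loaded");
    }
    this.calls.push("generate");
    return "[" + this.loaded + "] echo: " + prompt;
  }
}

function runThreeSteps(sdk: OnDeviceSdk, modelId: string, prompt: string): string {
  sdk.initialize();            // 1. Initialize SDK
  sdk.downloadModel(modelId);  // 2. Download & load model
  sdk.loadModel(modelId);
  return sdk.generate(prompt); // 3. Generate (or transcribe / synthesize)
}
```

Keeping the order explicit in one function makes it easy to see that loading before downloading, or generating before loading, fails early.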

Common Integration Tasks

Integrate LLM into Swift iOS App

Read swift.md
Add RunAnywhere via Swift Package Manager
Initialize SDK and register LlamaCPP module
Download and load a model (e.g., smollm2-360m)
Use RunAnywhere.chat() or RunAnywhere.generate() for inference
Optional: Use streaming for better UX

Set Up Speech-to-Text in React Native

Read react-native.md
Install @runanywhere/core and @runanywhere/onnx
Register ONNX module and add Whisper model
Download and load STT model
Use RunAnywhere.transcribe() for audio transcription

Build Voice Assistant on Android

Read kotlin.md
Add RunAnywhere Kotlin SDK dependencies
Register LlamaCPP and ONNX modules
Download LLM, STT, and TTS models
Configure and start VoiceAgent with RunAnywhere.startVoiceSession()
Handle voice session events (listening, transcribed, responded, speaking)

Deploy On-Device LLM to Web

Read web.md
Install @runanywhere/web, @runanywhere/web-llamacpp, and @runanywhere/web-onnx via npm
Configure bundler (Vite/Webpack) for WASM files
Set Cross-Origin headers for SharedArrayBuffer
Register LlamaCPP and ONNX backends
Register models with RunAnywhere.registerModels()
Generate with streaming via TextGeneration.generateStream()
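For the Cross-Origin headers step, browsers expose SharedArrayBuffer only to cross-origin isolated pages, which requires the two response headers below. This is a minimal sketch: the helper name `missingIsolationHeaders` is made up for illustration and is not part of the RunAnywhere SDK.

```typescript
// Response headers that make a page cross-origin isolated.
const CROSS_ORIGIN_ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical helper: returns the required header names that are missing or
// have a different value in `actual` (header names are matched case-insensitively).
// Note: some browsers also accept COEP "credentialless"; this check is stricter
// and only accepts "require-corp".
function missingIsolationHeaders(actual: Record<string, string>): string[] {
  const normalized: Record<string, string> = {};
  for (const name of Object.keys(actual)) {
    normalized[name.toLowerCase()] = String(actual[name]).trim().toLowerCase();
  }
  return Object.keys(CROSS_ORIGIN_ISOLATION_HEADERS).filter(
    (name) => normalized[name.toLowerCase()] !== CROSS_ORIGIN_ISOLATION_HEADERS[name],
  );
}
```

With Vite, the same object can be passed as `server.headers` in the dev-server config. At runtime, `self.crossOriginIsolated` reports whether the page actually ended up isolated.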

Add Vision Language Model (VLM) to Web App

Read web.md
Install the three Web SDK packages (@runanywhere/web, @runanywhere/web-llamacpp, @runanywhere/web-onnx)
Create a VLM Web Worker with startVLMWorkerRuntime()
Wire VLMWorkerBridge to RunAnywhere.setVLMLoader()
Use VideoCapture to capture camera frames
Process frames with VLMWorkerBridge.shared.process(rgbPixels, width, height, prompt)
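If frames come from a canvas rather than from VideoCapture, the pixel data is RGBA (four bytes per pixel), while the `rgbPixels` parameter name suggests packed RGB. The sketch below converts one to the other. It assumes that `process()` expects 8-bit RGB in row-major order, which the source does not state, and `rgbaToRgb` is a hypothetical helper, not an SDK function.

```typescript
// Converts RGBA bytes (4 per pixel, as in canvas ImageData.data)
// to packed RGB bytes (3 per pixel) by dropping the alpha channel.
function rgbaToRgb(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number,
): Uint8Array {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    throw new Error("expected " + pixelCount * 4 + " RGBA bytes, got " + rgba.length);
  }
  const rgb = new Uint8Array(pixelCount * 3);
  for (let i = 0, j = 0; i < rgba.length; i += 4, j += 3) {
    rgb[j] = rgba[i];         // red
    rgb[j + 1] = rgba[i + 1]; // green
    rgb[j + 2] = rgba[i + 2]; // blue
    // rgba[i + 3] is alpha and is intentionally skipped
  }
  return rgb;
}
```

The length check catches a common mistake: passing the canvas dimensions of a different frame than the one the pixels came from.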

Key Concepts

Quantization

Models are compressed using quantization:

Q4_0: Smallest size, fastest, lower quality (~4.5 bits/weight)
Q5_K_M: Balanced size and quality (~5.5 bits/weight)
Q8_0: Largest size, best quality, slower (~8.5 bits/weight)
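Bits per weight translates directly into file size: roughly parameters × bits per weight ÷ 8 bytes. The sketch below shows that arithmetic. The function names are illustrative, and the result is an estimate that ignores file metadata and any tensors stored at a different precision.

```typescript
// Approximate weight storage for a quantized model, in bytes.
function estimateModelBytes(parameterCount: number, bitsPerWeight: number): number {
  if (parameterCount <= 0 || bitsPerWeight <= 0) {
    throw new RangeError("parameterCount and bitsPerWeight must be positive");
  }
  return Math.ceil((parameterCount * bitsPerWeight) / 8);
}

// Decimal megabytes (1 MB = 1,000,000 bytes), for display.
function toMegabytes(bytes: number): number {
  return bytes / 1e6;
}
```

For example, `toMegabytes(estimateModelBytes(1e9, 8))` gives 1000, so a 1-billion-parameter model at 8 bits per weight needs about 1 GB, and halving the bits per weight halves the estimate.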

Model Formats

LLM: GGUF format (via llama.cpp) — LLMFramework.LlamaCpp
VLM: GGUF format (model + mmproj files) — LLMFramework.LlamaCpp
STT: ONNX format (Whisper models) — LLMFramework.ONNX
TTS: ONNX format (Piper voices) — LLMFramework.ONNX
VAD: ONNX format (Silero VAD v5) — LLMFramework.ONNX

Memory Requirements

Rule of thumb: available device RAM should be at least 2× the model file size, which leaves headroom for the context (KV cache) and the app itself.

SmolLM2 360M (~400MB) → Need 800MB+ RAM
Llama 3.2 1B (~1GB) → Need 2GB+ RAM
Mistral 7B (~4GB) → Need 8GB+ RAM
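The 2× rule is easy to encode as a pre-load check. This is a minimal sketch with illustrative names; how the available RAM figure is obtained is platform-specific and left to the caller.

```typescript
// Rule of thumb from the text: required RAM is about 2× the model size.
const RAM_MULTIPLIER = 2;

function requiredRamBytes(modelBytes: number): number {
  return modelBytes * RAM_MULTIPLIER;
}

// True when the device has enough RAM for the model under the 2× rule.
function fitsInRam(modelBytes: number, availableRamBytes: number): boolean {
  return availableRamBytes >= requiredRamBytes(modelBytes);
}
```

Running this check before loading turns an out-of-memory crash into a decision point, such as falling back to a smaller model.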

Error Handling Best Practices

Common Errors Across All Platforms

Model Not Found

Solution: Download the model before loading it
Check: The model file exists at the expected path

Insufficient Memory

Solution: Use a smaller model or a lower-bit quantization (e.g., Q4_0 instead of Q8_0)
Check: Available device RAM vs model requirements

Slow Generation

Solution: Use streaming so output appears sooner, a lower-bit quantization, or a smaller model
Check: Tokens per second metric (target: 10+ tok/s)

Download Fails

Solution: Implement retry with exponential backoff
Check: Network connectivity, storage space
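Retry with exponential backoff can be sketched as a small generic wrapper. The names, the 500 ms base delay, the 30 s cap, and the five-attempt limit are illustrative choices, not SDK defaults.

```typescript
// Delay before retry number `attempt` (0-based): baseMs × 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Runs `task`, retrying on failure up to `maxAttempts` total attempts,
// and sleeping with exponential backoff between attempts.
async function withRetry<T>(task: () => Promise<T>, maxAttempts: number = 5): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A download call could then be wrapped as `withRetry(() => downloadModel(id))`, assuming the platform's download function returns a promise. Adding random jitter to the delay is a common refinement when many clients may retry at once.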

Performance Tips

Optimize Model Selection

Web: Use Q4_0 quantization (browser memory limits)
Mobile: Use Q4_K_M or Q5_K_M (balanced)
Desktop: Use Q5_K_M or Q8_0 (best quality)

Improve UX

Use streaming to display tokens as they generate
Show progress bars during model downloads
Display generation metrics (tok/s) to users
Implement cancellation for long-running operations
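The tok/s metric can be computed from a token count and elapsed time. The sketch below uses hypothetical names and counts tokens as a streaming callback delivers them. It measures from the first token, so it reports decode throughput and excludes the time spent waiting for the first token.

```typescript
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    return 0; // avoid dividing by zero before any time has passed
  }
  return tokenCount / (elapsedMs / 1000);
}

class ThroughputMeter {
  private startMs: number | null = null;
  private tokens = 0;
  private now: () => number;

  // `now` is injectable so the meter can be driven by a fake clock in tests.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Call once per streamed token.
  onToken(): void {
    if (this.startMs === null) {
      this.startMs = this.now();
    }
    this.tokens++;
  }

  // Current rate in tokens per second since the first token arrived.
  rate(): number {
    if (this.startMs === null) {
      return 0;
    }
    return tokensPerSecond(this.tokens, this.now() - this.startMs);
  }
}
```

Calling `rate()` every few tokens is enough for a live readout, and comparing it against the 10 tok/s target shows quickly whether a smaller model is needed.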

Memory Management

Unload models when not in use
Check available memory before loading
Use smaller models on low-memory devices
Monitor memory usage during inference

Debugging

Enable Verbose Logging

All platforms support development mode with verbose logging:

Swift:

try RunAnywhere.initialize(environment: .development)

Kotlin:

RunAnywhere.initialize(environment = SDKEnvironment.DEVELOPMENT)

Web:

await RunAnywhere.initialize({ environment: 'development', debug: true })

React Native/Flutter: Same pattern as above

Platform-Specific Logs

iOS:

log stream --predicate 'subsystem CONTAINS "com.runanywhere"' --info --debug

Android:

adb logcat | grep "RunAnywhere"

Web:

Logs go to the standard browser console (open the browser's developer tools to view them).

Additional Resources

Official Documentation

Website: runanywhere.ai
Docs: docs.runanywhere.ai
GitHub: github.com/RunanywhereAI/runanywhere-sdks
Discord: discord.gg/N359FBbDVd

Example Apps

Each platform has a complete demo app:

iOS: examples/ios/RunAnywhereAI/
Android: examples/android/RunAnywhereAI/
Web: web-starter-app (Chat, Vision, Voice tabs)
React Native: examples/react-native/RunAnywhereAI/
Flutter: examples/flutter/RunAnywhereAI/

Quick Reference

Task	Command/Method	Platform
Initialize SDK	`RunAnywhere.initialize()`	All
Register Backend	`LlamaCPP.register()` / `ONNX.register()`	Web
Register Models	`RunAnywhere.registerModels(models)`	Web
Download Model	`ModelManager.downloadModel(id)`	Web
Load Model	`ModelManager.loadModel(id)`	Web
Generate Text	`TextGeneration.generate(prompt)`	All
Stream Generation	`TextGeneration.generateStream(prompt)`	All
VLM Process	`VLMWorkerBridge.shared.process(rgb, w, h, prompt)`	iOS/Web
Capture Camera	`VideoCapture.captureFrame(dim)`	Web
Transcribe Audio	`STT.transcribe(audio)`	All
Synthesize Speech	`TTS.synthesize(text)`	All
Voice Agent	`VoicePipeline.start()`	All

Next Steps:

Choose your platform and read the corresponding reference guide
Review models.md to select appropriate models
Follow the platform-specific setup instructions
Start with a simple text generation example
Explore advanced features (STT, TTS, voice agents) as needed
