Orbit - Intelligent Mobility AI

Built with Google Gemma 4 E2B · On-Device · 100% Offline · Zero Cloud

A fully offline AI mobility assistant for blind and visually impaired users — powered entirely by Gemma 4 E2B running on-device via llama.rn.

Hackathon Submission - Gemma 4

Orbit demonstrates the full potential of Google Gemma 4 E2B as a real-time, multimodal, multilingual AI assistant running entirely on a mobile phone — no server, no API, no cloud. Every inference — text, vision, and intent classification — is powered by a single Gemma 4 model.

Why Gemma 4?

Capability Used	How Orbit Uses It
Multimodal Vision	Camera images are analyzed on-device via Gemma 4's vision projector (`mmproj`) for obstacle detection, object identification, and text reading
Multilingual Generation	Orbit responds in 17 languages using Gemma 4's native multilingual abilities — no translation API needed
Intent Classification	Gemma 4 classifies ambiguous user queries into 5 intent categories directly via prompt engineering
Conversational AI	General Q&A, contextual follow-ups, and proactive clarifications — all Gemma 4 on-device
Safety-Critical Reasoning	Gemma 4 fuses sensor context (motion, direction, location) with vision to produce instant safety decisions
Quantized Efficiency	Runs as Q4_K_M GGUF (~3.3GB) with GPU offloading via `llama.rn`, enabling real-time inference on mobile hardware

Every AI feature in Orbit is Gemma 4. There is no secondary model, no cloud fallback, and no external AI service.

Overview

Orbit is an AI-powered mobility assistant built with React Native (Expo SDK 54). It combines Gemma 4's multimodal capabilities with on-device speech recognition, text-to-speech, and phone sensor fusion (GPS, compass, accelerometer) into a single hands-free experience.

The entire app — from onboarding to daily navigation — can be operated using only voice. A visually impaired user never needs to touch the screen.

Core Principles

100% Offline — All AI inference runs locally on the device. No data ever leaves the phone.
Single Model — One Gemma 4 E2B instance handles text, vision, classification, and multilingual output.
Fully Hands-Free — Global "Hey Orbit" wake word across every screen.
Real-Time — Sub-second safety decisions using sensor fusion + Gemma 4 vision.

Key Features

Hands-Free Voice Control — "Hey Orbit"

A unified wake word detection system is active across every screen in the app, enabling a completely hands-free experience from first launch to daily use.

Screen	Wake Word Action
Onboarding	Say "Hey Orbit, go next" to advance through setup steps
Download	Say "Hey Orbit, start download" to begin, or "Hey Orbit, continue" when finished
Home	Say "Hey Orbit" to activate the mic for questions or commands
Camera	Say "Hey Orbit" to instantly trigger image capture

Wake word variants handled: orbit, orbed, audit, corbett, order, orb, हे ऑर्बिट, ओर्बिट, and more — robust against STT misrecognition.
Self-restarting wake word loop with idle detection ensures Orbit is always listening when the system is not busy.
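Matching a transcript against misrecognized variants could be sketched as below. The list is only the subset named above, and `normalizeTranscript` and `containsWakeWord` are hypothetical names, not the app's actual functions.

```typescript
// Subset of the wake word variants listed above (the full list is not shown here).
const WAKE_VARIANTS: string[] = [
  "orbit", "orbed", "audit", "corbett", "order", "orb",
  "हे ऑर्बिट", "ओर्बिट",
];

// Lowercase, replace common punctuation with spaces, collapse whitespace.
function normalizeTranscript(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"']/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// True when any variant appears as a whole token (or token sequence).
function containsWakeWord(transcript: string): boolean {
  const padded = ` ${normalizeTranscript(transcript)} `;
  return WAKE_VARIANTS.some((variant) => padded.includes(` ${variant} `));
}
```

Whole-token matching keeps a word like "orbital" from triggering the assistant, while a common word such as "order" still matches because it is on the variant list.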

5-Class Intent Classification (Gemma 4)

Every voice input is classified into one of five intents before processing, with Gemma 4 resolving the ambiguous cases:

Intent	Trigger Examples	Action
`VISION_REQUIRED`	"Is it safe to walk?", "Anything ahead?"	Opens camera → Gemma 4 Mobility Protocol
`VISION_OPTIONAL`	"What is this?", "Read the label"	Opens camera → Gemma 4 Description Protocol
`NON_VISION`	"Tell me a joke", "Who are you?"	Gemma 4 General Assistant Protocol (no camera)
`LANGUAGE_SWITCH`	"Speak in Hindi", "Switch to Spanish"	Gemma 4 extracts language → updates full pipeline
`UNCERTAIN`	"Check this"	Gemma 4 proactive clarification: "Should I open the camera?"

Regex-first fast path for common patterns, with Gemma 4 LLM fallback for ambiguous queries.
Follow-up detection: "What about now?" inherits the previous intent context.
Short queries (≤4 words) are fast-tracked to skip the LLM classification step.
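A minimal sketch of the regex-first flow described above. The patterns are illustrative, `fastClassify` and `classifyIntent` are hypothetical names, and treating short unmatched queries as `UNCERTAIN` is an assumption, since the README does not say which intent fast-tracked queries receive.

```typescript
type Intent =
  | "VISION_REQUIRED"
  | "VISION_OPTIONAL"
  | "NON_VISION"
  | "LANGUAGE_SWITCH"
  | "UNCERTAIN";

// Illustrative patterns only; the app's real pattern set is not shown here.
const FAST_PATTERNS: Array<[RegExp, Intent]> = [
  [/\b(speak|switch|talk)\b.*\b(in|to)\b/i, "LANGUAGE_SWITCH"],
  [/\b(safe|ahead|obstacle)\b/i, "VISION_REQUIRED"],
  [/\b(what is this|read|label)\b/i, "VISION_OPTIONAL"],
  [/\b(joke|who are you)\b/i, "NON_VISION"],
];

const FOLLOW_UP = /^what about now\b/i;

// Returns an intent without calling the model, or null when the LLM is needed.
function fastClassify(query: string, previous: Intent | null): Intent | null {
  const text = query.trim();
  if (previous !== null && FOLLOW_UP.test(text)) return previous;
  for (const [pattern, intent] of FAST_PATTERNS) {
    if (pattern.test(text)) return intent;
  }
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  // Assumption: short unmatched queries skip the LLM and resolve to UNCERTAIN.
  return wordCount <= 4 ? "UNCERTAIN" : null;
}

// askModel stands in for the Gemma 4 classification call (hypothetical).
async function classifyIntent(
  query: string,
  previous: Intent | null,
  askModel: (query: string) => Promise<Intent>,
): Promise<Intent> {
  return fastClassify(query, previous) ?? askModel(query);
}
```

Passing the model call in as a function keeps the fast path testable without loading the model.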

Multimodal Vision — Gemma 4 + Camera

This is where Gemma 4's multimodal architecture shines. The vision projector enables real-time image understanding directly on the phone.

Auto-Capture: Camera opens with a 3-second countdown and captures automatically.
Manual Capture: Tap the shutter button or say "Hey Orbit" to capture instantly.
On-Device Analysis: Captured images are resized to 256×256, compressed to JPEG, and analyzed by Gemma 4 E2B via its multimodal projector — no image ever leaves the device.
Two Analysis Protocols (both powered by Gemma 4):
- Mobility Protocol — terse, safety-critical: "Car ahead. Stop." (max 10 words)
- Description Protocol — detailed object/text identification: "Paracetamol 500mg tablet." (max 20 words)
Safety Override: Even during description mode, if Gemma 4 detects a hazard, it switches to mobility format automatically.
Retry Logic: If Gemma 4's first response doesn't match expected format, the model retries once with the same prompt.
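The format check and single retry could look roughly like this. The validation rule (two period-terminated sentences, at most 10 words) is an assumption derived from the Mobility Protocol example, and `isMobilityFormat` and `runWithRetry` are hypothetical names.

```typescript
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Assumed rule: "<hazard> <location>. <action>." with at most 10 words.
function isMobilityFormat(text: string): boolean {
  const trimmed = text.trim();
  const sentences = trimmed
    .split(".")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return trimmed.endsWith(".") && sentences.length === 2 && countWords(trimmed) <= 10;
}

// Calls generate() once, and a second time only if the first output is invalid.
// The second output is returned as-is, matching the "retries once" behavior.
async function runWithRetry(
  generate: () => Promise<string>,
  isValid: (text: string) => boolean,
): Promise<string> {
  const first = await generate();
  if (isValid(first)) return first;
  return generate();
}
```

The period-based check only suits languages that end sentences with ".", so other scripts (for example Hindi with "।") would need their own terminators.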

Sensor Fusion & Situational Awareness

Orbit fuses phone sensor data and injects it as context into every Gemma 4 prompt, enabling physically-aware AI responses:

Sensor	Data Used	Impact on Gemma 4's Response
GPS	Latitude, longitude, reverse geocode	Location context injected into prompt
Speed	`> 0.5 m/s` = walking, else stopped	Gemma 4 says "Stop" (moving) vs "Wait" (stopped)
Compass	0–360° heading	Gemma 4 gives directional guidance: "Move right", "Slightly left"

Voice Input + Motion + Direction + Location → Intent Engine → Gemma 4 Protocol → Natural Speech Output
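As a rough illustration of how the sensor readings might become prompt context, the sketch below applies the 0.5 m/s threshold from the table. The `SensorSnapshot` shape, the function names, and the output string format are assumptions.

```typescript
interface SensorSnapshot {
  speedMps: number | null;   // GPS speed in metres per second
  headingDeg: number | null; // compass heading, 0-360
  place: string | null;      // reverse-geocoded label, if available
}

const WALKING_THRESHOLD_MPS = 0.5; // threshold from the table above

function motionState(speedMps: number | null): "walking" | "stopped" {
  return speedMps !== null && speedMps > WALKING_THRESHOLD_MPS ? "walking" : "stopped";
}

const COMPASS_POINTS = [
  "north", "north-east", "east", "south-east",
  "south", "south-west", "west", "north-west",
];

// Maps any heading in degrees to the nearest of eight compass points.
function headingToCompass(deg: number): string {
  const normalized = ((deg % 360) + 360) % 360;
  return COMPASS_POINTS[Math.round(normalized / 45) % 8] as string;
}

// Builds a one-line context string (format is illustrative).
function buildSensorContext(s: SensorSnapshot): string {
  const parts = [`Motion: ${motionState(s.speedMps)}`];
  if (s.headingDeg !== null) {
    parts.push(`Heading: ${headingToCompass(s.headingDeg)} (${Math.round(s.headingDeg)}°)`);
  }
  if (s.place) parts.push(`Location: ${s.place}`);
  return parts.join("; ");
}
```

Missing readings are simply omitted, so the prompt never carries placeholder values for a sensor that has not reported yet.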

Multilingual Support — 17 Languages (Gemma 4 Native)

Orbit leverages Gemma 4's built-in multilingual capabilities — no translation API, no external service. The entire pipeline adapts:

STT recognizes speech in the selected language
Gemma 4 generates responses in the target language's native script (no Latin transliteration)
TTS speaks the output in the matching language

Language	Code	Language	Code
English	`en-US`	Bengali	`bn-IN`
Hindi	`hi-IN`	Tamil	`ta-IN`
Spanish	`es-ES`	Telugu	`te-IN`
French	`fr-FR`	Marathi	`mr-IN`
German	`de-DE`	Gujarati	`gu-IN`
Chinese	`zh-CN`	Portuguese	`pt-BR`
Japanese	`ja-JP`	Italian	`it-IT`
Korean	`ko-KR`	Arabic	`ar-SA`
Russian	`ru-RU`

Language input is normalized (diacritics stripped, BCP-47 codes parsed, fuzzy matched).
Mid-conversation switching: "Speak in Hindi" → Gemma 4 extracts the target language, confirms in that language, and all subsequent output switches.
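The normalization steps (diacritics, BCP-47 codes, fuzzy matching) might be combined as in this sketch. It covers only six of the 17 languages, uses edit distance with a tolerance of 2 as the fuzzy matcher, and `resolveLanguage` is a hypothetical name; the app's actual matching rules are not shown in this README.

```typescript
// Partial registry for illustration: English name -> BCP-47 tag.
const LANGUAGES: Array<[string, string]> = [
  ["english", "en-US"], ["hindi", "hi-IN"], ["spanish", "es-ES"],
  ["french", "fr-FR"], ["german", "de-DE"], ["portuguese", "pt-BR"],
];

function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        (prev[j] as number) + 1,
        (curr[j - 1] as number) + 1,
        (prev[j - 1] as number) + cost,
      ));
    }
    prev = curr;
  }
  return prev[b.length] as number;
}

function resolveLanguage(input: string): string | null {
  const cleaned = stripDiacritics(input).toLowerCase().trim();

  // 1. Language codes such as "hi", "hi-IN", or "pt_br".
  const code = /^([a-z]{2})(?:[-_][a-z]{2})?$/.exec(cleaned);
  if (code) {
    const byCode = LANGUAGES.find(([, tag]) => tag.toLowerCase().startsWith(`${code[1]}-`));
    if (byCode) return byCode[1];
  }

  // 2. Exact name, then 3. closest name within 2 edits (assumed tolerance).
  let best: string | null = null;
  let bestDistance = 3;
  for (const [name, tag] of LANGUAGES) {
    const distance = editDistance(cleaned, name);
    if (distance < bestDistance) {
      best = tag;
      bestDistance = distance;
    }
  }
  return best;
}
```

Returning `null` for unrecognized input lets the caller ask the user to repeat the language instead of silently switching to the wrong one.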

Advanced TTS Engine

Sentence-level chunking: Long Gemma 4 responses are split at sentence boundaries to prevent the TTS engine from rejecting long strings.
Generation counter: Prevents stale callbacks from previous speak calls from interfering with current speech.
Adaptive safety timeout: Timeout scales with text length (4s minimum, 60s maximum).
Polling fallback: 50 consecutive `isSpeakingAsync() === false` readings are required before speech is declared complete.
Haptic feedback: Vibration on speech start for tactile confirmation.
`speakAndWait()`: Promise-based API for sequencing mic activation after TTS.
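Three of the mechanisms above (sentence chunking, the adaptive timeout, and the generation counter) can be sketched as pure helpers. The 4 s and 60 s bounds come from the list, while the 80 ms-per-character rate and all names are assumptions.

```typescript
// Splits text after ".", "!", "?" or the Devanagari danda "।".
function splitIntoSentences(text: string): string[] {
  const matches = text.match(/[^.!?।]+[.!?।]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const MIN_TIMEOUT_MS = 4_000;
const MAX_TIMEOUT_MS = 60_000;
const MS_PER_CHAR = 80; // assumed speaking-rate estimate

// Timeout grows with text length but stays within the 4 s to 60 s bounds.
function speechTimeoutMs(text: string): number {
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, text.length * MS_PER_CHAR));
}

// Each speak call takes a new id; callbacks from older ids are ignored.
class SpeechGeneration {
  private current = 0;

  next(): number {
    this.current += 1;
    return this.current;
  }

  isCurrent(id: number): boolean {
    return id === this.current;
  }
}
```

A completion callback would capture the id returned by `next()` and return early when `isCurrent(id)` is false, which is how a stale callback is kept from cutting off newer speech. The simple splitter also breaks on decimal points, so a production version would need a stricter boundary rule.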

Robust STT Service

App state monitoring: Mic is force-stopped when app goes to background.
Permission caching: Mic permission is checked once and cached for the session.
Language sync: STT language code is synced with the user's profile language.
Clean session management: Previous sessions are always aborted before starting new ones, with native engine release delays.

Personalized Onboarding (5-Step Voice Setup)

Step	Question	How It Customizes Gemma 4
1	"How would you describe your vision?"	Injected into Gemma 4's context as user profile
2	"Which language do you prefer?"	Sets Gemma 4's output language + STT/TTS language
3	"Where do you spend most of your time?"	Gives Gemma 4 environmental context
4	"What tasks do you need the most help with?"	Prioritizes Gemma 4's response focus
5	"How do you like Orbit to respond?"	Shapes Gemma 4's tone and verbosity

Each question is read aloud via TTS.
Answers can be typed or spoken (tap mic or say "Hey Orbit").
Voice navigation: "Hey Orbit, go next" / "continue".
Validation: Empty answers trigger spoken error feedback.
Profile is persisted locally via AsyncStorage.
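The five answers map onto the `UserProfile` fields listed under `database/db.ts` in the folder structure. A possible sketch of validation and of turning the profile into prompt context follows; the step-to-field order mirrors the table above, and the helper names and context string format are assumptions.

```typescript
interface UserProfile {
  visionDescription: string;
  language: string;
  locationContext: string;
  helpNeeded: string;
  responseStyle: string;
}

// Onboarding steps 1-5 in the order of the table above.
const STEP_FIELDS: Array<keyof UserProfile> = [
  "visionDescription", "language", "locationContext", "helpNeeded", "responseStyle",
];

function isValidAnswer(answer: string): boolean {
  return answer.trim().length > 0;
}

// Returns a copy of the profile with the answer stored for the given step.
function applyAnswer(
  profile: Partial<UserProfile>,
  step: number,
  answer: string,
): Partial<UserProfile> {
  const field = STEP_FIELDS[step - 1];
  if (field === undefined) throw new RangeError(`Unknown onboarding step: ${step}`);
  if (!isValidAnswer(answer)) throw new Error("Answer must not be empty");
  return { ...profile, [field]: answer.trim() };
}

// Illustrative context line for prompts; the real format is not documented.
function profileToContext(p: UserProfile): string {
  return [
    `Vision: ${p.visionDescription}`,
    `Environment: ${p.locationContext}`,
    `Needs help with: ${p.helpNeeded}`,
    `Response style: ${p.responseStyle}`,
  ].join("; ");
}
```

In the app, the thrown error for an empty answer would correspond to the spoken error feedback instead of an exception reaching the user.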

One-Time Model Download

Gemma 4 E2B runs entirely on-device, but the model weights need to be downloaded once:

Two files: Main model (~3.3GB, Q4_K_M quantized) + Vision Projector (~200MB, f16)
Progress tracking: Real-time MB counter with gradient progress bar
Resume detection: Checks existing files on mount — skips completed downloads
Integrity validation: Model must exceed size threshold to be considered complete
Voice-controlled: Say "Hey Orbit, start download" to begin

After download, Orbit never needs an internet connection again.
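Resume detection and the size-based integrity check could be expressed as one pure function over the file sizes found on disk. The file names follow the comments in the Model Configuration section; the byte thresholds and the name `pendingDownloads` are assumptions.

```typescript
const MB = 1024 * 1024;

interface RequiredFile {
  name: string;
  minBytes: number; // a file at or below this size is treated as incomplete
}

// Thresholds are assumed values set below the approximate sizes (~3.3GB, ~200MB).
const REQUIRED_FILES: RequiredFile[] = [
  { name: "gemma4-e2b-q4km.gguf", minBytes: 3000 * MB },
  { name: "gemma4-e2b-mmproj.gguf", minBytes: 150 * MB },
];

// Given the sizes found on disk, list the files that still need downloading.
function pendingDownloads(sizesOnDisk: Record<string, number>): string[] {
  return REQUIRED_FILES
    .filter((file) => (sizesOnDisk[file.name] ?? 0) <= file.minBytes)
    .map((file) => file.name);
}
```

An empty result means both files pass the check and the download screen can be skipped. A size threshold only catches truncated files, not corrupted ones.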

Architecture

The Assistive Intelligence Loop

┌──────────────┐     ┌────────────────┐     ┌───────────────────┐
│  Voice Input │────▶│ Intent Engine  │────▶│ Protocol Selection│
│  (On-Device  │     │ (Regex + Gemma │     │                   │
│   STT)       │     │  4 Fallback)   │     └───────────────────┘
└──────────────┘     └────────────────┘               │
                                                      │
                     ┌────────────────┐               │
                     │ Sensor Fusion  │───────────────┤
                     │ GPS + Compass  │               │
                     │ + Motion       │               ▼
                     └────────────────┘     ┌───────────────────┐
                                            │   Gemma 4 E2B     │
                     ┌────────────────┐     │   (On-Device)     │
                     │ Camera/Vision  │────▶│   + mmproj Vision │
                     │ (Optional)     │     │   Projector       │
                     └────────────────┘     └───────────────────┘
                                                      │
                                                      ▼
                                            ┌───────────────────┐
                                            │ TTS Output        │
                                            │ (Natural Speech)  │
                                            └───────────────────┘

Three Gemma 4 Protocols

Protocol	Use Case	Max Words	Gemma 4 Prompt Format
Mobility	Walking, obstacles, safety	10	`"<hazard> <location>. <action>."`
Description	Reading, identifying, describing	20	Natural language answer to user's query
General	Conversation, info, follow-ups	25	Concise conversational response
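The relationship between intents, protocols, and word limits can be summarized in a few lines. The mapping follows the intent table and the protocol table; `protocolForIntent` and `withinWordLimit` are hypothetical helper names.

```typescript
type Intent =
  | "VISION_REQUIRED"
  | "VISION_OPTIONAL"
  | "NON_VISION"
  | "LANGUAGE_SWITCH"
  | "UNCERTAIN";

type Protocol = "mobility" | "description" | "general";

// Word limits from the protocol table above.
const MAX_WORDS: Record<Protocol, number> = {
  mobility: 10,
  description: 20,
  general: 25,
};

// Language switches and clarifications use their own prompts, hence null.
function protocolForIntent(intent: Intent): Protocol | null {
  switch (intent) {
    case "VISION_REQUIRED":
      return "mobility";
    case "VISION_OPTIONAL":
      return "description";
    case "NON_VISION":
      return "general";
    default:
      return null;
  }
}

function withinWordLimit(protocol: Protocol, response: string): boolean {
  const words = response.trim().split(/\s+/).filter(Boolean).length;
  return words <= MAX_WORDS[protocol];
}
```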

Real-World Interaction Examples

Scenario	User Input	Gemma 4's Action	Speech Output
Walking toward car	"Is it safe?"	Vision + Mobility Protocol	"Car ahead. Stop."
Standing still, obstacle	"Anything ahead?"	Sensor-aware (stopped)	"Obstacle ahead. Wait."
Ambiguous request	"Check"	Proactive clarification	"Should I open the camera?"
Follow-up	"What about now?"	Inherits previous intent	"Path clear. Walk forward."
Reading medicine	"Read this"	Vision + Description Protocol	"Paracetamol 500mg tablet."
Language switch	"Speak in Hindi"	Extracts + confirms in Hindi	"भाषा हिंदी में बदल दी गई।"

Gemma 4 Integration Details

Model Configuration

// LLM initialization (HomeScreen.tsx)
import { initLlama } from 'llama.rn';

const llamaContext = await initLlama({
  model: modelPath,           // gemma4-e2b-q4km.gguf (~3.3GB)
  use_mlock: false,
  n_ctx: 2048,                // Context window
  n_gpu_layers: 99,           // Maximum GPU offloading
});

// Vision projector initialization
await llamaContext.initMultimodal({
  path: mmprojPath,           // gemma4-e2b-mmproj.gguf (~200MB)
  image_max_tokens: 256,      // Vision token budget
});

Inference Parameters

Parameter	Value	Rationale
`n_predict`	100–150	Short, actionable responses for safety
`temperature`	0.1–0.2	Low creativity for deterministic safety outputs
`top_p`	0.8	Focused token sampling
`stop`	`<end_of_turn>`, `<eos>`	Gemma 4 chat template stop tokens

Prompt Template (Gemma 4 Chat Format)

<start_of_turn>user
[PROTOCOL INSTRUCTIONS]

Context: [sensor data + user profile]
User: [voice input]
[LANGUAGE INSTRUCTION]
Follow protocol strictly.<end_of_turn>
<start_of_turn>model

Vision Prompt Template

<start_of_turn>user
<__media__>
[PROTOCOL INSTRUCTIONS]

Context: [sensor data]
User request: [analysis prompt]
[LANGUAGE INSTRUCTION]
Follow protocol strictly.<end_of_turn>
<start_of_turn>model
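
Both templates differ only in the `<__media__>` marker and the label on the user line, so a single builder could produce either. This is a sketch with hypothetical names (`PromptParts`, `buildGemmaPrompt`), not the code in `prompts.ts`.

```typescript
interface PromptParts {
  protocol: string;            // protocol instructions
  context: string;             // sensor data (and user profile for text prompts)
  userInput: string;           // voice input or analysis prompt
  languageInstruction: string; // e.g. an instruction to answer in Hindi
  withImage?: boolean;         // true for the vision template
}

// Assembles a single-turn prompt that ends with the open model turn.
function buildGemmaPrompt(parts: PromptParts): string {
  const userLabel = parts.withImage ? "User request" : "User";
  const lines = [
    "<start_of_turn>user",
    ...(parts.withImage ? ["<__media__>"] : []),
    parts.protocol,
    "",
    `Context: ${parts.context}`,
    `${userLabel}: ${parts.userInput}`,
    parts.languageInstruction,
    "Follow protocol strictly.<end_of_turn>",
    "<start_of_turn>model",
    "",
  ];
  return lines.join("\n");
}
```

Ending the string with the open model turn means generation starts directly with the answer, and the `<end_of_turn>` stop token from the inference parameters closes it.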

Tech Stack

Category	Technology	Details
AI Model	Google Gemma 4 E2B	Q4_K_M quantized GGUF + f16 vision projector
AI Runtime	`llama.rn` v0.12.0-rc.8	On-device GGUF execution with GPU offloading
Framework	React Native (Expo SDK 54)	New Architecture enabled, TypeScript 5.9
Speech-to-Text	`expo-speech-recognition` v3.1.2	On-device STT with continuous mode
Text-to-Speech	`expo-speech` v14.0.8	Platform-native TTS engine
Camera	`expo-camera` v17.0.10	Photo capture with auto/manual trigger
Image Processing	`expo-image-manipulator` v14.0.8	Resize + compress before Gemma 4 analysis
Location	`expo-location` v19.0.8	GPS, speed, heading, reverse geocode
Storage	`@react-native-async-storage` v2.2.0	User profile persistence (fully local)
Navigation	`@react-navigation/native-stack` v7	Static navigation with transitions
UI	`expo-linear-gradient`, `@expo/vector-icons`	Gradient UI elements, icon library

Folder Structure

orbit/
│
├── App.tsx                          # Root navigator — Boot → Onboarding → Download → Home → Camera → Settings
├── index.ts                         # Expo entry point — registers App as root component
├── app.json                         # Expo config — permissions, plugins, splash screen
├── package.json                     # Dependencies and scripts
├── tsconfig.json                    # TypeScript configuration
│
├── assets/                          # Static assets
│   ├── icon.png                     #   App icon
│   ├── adaptive-icon.png            #   Android adaptive icon
│   ├── splash-icon.png              #   Splash screen icon
│   ├── logo.png                     #   In-app header logo
│   └── favicon.png                  #   Web favicon
│
├── database/                        # Local data persistence
│   └── db.ts                        #   AsyncStorage wrapper — user profile CRUD
│                                    #   Defines UserProfile: visionDescription, language,
│                                    #   locationContext, helpNeeded, responseStyle
│
├── scripts/                         # Build-time utilities
│   └── postinstall-fixes.js         #   Patches expo-speech-recognition tsconfig paths
│
└── src/                             # Application source code
    │
    ├── constants/                   # Configuration and Gemma 4 prompt templates
    │   ├── prompts.ts               #   Three Gemma 4 protocols:
    │   │                            #     • ORBIT_MOBILITY_PROTOCOL (safety, max 10 words)
    │   │                            #     • ASSISTIVE_DESCRIPTION_PROTOCOL (vision, max 20 words)
    │   │                            #     • GENERAL_ASSISTANT_PROTOCOL (chat, max 25 words)
    │   │                            #   + INTENT_CLASSIFICATION_PROMPT (5-class)
    │   │                            #   + LANGUAGE_SWITCH_CONFIRMATION_PROMPT
    │   │
    │   ├── languages.ts             #   17-language registry — regex pattern matching,
    │   │                            #   BCP-47 code resolution, fuzzy input normalization
    │   │
    │   └── voice.ts                 #   TTS config (rate: 0.85, pitch: 1.3, volume: 1.0)
    │
    ├── hooks/                       # React hooks — bridge between services and UI
    │   ├── useSTT.ts                #   STT hook — startListening(), startWakeWordDetection(),
    │   │                            #   stopListening(), 23 wake word variants, fail counter
    │   │
    │   └── useTTS.ts                #   TTS hook — speak(), speakAndWait(), stop(),
    │                                #   getIsSpeaking(), auto-init on mount
    │
    ├── screens/                     # UI screens (5 screens)
    │   ├── HomeScreen.tsx           #   Main interface — Gemma 4 init, intent classification,
    │   │                            #   sensor fusion, chat UI, wake word loop, streaming tokens
    │   │
    │   ├── CameraScreen.tsx         #   Camera — auto-capture countdown, manual capture,
    │   │                            #   front/back toggle, wake word capture, crosshair overlay
    │   │
    │   ├── OnboardingScreen.tsx     #   5-step voice setup — progress bar, validation,
    │   │                            #   wake word navigation, live language sync
    │   │
    │   ├── DownloadScreen.tsx       #   Model download — two-phase progress, resume detection,
    │   │                            #   integrity check, voice-controlled flow
    │   │
    │   └── SettingsScreen.tsx       #   Settings — language change with instant TTS/STT sync
    │
    └── services/                    # Platform services
        ├── camera.ts                #   Voice command detection (29 keywords),
        │                            #   prompt extraction ("capture bottle" → "Locate: bottle")
        │
        ├── location.ts              #   GPS provider — lat/lon, speed, heading, reverse geocode
        │
        ├── weather.ts               #   Weather context provider for Gemma 4 prompts
        │
        └── speech/                  #   Speech engine services
            ├── stt.ts               #   Low-level STT — session management, app state monitor,
            │                        #   permission caching, language sync
            │
            └── tts.ts               #   Low-level TTS — sentence chunking, generation counter,
                                     #   adaptive timeout, polling fallback, haptic feedback

App Flow

┌─────────┐     ┌──────────────┐     ┌──────────────┐     ┌────────────┐
│  Boot   │────▶│  Onboarding  │────▶│   Download   │────▶│    Home    │
│         │     │  (5 steps)   │     │ (Gemma 4 DL) │     │  (Gemma 4) │
└─────────┘     └──────────────┘     └──────────────┘     └────────────┘
     │                                                      │         │
     │  (profile + model exist)                             │         │
     └──────────────────────────────────────────────────────┘         │
                                                              ┌──────┴───────┐
                                                              │              │
                                                         ┌────────┐   ┌──────────┐
                                                         │ Camera │   │ Settings │
                                                         │(Gemma 4│   │          │
                                                         │ Vision)│   │          │
                                                         └────────┘   └──────────┘

Boot — Checks profile → model files → integrity → routes accordingly.
Onboarding — 5-step voice/text setup → builds user profile for Gemma 4 context.
Download — One-time Gemma 4 model download (~3.5GB total). After this, fully offline.
Home — Main AI interface. Gemma 4 handles text, vision, classification, and multilingual output.
Camera — Auto/manual capture → image sent to Gemma 4 vision pipeline on-device.
Settings — Language change triggers Gemma 4 confirmation in the new language.
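The boot decision reduces to a small pure function. The sketch below follows the check order given for Boot (profile, then model files, then integrity); the `BootState` shape and the `initialRoute` name are assumptions.

```typescript
type Route = "Onboarding" | "Download" | "Home";

interface BootState {
  hasProfile: boolean;       // a saved user profile exists
  modelFilesExist: boolean;  // both model files are present on disk
  modelIntegrityOk: boolean; // files pass the size-threshold check
}

// Picks the first screen: profile first, then model files, then integrity.
function initialRoute(state: BootState): Route {
  if (!state.hasProfile) return "Onboarding";
  if (!state.modelFilesExist || !state.modelIntegrityOk) return "Download";
  return "Home";
}
```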

Getting Started

Prerequisites

Node.js 18+
Android device with ~4GB free storage (for Gemma 4 model files)
Expo CLI (`npx expo`)

Installation

# Clone the repository
git clone <repo-url>
cd orbit

# Install dependencies
npm install

# Build and run on Android device
npx expo run:android

First Launch

Onboarding — Answer 5 personalization questions (voice or text).
Download — Gemma 4 E2B model downloads once (~3.5GB total).
Offline Forever — Orbit greets you and begins listening. Say "Hey Orbit" to start.

Voice Commands

Command	Context	Action
"Hey Orbit"	Any screen	Activates Orbit / captures image (Camera)
"Hey Orbit, go next"	Onboarding	Advances to next question
"Hey Orbit, start download"	Download	Begins model download
"Hey Orbit, continue"	Download (complete)	Navigates to Home
"Is it safe to walk?"	Home	Opens camera → Gemma 4 Mobility analysis
"What is this?"	Home	Opens camera → Gemma 4 Description analysis
"Read the label"	Home	Opens camera → Gemma 4 text recognition
"Speak in Hindi"	Home	Gemma 4 switches all output to Hindi
"Yes" / "Sure"	After uncertain intent	Confirms camera opening

Supported Languages

English · Hindi · Spanish · French · German · Chinese · Japanese · Korean · Portuguese · Italian · Russian · Arabic · Bengali · Tamil · Telugu · Marathi · Gujarati

All 17 languages are powered by Gemma 4's native multilingual generation — no translation service involved.

Built for the Gemma 4 Hackathon
Proving that a single on-device Gemma 4 model can power a complete, safety-critical AI assistant — with zero cloud dependency.
