Google Gemma 3n Impact Challenge Submission
"Think bigger than a simple chatbot" - Financial empowerment through private, on-device receipt intelligence.
In today's economy, tracking expenses is crucial. Yet our most frequent purchases (from supermarkets, cafes, and local shops) often end up as paper receipts, creating a black hole in our financial data. With ~60% of Americans lacking a formal budget and 18% of all payments still made in cash, a huge volume of financial data is lost. How can people control their spending if they can't track it?
Checkstand's Answer: Empower users with a tool that digitizes physical receipts, providing crucial financial insights that were previously lost.
- Financial Empowerment: Finally track spending from supermarkets and local shops that don't offer digital records.
- Educational Insights: Understand where your money goes, identify spending patterns, and take control of your budget.
- Uncompromising Privacy: Your financial data is sensitive. Checkstand processes everything on-device, so your information never leaves your phone. It's your data, your insights, your control.
- Multimodal AI: A powerful on-device pipeline combining Camera, OCR, and Gemma 3n to understand any receipt.
- Offline-First: Works anywhere, anytime, without needing an internet connection.
- Real-Time: Process receipts in ~24 seconds on consumer hardware.
- Practical Impact: Solves real financial tracking problems.
- On-Device Inference: 4.4GB E4B model running locally via MediaPipe
- Session Management: Fresh inference sessions per request (prevents context contamination; see the sketch after this list)
- Multimodal Pipeline: Seamless integration of image and text processing
- Clean Architecture: Domain/Data/UI separation with Hilt dependency injection
- Robust Parsing: Smart fallback mechanisms when LLM output varies
- Camera Integration: CameraX with real-time preview and capture
- Modern Android: Jetpack Compose UI with Material 3 design
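To make the session-management point concrete, here is a minimal sketch of the fresh-session-per-receipt pattern, assuming the MediaPipe GenAI `LlmInference`/`LlmInferenceSession` API. Class and option names follow the public MediaPipe documentation and may differ from Checkstand's actual service code; the temperature value is an assumption.

```kotlin
import android.graphics.Bitmap
import com.google.mediapipe.framework.image.BitmapImageBuilder
import com.google.mediapipe.tasks.genai.llminference.GraphOptions
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

// One long-lived engine holds the 4.4GB model; sessions are cheap and disposable.
fun analyzeReceipt(engine: LlmInference, prompt: String, receipt: Bitmap): String {
    val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
        .setTemperature(0.2f) // low temperature for deterministic extraction (assumed value)
        .setGraphOptions(
            GraphOptions.builder().setEnableVisionModality(true).build()
        )
        .build()
    // A fresh session per receipt: no tokens from the previous receipt can leak in.
    val session = LlmInferenceSession.createFromOptions(engine, sessionOptions)
    return try {
        session.addQueryChunk(prompt)                          // text first...
        session.addImage(BitmapImageBuilder(receipt).build())  // ...then the image
        session.generateResponse()
    } finally {
        session.close() // discard all context so the next receipt starts clean
    }
}
```

The text-before-image ordering in the sketch mirrors the prompt-ordering decision listed under Key Technical Decisions below.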
Development Journey Note: We initially explored React Native for cross-platform efficiency, but encountered Metro bundler's 2GB file size limitation with our 4.4GB Gemma 3n model. While we successfully created a custom native module workaround, we ultimately chose pure Android native for optimal performance and simplified architecture. This decision exemplifies choosing the right tool for AI deployment constraints.
- Material Design Icons: Professional vector drawables replace emoji icons
- Clean UX: Removed confusing processing notifications
- Background Processing: Queue system with timeout handling
- Visual Status Indicators: Clear receipt processing states
- Android Studio Hedgehog or newer
- Android device with API level 26+ (Android 8.0)
- 4GB+ RAM (recommended for optimal model performance)
```
git clone https://github.com/gryphon2411/Checkstand.git
cd Checkstand
./gradlew assembleDebug
```

- Enable "Unknown Sources" in Android settings
- Install the generated APK: `app/build/outputs/apk/debug/app-debug.apk`
- Grant camera permission when prompted
- Start scanning receipts!
- Image Capture: User scans receipt with camera or selects from gallery
- OCR Processing: Google ML Kit extracts text from the receipt image (sketched after these steps)
- AI Analysis: Gemma 3n processes text to identify merchant, date, total
- Smart Parsing: Robust extraction with intelligent fallbacks
- Local Storage: All data stays on device in local database
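For reference, the OCR step can be as small as the sketch below, assuming ML Kit's on-device Latin text recognizer and the `await()` extension from `kotlinx-coroutines-play-services`. The function name is illustrative, not Checkstand's actual code.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import kotlinx.coroutines.tasks.await

// Step 2 of the pipeline: turn the captured receipt photo into raw text.
suspend fun extractReceiptText(receipt: Bitmap): String {
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    val image = InputImage.fromBitmap(receipt, /* rotationDegrees = */ 0)
    return recognizer.process(image).await().text // full text with line breaks preserved
}
```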
Checkstand addresses a massive, underserved market by providing a private, educational tool for financial empowerment. (View Data Sources)
- Budget-Conscious Households (~79M in U.S.): A simple, effective tool for the 60% of U.S. households that do not have a formal budget.
- Freelancers & Gig Workers (~64M in U.S.): Meticulous, offline-first expense tracking for the 38% of the workforce that needs it for tax purposes and managing variable income.
- The Cash-Reliant Population (~36M in U.S.): Serves the ~14% of U.S. adults who use cash for most purchases and are excluded from mainstream digital finance apps.
- Privacy-Aware Users: In a market where ~79% of consumers are concerned about data privacy, Checkstand's on-device AI is a critical differentiator that builds essential trust.
Google AI Edge Prize Target: "Most compelling and effective use case built using Google AI Edge implementation of Gemma 3n"
- ✅ Compelling Use Case: Universal need for receipt management
- ✅ Effective Implementation: Production-ready Android application
- ✅ Google AI Edge: MediaPipe framework with Gemma 3n model
- ✅ Real-World Impact: Solves genuine user problems with privacy focus
```
app/src/main/java/com/checkstand/
├── domain/   # Business logic and use cases
├── data/     # Repositories and data sources
├── service/  # LLM, OCR, and Camera services
├── ui/       # Jetpack Compose screens and components
└── utils/    # Helper utilities and extensions
```
- Session Management: Prevents LLM context contamination between receipts
- Fallback Parsing: Regex-based extraction when structured parsing fails (sketched after this list)
- Multimodal Workflow: Optimized text-before-image prompt ordering
- Error Resilience: Graceful degradation with user-friendly error handling
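As an illustration of the fallback-parsing decision, a regex layer like the one below can rescue fields when the model's output format varies. The patterns and the first-line merchant heuristic are assumptions for illustration, not the project's exact rules.

```kotlin
// Hypothetical fallback extractor: tried when structured LLM output fails to parse.
data class ReceiptFields(val merchant: String?, val date: String?, val total: String?)

private val TOTAL_REGEX = Regex("""(?i)\btotal\b\D{0,10}(\d+[.,]\d{2})""")
private val DATE_REGEX = Regex("""\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b""")

fun parseWithFallback(llmOutput: String, ocrText: String): ReceiptFields {
    // Prefer whatever the model produced; fall back to the raw OCR text.
    val total = TOTAL_REGEX.find(llmOutput)?.groupValues?.get(1)
        ?: TOTAL_REGEX.find(ocrText)?.groupValues?.get(1)
    val date = DATE_REGEX.find(llmOutput)?.value
        ?: DATE_REGEX.find(ocrText)?.value
    // Heuristic: the first non-empty OCR line is usually the merchant name.
    val merchant = ocrText.lineSequence()
        .map { it.trim() }
        .firstOrNull { it.isNotEmpty() }
    return ReceiptFields(merchant, date, total)
}
```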
- Model Loading: ~1-2 seconds on modern devices
- Receipt Processing: ~24 seconds average (includes OCR + LLM)
- Memory Usage: Runs the 4.4GB model within consumer-device memory limits
- Accuracy: Robust extraction across various receipt formats
- Language: Kotlin
- UI: Jetpack Compose
- Architecture: MVVM
- AI Framework: MediaPipe LLM Inference
- Model: Gemma 3n E4B (int4 quantized)
- Android API 26+ (Android 8.0)
- ~4.4GB free storage on device for model
- ADB access to push model to device
- Gemma-3n E4B model file (`gemma-3n-E4B-it-int4.task`)
1. Clone the repository

   ```
   git clone https://github.com/gryphon2411/Checkstand.git
   cd Checkstand
   ```

2. Open in Android Studio

   - Open the project in Android Studio
   - Wait for Gradle sync to complete

3. Deploy the Model

   A helper script is provided to deploy the model to your connected device.

   Important: Place your `gemma-3n-E4B-it-int4.task` file in the project's root directory before running the script.

   ```
   # Make the script executable (if needed)
   chmod +x deploy-model.sh

   # Run the deployment script
   ./deploy-model.sh
   ```

   The script will verify your `adb` connection, find the model file, and push it to the correct location on your device.

4. Build and Run

   - Build the project
   - Run on device (API 26+)
   - The app should detect the model automatically
- Launch the app
- Complete initial setup (model should be detected if pushed correctly)
- Start scanning receipts with the camera or gallery
- Enjoy privacy-focused AI - all processing happens on your device
- Model: Gemma-3n E4B Instruct (int4 quantized)
- Size: ~4.4GB
- Optimization: GPU acceleration where available
- Provider: Google
- Supports: Multimodal (text + image) prompting
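A minimal engine-initialization sketch for this model, assuming the MediaPipe GenAI task API and the on-device path used by MediaPipe's LLM samples (`/data/local/tmp/llm/`). The path and option values are assumptions, option names can vary between MediaPipe releases, and GPU backend selection is omitted here.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Load the 4.4GB int4 .task file once at startup; sessions are created per receipt.
fun createEngine(context: Context): LlmInference {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-E4B-it-int4.task") // assumed deploy path
        .setMaxTokens(512)   // receipt extraction needs short outputs (assumed budget)
        .setMaxNumImages(1)  // one receipt image per multimodal prompt
        .build()
    return LlmInference.createFromOptions(context, options)
}
```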
- ✅ All inference runs on-device
- ✅ No data sent to external servers
- ✅ Receipt data stored locally only
- ✅ No telemetry or analytics
- ✅ No internet required after setup
This app demonstrates:
- Integration with MediaPipe LLM inference for receipt analysis
- Efficient model management with on-device AI
- Clean Android architecture patterns
- Privacy-preserving multimodal AI implementation
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
CC BY 4.0 License - see LICENSE
- Google MediaPipe team for the LLM inference framework
- Google for the Gemma model
- Android and Jetpack Compose teams
