Intelligent Caching System
How Eatomate's intelligent caching reduces manual receipt matching from 100% to 0% by week 8. Matching a receipt item is a one-time operation: once matched, it's cached forever.
The Caching Problem
When you scan your first grocery receipt, Eatomate has no prior knowledge of your shopping habits. Every item requires manual matching to the nutrition database. But by week 8, the system auto-matches 100% of items—even if you switch stores or brands.
The Goal
Learn your shopping patterns so efficiently that scanning a receipt becomes nearly effortless. The system should recognize that "Tesco Organic Semi-Skimmed Milk 1L" and "Sainsbury's Organic Semi-Skimmed 1L" are the same product and auto-match both.
Caching Evolution Timeline
Week 1: 100% Manual Matching
Cold start problem. The system has no history of your purchases. Every receipt item requires you to select the correct match from the top suggestions.
Example First Receipt (20 items):
- Auto-matched: 0 items (0%)
- Manual review: 20 items (100%)
- Time: ~60 seconds total
Week 4: ~95% Auto-Matched
Your personal fuzzy trie has learned common products. Repeat purchases auto-match instantly. Only new products need review.
Example Receipt (20 items):
- Auto-matched: 19 items (~95%) - products you've bought before
- Manual review: 1 item (~5%) - new products or brand switches
- Time: ~15 seconds total
Week 8: 100% Auto-Matched
Network effects activate. The system recognizes "Tesco Organic Milk" and "Sainsbury's Organic Milk" as equivalent, even if you've never bought the Sainsbury's version.
Example Receipt (20 items):
- Auto-matched: 20 items (100%) - personal history + network effects
- Manual review: 0 items (0%) - no manual matching needed
- Time: ~3 seconds total
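One way to picture how personal history and network effects combine is a two-tier lookup. The sketch below is illustrative only: the function name, the dictionary-based caches, and the "personal first, city second" order are assumptions, not a description of Eatomate's actual code.

```python
from typing import Dict, Optional, Tuple


def lookup_item(
    item: str,
    personal_cache: Dict[str, str],
    city_clusters: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (canonical item, source), where source names the tier that matched."""
    # Tier 1 (assumed order): items you have matched yourself.
    if item in personal_cache:
        return personal_cache[item], "personal"
    # Tier 2: anonymized city-level clusters (network effects).
    if item in city_clusters:
        return city_clusters[item], "city"
    # Nothing cached yet: the item still needs a manual match.
    return None, "manual"


personal = {"tesco organic milk": "organic milk"}
city = {"sainsbury's organic milk": "organic milk"}
print(lookup_item("sainsbury's organic milk", personal, city))
```

In this sketch the Sainsbury's item resolves through the city tier even though it is absent from the personal cache, which is the behavior the week 8 example describes.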
How the Fuzzy Trie Works
A fuzzy trie (prefix tree) allows fast approximate string matching. When you scan a receipt item like "Organic Semi-Skimmed Milk 1L", the system:
Step 1: Normalize Input
Remove store prefixes, standardize spacing, lowercase
→ "organic semi-skimmed milk 1l"
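A minimal sketch of this normalization step, assuming a small hard-coded list of store prefixes (the real list is not given here, and `STORE_PREFIXES` and `normalize` are hypothetical names):

```python
import re

# Hypothetical store prefixes, used only for illustration.
STORE_PREFIXES = ("tesco", "sainsbury's", "asda")


def normalize(raw: str) -> str:
    """Collapse whitespace, lowercase, and drop one leading store prefix."""
    text = re.sub(r"\s+", " ", raw).strip().lower()
    for prefix in STORE_PREFIXES:
        if text.startswith(prefix + " "):
            text = text[len(prefix) + 1:]
            break
    return text


print(normalize("Tesco  Organic Semi-Skimmed Milk 1L"))
# organic semi-skimmed milk 1l
```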
Step 2: OCR-Aware Fuzzy Search
Search the trie with tolerance for OCR errors (0→O, 1→I, etc.)
Match: "organic semi-skimmed milk 1l" (95% confidence)
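OCR tolerance can be approximated by folding commonly confused characters to a single form before comparing strings. The sketch below is an assumption-laden stand-in: it uses Python's `difflib.SequenceMatcher` ratio as the similarity score rather than a trie search, and the `OCR_FOLD` table covers only the two confusions mentioned above.

```python
from difflib import SequenceMatcher

# Fold OCR look-alikes to one representative (input is already lowercased).
OCR_FOLD = str.maketrans({"0": "o", "1": "i"})


def ocr_fold(text: str) -> str:
    """Replace digits that OCR often confuses with letters."""
    return text.translate(OCR_FOLD)


def ocr_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after folding OCR look-alikes on both sides."""
    return SequenceMatcher(None, ocr_fold(a), ocr_fold(b)).ratio()


print(ocr_similarity("0rganic m1lk", "organic milk"))  # 1.0
```

Folding both strings the same way keeps the comparison symmetric: a genuine "1l" in a product name is folded on both sides, so it still matches itself.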
Step 3: Confidence Threshold
If the match confidence is ≥ 85%, auto-match. Otherwise, flag the item for manual review and show the top candidates.
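The threshold rule can be sketched as a small decision function. The 0.85 cutoff comes from the step above; the function name, the return shape, and the choice of three candidates are assumptions for illustration.

```python
from typing import Dict, List, Tuple

AUTO_MATCH_THRESHOLD = 0.85  # the 85% cutoff described above


def decide(
    scored: List[Tuple[str, float]],
    threshold: float = AUTO_MATCH_THRESHOLD,
    top_n: int = 3,  # assumed number of suggestions to show
) -> Dict[str, object]:
    """Auto-match the best candidate if it clears the threshold, else ask the user."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if ranked and ranked[0][1] >= threshold:
        return {"action": "auto_match", "match": ranked[0][0]}
    return {"action": "manual_review", "candidates": ranked[:top_n]}


print(decide([("organic semi-skimmed milk 1l", 0.95), ("organic whole milk 1l", 0.80)]))
```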
City-Level Privacy
City-level network effects raise an important question: does Eatomate share your shopping data with other users?
Privacy Guarantee
No. Your individual purchase history is never shared. Only anonymized aggregated clusters are used.
How it works: Once 50 or more users in a city have matched "Tesco Milk" → "Semi-Skimmed Milk 1L", the system treats this as a valid cluster. But it doesn't know WHO bought what, or when. The cluster is just: "These two text strings map to the same canonical item."
You can opt out of contributing to city-level clusters in Settings → Privacy → Help Improve Eatomate. This disables network effects for you (you'll stay at ~80% auto-match instead of ~96%), but your data won't be used to improve others' experience.
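The aggregation described above can be sketched as follows. This is a simplified illustration, not Eatomate's server code: the record layout, the `opted_in` flag, and the function name are assumptions. Only the 50-user threshold comes from the text.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

MIN_USERS = 50  # "50+ users" threshold from the text

# Assumed record layout: (user_id, receipt_text, canonical_item, opted_in)
MatchRecord = Tuple[str, str, str, bool]


def build_city_clusters(
    matches: Iterable[MatchRecord], min_users: int = MIN_USERS
) -> List[Tuple[str, str]]:
    """Return (receipt_text, canonical_item) pairs confirmed by enough distinct users."""
    users_per_pair = defaultdict(set)
    for user_id, receipt_text, canonical, opted_in in matches:
        if opted_in:  # opted-out users never contribute
            users_per_pair[(receipt_text, canonical)].add(user_id)
    # The output carries no user IDs or timestamps, only the text pairs.
    return sorted(
        pair for pair, users in users_per_pair.items() if len(users) >= min_users
    )
```

Counting distinct users in a set means a single user who matches the same item many times cannot push a pair over the threshold alone.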
Technical Implementation
Data Structures
- Fuzzy Trie: Prefix tree with Damerau-Levenshtein distance at each node. Supports efficient approximate string matching with O(k * n) complexity where k is the allowed edit distance.
- Cache Storage: Personal cache stored locally on device, city cache synced from server.
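For readers who want to see the idea concretely, here is a minimal sketch of a trie searched with a bounded edit distance. It uses the restricted (optimal string alignment) form of Damerau-Levenshtein, computes one distance row per trie node, and prunes branches whose whole row exceeds the limit. The class and method names are hypothetical, and this is not necessarily how Eatomate implements it.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a stored word ends here


class FuzzyTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def search(self, query: str, max_dist: int) -> List[Tuple[str, int]]:
        """Return (word, distance) pairs within max_dist, closest first."""
        results: List[Tuple[str, int]] = []
        first_row = list(range(len(query) + 1))
        for ch, child in self.root.children.items():
            self._walk(child, ch, None, query, first_row, None, max_dist, results)
        return sorted(results, key=lambda r: (r[1], r[0]))

    def _walk(self, node, ch, prev_ch, query, prev_row, prev_prev_row,
              max_dist, results) -> None:
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            best = min(row[j - 1] + 1,          # insertion
                       prev_row[j] + 1,          # deletion
                       prev_row[j - 1] + cost)   # substitution or match
            # Adjacent transposition, e.g. "mlik" vs "milk"
            if (prev_prev_row is not None and j > 1
                    and query[j - 1] == prev_ch and query[j - 2] == ch):
                best = min(best, prev_prev_row[j - 2] + 1)
            row.append(best)
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        if min(row) <= max_dist:  # otherwise no descendant can match
            for nxt, child in node.children.items():
                self._walk(child, nxt, ch, query, row, prev_row, max_dist, results)


trie = FuzzyTrie()
for name in ["milk", "mild", "silk", "organic milk"]:
    trie.insert(name)
print(trie.search("mlik", max_dist=1))  # [('milk', 1)]
```

Because stored words that share a prefix share the same distance rows, the search does less work than comparing the query against every cached string one by one.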
Sync Strategy
Your personal cache is stored locally and syncs to the cloud every 24 hours. City-level clusters are downloaded weekly, or whenever you manually trigger "Update Database" in Settings.
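The schedule above amounts to two interval checks. A small sketch, with hypothetical function names and the two intervals taken from the text:

```python
from datetime import datetime, timedelta

PERSONAL_SYNC_INTERVAL = timedelta(hours=24)  # personal cache -> cloud
CITY_DOWNLOAD_INTERVAL = timedelta(days=7)    # city clusters -> device


def personal_sync_due(last_sync: datetime, now: datetime) -> bool:
    """True once 24 hours have passed since the last personal cache sync."""
    return now - last_sync >= PERSONAL_SYNC_INTERVAL


def city_download_due(
    last_download: datetime, now: datetime, manual_trigger: bool = False
) -> bool:
    """True after a week, or immediately when the user requests an update."""
    return manual_trigger or now - last_download >= CITY_DOWNLOAD_INTERVAL
```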
Why This Matters for Accuracy
The intelligent caching system isn't just about convenience—it directly improves nutrition tracking accuracy:
Caching → Better Physics Model
When receipt matching is 96% automatic, you're more likely to scan receipts consistently. More receipt data = better physics-based reconciliation = higher meal accuracy.
Users who scan receipts weekly reach 95%+ meal accuracy by week 4. Users who skip receipt scanning plateau at 75-80% (still better than manual logging, but not research-grade).