The Caching Problem

When you scan your first grocery receipt, Eatomate has no prior knowledge of your shopping habits. Every item requires manual matching to the nutrition database. But by week 8, the system auto-matches 100% of items—even if you switch stores or brands.

The Goal

Learn your shopping patterns so efficiently that scanning a receipt becomes nearly effortless. The system should recognize that "Tesco Organic Semi-Skimmed Milk 1L" is the same product as "Sainsbury's Organic Semi-Skimmed 1L" and auto-match both.

Caching Evolution Timeline

Week 1: 100% Manual Matching

Cold start problem. The system has no history of your purchases. Every receipt item requires you to select the correct match from the top suggestions.

Example First Receipt (20 items):

Auto-matched: 0 items (0%)
Manual review: 20 items (100%)
Time: ~60 seconds total

Week 4: ~95% Auto-Matched

Your personal fuzzy trie has learned common products. Repeat purchases auto-match instantly. Only new products need review.

Example Receipt (20 items):

Auto-matched: 19 items (~95%) - products you've bought before
Manual review: 1 item (~5%) - new products or brand switches
Time: ~15 seconds total

Week 8: 100% Auto-Matched

Network effects activate. The system recognizes "Tesco Organic Milk" and "Sainsbury's Organic Milk" as equivalent, even if you've never bought the Sainsbury's version.

Example Receipt (20 items):

Auto-matched: 20 items (100%) - personal history + network effects
Manual review: 0 items (0%) - no manual matching needed
Time: ~3 seconds total

How the Fuzzy Trie Works

A fuzzy trie (prefix tree) allows fast approximate string matching. When you scan a receipt item like "Organic Semi-Skimmed Milk 1L", the system:

Step 1: Normalize Input

Remove store prefixes, standardize spacing, lowercase

"Tesco Organic Semi-Skimmed Milk 1L"
→ "organic semi-skimmed milk 1l"
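The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not Eatomate's actual code: the function name normalize and the STORE_PREFIXES list are assumptions made for the example, since the text does not say which store names are stripped.

```python
import re

# Hypothetical list of store prefixes; the real list is not given in the text.
STORE_PREFIXES = ("tesco", "sainsbury's", "sainsburys", "asda")


def normalize(raw: str) -> str:
    """Collapse whitespace, lowercase, and drop a leading store name."""
    text = re.sub(r"\s+", " ", raw).strip().lower()
    for prefix in STORE_PREFIXES:
        # Only strip the prefix when it is a whole leading word.
        if text.startswith(prefix + " "):
            text = text[len(prefix) + 1:]
            break
    return text


print(normalize("Tesco  Organic Semi-Skimmed Milk 1L"))
# organic semi-skimmed milk 1l
```

Stripping only a whole leading word avoids mangling product names that merely begin with the same letters as a store name.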

Step 2: OCR-Aware Fuzzy Search

Search the trie with tolerance for common OCR confusions (0 read in place of O, 1 read in place of l or I, etc.)

Query: "0rganic semi-skimmed mi1k 1l" (OCR errors)
Match: "organic semi-skimmed milk 1l" (95% confidence)
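One way to make an edit distance "OCR-aware" is to charge less for substitutions between characters that scanners commonly confuse. The sketch below assumes a half-cost penalty and a small confusion table (OCR_CONFUSIONS, OCR_COST); both are illustrative choices, and the scoring formula is an assumption rather than the one Eatomate uses. With these values the example above scores roughly 96%, not the 95% quoted, which only shows that the real weighting differs.

```python
# Assumed pairs of characters that OCR tends to confuse (compared in lowercase).
OCR_CONFUSIONS = {frozenset("0o"), frozenset("1l"), frozenset("1i"), frozenset("5s")}
OCR_COST = 0.5  # assumed reduced cost for a likely OCR mix-up


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a with character b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in OCR_CONFUSIONS:
        return OCR_COST
    return 1.0


def ocr_distance(query: str, candidate: str) -> float:
    """Levenshtein distance with cheaper OCR-style substitutions."""
    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, qc in enumerate(query, start=1):
        curr = [float(i)]
        for j, cc in enumerate(candidate, start=1):
            curr.append(min(
                prev[j] + 1.0,                  # delete from query
                curr[j - 1] + 1.0,              # insert into query
                prev[j - 1] + sub_cost(qc, cc), # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def similarity(query: str, candidate: str) -> float:
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - ocr_distance(query, candidate) / longest


score = similarity("0rganic semi-skimmed mi1k 1l", "organic semi-skimmed milk 1l")
print(f"{score:.1%}")
```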

Step 3: Confidence Threshold

If similarity ≥ 85%, auto-match. Otherwise, flag for manual review with top candidates.
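The threshold rule can be expressed as a small decision function. In this sketch the 0.85 cutoff comes from the text; the function name decide, the result format, and the choice to show three candidates are assumptions for illustration.

```python
AUTO_MATCH_THRESHOLD = 0.85  # from the text: auto-match at 85% similarity or above


def decide(scored, top_n=3):
    """Pick an action from (candidate, similarity) pairs.

    top_n is an assumed number of suggestions shown for manual review.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if ranked and ranked[0][1] >= AUTO_MATCH_THRESHOLD:
        return {"action": "auto_match", "match": ranked[0][0]}
    return {
        "action": "manual_review",
        "candidates": [name for name, _ in ranked[:top_n]],
    }


print(decide([("organic semi-skimmed milk 1l", 0.95), ("organic whole milk 1l", 0.80)]))
print(decide([("oat milk 1l", 0.70), ("soy milk 1l", 0.60)]))
```

The first call auto-matches the top candidate; the second falls below the cutoff and returns the ranked candidates for review.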

City-Level Privacy

City-level network effects raise an important question: does Eatomate share your shopping data with other users?

Privacy Guarantee

No. Your individual purchase history is never shared. Only anonymized aggregated clusters are used.

How it works: After 50+ users in a city match "Tesco Milk" → "Semi-Skimmed Milk 1L", the system knows this is a valid cluster. But it doesn't know who bought what, or when. The cluster is just: "These two text strings map to the same canonical product."

Receipt-to-product mappings are de-identified before contributing to the shared matching intelligence. No user ID, receipt ID, or item combinations are stored — each mapping is an independent observation that cannot be traced to any individual. See our Privacy Policy for details.
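A de-identified aggregate of this kind can be sketched as a counter keyed only by the text pair. The class name CityClusters and its methods are hypothetical. Because no user ID is stored, the sketch counts independent observations as a stand-in for the "50+ users" threshold; how the real system ties observations to distinct users is not described.

```python
from collections import Counter

MIN_OBSERVATIONS = 50  # threshold from the text ("50+ users")


class CityClusters:
    """Counts receipt-text -> canonical-name mappings with no user data."""

    def __init__(self):
        self._counts = Counter()

    def contribute(self, receipt_text: str, canonical: str) -> None:
        # Only the text pair is recorded: no user ID, receipt ID, or timestamp.
        self._counts[(receipt_text, canonical)] += 1

    def lookup(self, receipt_text: str):
        """Return the most-observed canonical name once it passes the threshold."""
        best, best_count = None, 0
        for (text, canonical), count in self._counts.items():
            if text == receipt_text and count >= MIN_OBSERVATIONS and count > best_count:
                best, best_count = canonical, count
        return best


clusters = CityClusters()
for _ in range(49):
    clusters.contribute("tesco milk", "semi-skimmed milk 1l")
print(clusters.lookup("tesco milk"))  # None: below the threshold
clusters.contribute("tesco milk", "semi-skimmed milk 1l")
print(clusters.lookup("tesco milk"))  # semi-skimmed milk 1l
```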

Technical Implementation

Data Structures

Fuzzy Trie: Prefix tree searched under a Damerau-Levenshtein edit-distance bound, so branches that already exceed the bound are skipped. Supports efficient approximate string matching with O(k * n) complexity where k is the allowed edit distance.
Cache Storage: Personal cache stored locally on device, city cache synced from server.
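To make the Fuzzy Trie entry concrete, here is a minimal sketch of a trie searched with a bounded edit distance. It uses the restricted Damerau-Levenshtein variant (optimal string alignment: insertions, deletions, substitutions, and adjacent transpositions). The class and method names are assumptions, and the sketch makes no claim about matching the complexity figure quoted above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set on nodes that end a stored string


class FuzzyTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def search(self, query: str, max_dist: int):
        """Return (word, distance) pairs within max_dist edits of query."""
        results = []
        first_row = list(range(len(query) + 1))
        for ch, child in self.root.children.items():
            self._walk(child, ch, None, query, first_row, None, max_dist, results)
        return sorted(results, key=lambda r: (r[1], r[0]))

    def _walk(self, node, ch, prev_ch, query, prev_row, prev_prev_row,
              max_dist, results):
        # Build one row of the edit-distance table for this trie character.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            best = min(row[j - 1] + 1,          # insertion
                       prev_row[j] + 1,         # deletion
                       prev_row[j - 1] + cost)  # substitution or match
            # Adjacent transposition, e.g. "ci" vs "ic".
            if (prev_prev_row is not None and j > 1
                    and query[j - 1] == prev_ch and query[j - 2] == ch):
                best = min(best, prev_prev_row[j - 2] + 1)
            row.append(best)
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: descend only if some prefix alignment is still within bounds.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, ch, query, row, prev_row,
                           max_dist, results)


trie = FuzzyTrie()
for name in ["organic milk", "oat milk", "organic silk"]:
    trie.insert(name)
print(trie.search("organci milk", 1))  # [('organic milk', 1)]
```

Sharing one table row per trie node means common prefixes such as "organic " are scored once for every product that starts with them, which is the main reason to pair a trie with edit distance.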

Sync Strategy

Your personal cache is stored locally and syncs to the cloud every 24 hours. City-level clusters are downloaded weekly or when you manually trigger "Update Database" in settings.
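The schedule above amounts to two interval checks plus a manual override. The following sketch shows that logic only; the function name due_syncs and the task labels are hypothetical, and the intervals are taken from the text (24 hours and one week).

```python
from datetime import datetime, timedelta

PERSONAL_SYNC_INTERVAL = timedelta(hours=24)  # personal cache -> cloud
CITY_REFRESH_INTERVAL = timedelta(weeks=1)    # city clusters -> device


def due_syncs(now, last_personal_upload, last_city_download, manual_update=False):
    """Return the sync tasks that are due at time `now`."""
    tasks = []
    if now - last_personal_upload >= PERSONAL_SYNC_INTERVAL:
        tasks.append("upload_personal_cache")
    # A manual "Update Database" request forces a city download.
    if manual_update or now - last_city_download >= CITY_REFRESH_INTERVAL:
        tasks.append("download_city_clusters")
    return tasks


now = datetime(2024, 1, 8, 12, 0)
print(due_syncs(now, now - timedelta(hours=25), now - timedelta(days=2)))
# ['upload_personal_cache']
print(due_syncs(now, now - timedelta(hours=1), now - timedelta(days=2), manual_update=True))
# ['download_city_clusters']
```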

Why This Matters for Accuracy

The intelligent caching system isn't just about convenience—it directly improves nutrition tracking accuracy:

Caching → Better Physics Model

When receipt matching is almost entirely automatic, you're more likely to scan receipts consistently. More receipt data = better physics-based reconciliation = higher meal accuracy.

Users who scan receipts weekly reach 95%+ meal accuracy by week 4. Users who skip receipt scanning plateau at 75–80% (still better than manual logging, but not research-grade).
