Intelligent Caching System

How Eatomate's intelligent caching reduces manual receipt matching from 100% of items to 0% by week 8. Matching a receipt item is a one-time operation: once matched, the item is cached permanently.

The Caching Problem

When you scan your first grocery receipt, Eatomate has no prior knowledge of your shopping habits. Every item requires manual matching to the nutrition database. But by week 8, the system auto-matches 100% of items—even if you switch stores or brands.

The Goal

Learn your shopping patterns so efficiently that scanning a receipt becomes nearly effortless. The system should recognize "Tesco Organic Semi-Skimmed Milk 1L" is the same as "Sainsbury's Organic Semi-Skimmed 1L" and auto-match both.

Caching Evolution Timeline

Week 1: 100% Manual Matching

Cold start problem. The system has no history of your purchases. Every receipt item requires you to select the correct match from the top suggestions.

Example First Receipt (20 items):

  • Auto-matched: 0 items (0%)
  • Manual review: 20 items (100%)
  • Time: ~60 seconds total

Week 4: ~95% Auto-Matched

Your personal fuzzy trie has learned common products. Repeat purchases auto-match instantly. Only new products need review.

Example Receipt (20 items):

  • Auto-matched: 19 items (95%) - products you've bought before
  • Manual review: 1 item (5%) - a new product or brand switch
  • Time: ~15 seconds total

Week 8: 100% Auto-Matched

Network effects activate. The system recognizes "Tesco Organic Milk" and "Sainsbury's Organic Milk" as equivalent, even if you've never bought the Sainsbury's version.

Example Receipt (20 items):

  • Auto-matched: 20 items (100%) - personal history + network effects
  • Manual review: 0 items - no manual matching needed
  • Time: ~3 seconds total

How the Fuzzy Trie Works

A fuzzy trie (prefix tree) allows fast approximate string matching. When you scan a receipt item like "Organic Semi-Skimmed Milk 1L", the system:

Step 1: Normalize Input

Remove store prefixes, standardize spacing, and convert to lowercase.

"Tesco Organic Semi-Skimmed Milk 1L"
→ "organic semi-skimmed milk 1l"
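A minimal Python sketch of this normalization step, for illustration only. The `STORE_PREFIXES` list and the `normalize` function name are assumptions; the document does not say which store names Eatomate actually strips.

```python
import re

# Hypothetical prefix list; the real set of store names is not specified here.
STORE_PREFIXES = ("tesco", "sainsbury's")


def normalize(raw: str) -> str:
    """Lowercase, collapse runs of whitespace, and drop a leading store name."""
    text = re.sub(r"\s+", " ", raw.strip().lower())
    for prefix in STORE_PREFIXES:
        if text.startswith(prefix + " "):
            text = text[len(prefix) + 1:]
            break
    return text


print(normalize("Tesco  Organic Semi-Skimmed Milk 1L"))
# prints: organic semi-skimmed milk 1l
```

Stripping the store name before lookup is what lets the same cached entry serve receipts from different stores.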

Step 2: OCR-Aware Fuzzy Search

Search the trie with tolerance for common OCR confusions (0 read for O, 1 read for l or I, etc.)

Query: "0rganic semi-skimmed mi1k 1l" (OCR errors)
Match: "organic semi-skimmed milk 1l" (95% confidence)
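One way to make a comparison OCR-aware is to charge less for substitutions between characters that scanners commonly confuse. The sketch below is an assumption about how this could work, not Eatomate's actual scoring: the `CONFUSABLE` pairs, the 0.5 cost, and the similarity formula are all illustrative, so the resulting score will not reproduce the 95% figure above.

```python
# Assumed confusion pairs; a real table would be tuned on actual OCR output.
CONFUSABLE = {frozenset("0o"), frozenset("1l"), frozenset("1i"), frozenset("5s")}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: free if equal, half price for an OCR look-alike."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CONFUSABLE:
        return 0.5
    return 1.0


def ocr_distance(query: str, target: str) -> float:
    """Levenshtein distance with discounted OCR-confusable substitutions."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, qc in enumerate(query, 1):
        cur = [float(i)]
        for j, tc in enumerate(target, 1):
            cur.append(min(prev[j] + 1.0,                  # delete from query
                           cur[j - 1] + 1.0,               # insert into query
                           prev[j - 1] + sub_cost(qc, tc)))  # substitute
        prev = cur
    return prev[-1]


def ocr_similarity(query: str, target: str) -> float:
    """Map the distance onto a 0..1 score relative to the longer string."""
    longest = max(len(query), len(target)) or 1
    return 1.0 - ocr_distance(query, target) / longest


score = ocr_similarity("0rganic semi-skimmed mi1k 1l", "organic semi-skimmed milk 1l")
print(round(score, 2))
```

Under these assumed costs, the two OCR errors in the query add up to a distance of 1.0 over 28 characters, so the score stays well above a cut-off such as 85%.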

Step 3: Confidence Threshold

If similarity ≥ 85%, auto-match. Otherwise, flag for manual review with top candidates.
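The threshold rule can be sketched as a small decision function. The 0.85 cut-off comes from the text; the `MatchDecision` type, the `decide` name, and the choice of three candidates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

AUTO_MATCH_THRESHOLD = 0.85  # similarity cut-off stated in the text


@dataclass
class MatchDecision:
    auto_matched: bool
    best: Optional[str] = None
    candidates: List[str] = field(default_factory=list)


def decide(scored: List[Tuple[str, float]], top_n: int = 3) -> MatchDecision:
    """Auto-match the best candidate if it clears the threshold,
    otherwise return the top candidates for manual review."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if ranked and ranked[0][1] >= AUTO_MATCH_THRESHOLD:
        return MatchDecision(auto_matched=True, best=ranked[0][0])
    return MatchDecision(auto_matched=False,
                         candidates=[name for name, _ in ranked[:top_n]])


print(decide([("organic semi-skimmed milk 1l", 0.95), ("organic whole milk 1l", 0.80)]))
print(decide([("oat milk 1l", 0.70), ("almond milk 1l", 0.65)]))
```

The first call auto-matches; the second falls below the threshold and returns both names as review candidates.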

City-Level Privacy

City-level network effects raise an important question: does Eatomate share your shopping data with other users?

Privacy Guarantee

No. Your individual purchase history is never shared. Only anonymized aggregated clusters are used.

How it works: Once 50 or more users in a city have matched "Tesco Milk" → "Semi-Skimmed Milk 1L", the system treats this mapping as a valid cluster. It does not know who bought what, or when. The cluster is just: "These two text strings map to the same canonical product."
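A rough sketch of how such aggregation might look on the server side. Only the 50-user threshold comes from the text. The class and method names are hypothetical, and the sketch assumes each opted-in device reports a given mapping at most once, so a counter can stand in for a user count without storing any identity.

```python
from collections import Counter
from typing import Set, Tuple

MIN_USERS = 50  # cluster threshold stated in the text


class CityClusterAggregator:
    """Counts (receipt text, canonical name) pairs with no user id or timestamp."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def contribute(self, receipt_text: str, canonical: str, opted_in: bool = True) -> None:
        # Contributions from users who opted out are dropped entirely.
        if opted_in:
            self._counts[(receipt_text, canonical)] += 1

    def published_clusters(self) -> Set[Tuple[str, str]]:
        """Only pairs confirmed by at least MIN_USERS reports are shared."""
        return {pair for pair, n in self._counts.items() if n >= MIN_USERS}


agg = CityClusterAggregator()
for _ in range(50):
    agg.contribute("tesco milk", "semi-skimmed milk 1l")
agg.contribute("tesco mlk", "semi-skimmed milk 1l")  # a single report stays private
print(agg.published_clusters())
```

Because the aggregator never receives an identifier, the published set contains nothing beyond the string pairs themselves.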

You can opt out of contributing to city-level clusters in Settings → Privacy → Help Improve Eatomate. Opting out also disables network effects for you (you'll stay at ~80% auto-match instead of ~96%), and your data won't be used to improve others' experience.

Technical Implementation

Data Structures

  • Fuzzy Trie: Prefix tree with Damerau-Levenshtein distance at each node. Supports efficient approximate string matching with O(k * n) complexity where k is the allowed edit distance.
  • Cache Storage: Personal cache stored locally on device, city cache synced from server.
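To illustrate the fuzzy trie, here is a compact sketch that walks a prefix tree while maintaining one edit-distance row per node, and prunes any branch whose row minimum already exceeds the allowed distance. It uses the restricted Damerau-Levenshtein variant (optimal string alignment, where adjacent transpositions cost 1). The class and method names are hypothetical, and this is not necessarily how Eatomate implements it.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on nodes that end a stored string


class FuzzyTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def search(self, query: str, max_dist: int) -> List[Tuple[str, int]]:
        """Return (word, distance) pairs within max_dist, closest first."""
        first_row = list(range(len(query) + 1))
        results: List[Tuple[str, int]] = []
        for ch, child in self.root.children.items():
            self._walk(child, ch, None, query, first_row, None, max_dist, results)
        return sorted(results, key=lambda r: r[1])

    def _walk(self, node, ch, prev_ch, query, prev_row, prev_prev_row,
              max_dist, results) -> None:
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            best = min(row[j - 1] + 1,         # skip a query character
                       prev_row[j] + 1,        # skip a trie character
                       prev_row[j - 1] + cost)  # match or substitute
            # Adjacent transposition, e.g. "mlik" vs "milk".
            if (prev_prev_row is not None and j > 1
                    and query[j - 1] == prev_ch and query[j - 2] == ch):
                best = min(best, prev_prev_row[j - 2] + 1)
            row.append(best)
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        if min(row) <= max_dist:  # otherwise no descendant can match
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, ch, query, row, prev_row,
                           max_dist, results)


trie = FuzzyTrie()
trie.insert("milk")
trie.insert("organic semi-skimmed milk 1l")
print(trie.search("mlik", 1))
print(trie.search("0rganic semi-skimmed mi1k 1l", 2))
```

Sharing prefixes means the distance rows for common beginnings such as "organic " are computed once for every stored product that starts that way, which is the main advantage over comparing the query against each cached string separately.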

Sync Strategy

Your personal cache is stored locally and syncs to the cloud every 24 hours. City-level clusters are downloaded weekly, or whenever you manually trigger "Update Database" in Settings.
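The two schedules can be expressed as simple due-checks. The intervals come from the text; the function names and the idea of comparing against a stored last-sync timestamp are assumptions made for illustration.

```python
from datetime import datetime, timedelta

PERSONAL_SYNC_INTERVAL = timedelta(hours=24)  # personal cache -> cloud
CITY_SYNC_INTERVAL = timedelta(weeks=1)       # city clusters -> device


def personal_sync_due(last_sync: datetime, now: datetime) -> bool:
    """True once 24 hours have passed since the last personal cache upload."""
    return now - last_sync >= PERSONAL_SYNC_INTERVAL


def city_sync_due(last_download: datetime, now: datetime,
                  manual_trigger: bool = False) -> bool:
    """True after a week, or immediately on a manual "Update Database"."""
    return manual_trigger or now - last_download >= CITY_SYNC_INTERVAL


last = datetime(2024, 1, 1, 8, 0)
now = datetime(2024, 1, 2, 9, 0)
print(personal_sync_due(last, now))                   # True: 25 hours elapsed
print(city_sync_due(last, now))                       # False: under a week
print(city_sync_due(last, now, manual_trigger=True))  # True: manual override
```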

Why This Matters for Accuracy

The intelligent caching system isn't just about convenience—it directly improves nutrition tracking accuracy:

Caching → Better Physics Model

When receipt matching is 96% automatic, you're more likely to scan receipts consistently. More receipt data leads to better physics-based reconciliation, which in turn yields higher meal accuracy.

Users who scan receipts weekly reach 95% or higher meal accuracy by week 4. Users who skip receipt scanning plateau at 75-80% (still better than manual logging, but not research-grade).