Preprint

Within-application accuracy gaps between manual-entry and photo-based modes in two dietary tracking applications: a focused replication

DAI-PRE-2026-01

This is a preprint

This article is a preprint and has not undergone external peer review. The Initiative releases preprints to invite methodological critique prior to or alongside formal publication. This preprint is framed as a methods-of-evaluation question and not as a product ranking.

1. Background

Most published validation studies of dietary tracking applications evaluate a single mode of input. In the era when an application offered only manual food-log entry, this was sufficient — the mode was the product. As an increasing number of consumer applications have added photo-based capture modes alongside their original manual-entry interfaces, the correspondence between “the application” and “the mode evaluated” has become weaker. A reader encountering a published accuracy figure for a named application now has to ask: which mode?

The within-application gap between manual and photo modes is, to our knowledge, not often reported directly. When it is inferable from published work, it appears to be substantial — in some cases larger than the differences between applications within either mode. This preprint presents a focused replication aimed specifically at that within-application gap, with two applications that each ship both modes.

We selected two such applications, each of which ships both a manual-entry and a photo-based mode; they are referred to below as App A and App B.

We emphasise that our purpose is methodological: we want to document that the within-application manual-vs-photo gap exists and is non-trivial, and to argue that validation studies should report mode explicitly. We are not producing a product ranking, and the per-app numbers below should not be used as such.

2. Methods

2.1 Meal set

Ninety-six meals were drawn from the Initiative’s forthcoming Weighed-Meal Reference Set v1.0 (mini-180), stratified across the three cuisine buckets represented in the set (Western N = 34, East Asian N = 32, Mediterranean N = 30). Each meal had a weighed ground-truth kcal value computed from per-ingredient grams and USDA FoodData Central Foundation Foods entries.
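The reference value for a meal is, as described above, the sum over weighed ingredients of grams multiplied by an energy density drawn from USDA FoodData Central Foundation Foods. The sketch below illustrates that computation; the ingredient names, gram weights, and kcal-per-100-g figures are illustrative placeholders, not entries from the reference set.

```python
# Minimal sketch: weighed ground-truth kcal for one meal.
# Ingredient names, gram weights, and energy densities below are
# illustrative placeholders, not values from the reference set.

# kcal per 100 g, as would be looked up in USDA FoodData Central
# Foundation Foods (hypothetical figures for illustration).
KCAL_PER_100G = {
    "cooked white rice": 130.0,
    "grilled chicken breast": 165.0,
    "steamed broccoli": 35.0,
}

# Weighed ingredient grams for one meal (hypothetical).
meal_ingredients = {
    "cooked white rice": 180.0,
    "grilled chicken breast": 120.0,
    "steamed broccoli": 90.0,
}

def reference_kcal(ingredients_g: dict[str, float],
                   kcal_per_100g: dict[str, float]) -> float:
    """Sum of grams x (kcal per 100 g) / 100 over all weighed ingredients."""
    return sum(grams * kcal_per_100g[name] / 100.0
               for name, grams in ingredients_g.items())

print(round(reference_kcal(meal_ingredients, KCAL_PER_100G), 1))  # 463.5
```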

2.2 Protocol

Each meal was submitted to each of the four mode × application cells (App A manual, App A photo, App B manual, App B photo) by a trained rater following a scripted protocol. Manual entries used ingredient-by-ingredient input where the product supported it, or best-match food-database entries where it did not. Photo entries used a single standardised capture (overhead, neutral background, daylight-balanced illumination). All entries were made on a current release of each application as of the capture window (February 2026).
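To make the design concrete, the sketch below enumerates the 96 meals by 4 mode × application cells into a flat list of entry records. The field names and the use of itertools.product are our illustration of the factorial layout, not the actual capture tooling.

```python
from itertools import product

# Sketch of the 96 meals x 4 cells design. Field names are illustrative,
# not the actual capture tooling.
MEAL_IDS = [f"meal_{i:03d}" for i in range(1, 97)]   # 96 reference meals
APPS = ["App A", "App B"]
MODES = ["manual", "photo"]

# One scripted entry per (meal, app, mode) cell -> 384 entries in total.
entries = [
    {"meal_id": meal, "app": app, "mode": mode, "estimated_kcal": None}
    for meal, (app, mode) in product(MEAL_IDS, product(APPS, MODES))
]

assert len(entries) == 96 * 4  # 384 submissions by the trained rater
```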

2.3 Outcomes

The primary outcome was mean absolute percentage error (MAPE) against the weighed reference. Secondary outcomes were Bland-Altman mean bias and 95% limits of agreement (LoA). We report within-application gaps as the absolute difference between manual and photo MAPE within a single application.
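Stated as code, these outcome definitions are as follows. This is a sketch of the standard formulas (MAPE; Bland-Altman mean bias with bias ± 1.96 SD limits of agreement; absolute manual-vs-photo MAPE difference), not the analysis script used for this preprint.

```python
import numpy as np

def mape(estimated: np.ndarray, reference: np.ndarray) -> float:
    """Mean absolute percentage error against the weighed reference, in %."""
    return float(np.mean(np.abs(estimated - reference) / reference) * 100.0)

def bland_altman(estimated: np.ndarray, reference: np.ndarray):
    """Mean bias and 95% limits of agreement (bias +/- 1.96 SD of differences)."""
    diff = estimated - reference
    bias = float(np.mean(diff))
    sd = float(np.std(diff, ddof=1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def within_app_gap(manual_mape: float, photo_mape: float) -> float:
    """Absolute manual-vs-photo MAPE difference within one application, in pp."""
    return abs(photo_mape - manual_mape)
```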

3. Results

3.1 Manual mode

MAPE for App A manual mode was 9.1% (95% CI 7.8% to 10.4%). MAPE for App B manual mode was 8.8% (95% CI 7.5% to 10.1%). The 95% CIs overlap; the between-application gap in manual mode is not clearly separable in this dataset.

3.2 Photo mode

MAPE for App A photo mode was 14.6% (95% CI 13.0% to 16.3%). MAPE for App B photo mode was 19.2% (95% CI 17.3% to 21.2%). The 95% CIs do not overlap.
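We do not commit here to a particular interval method; a nonparametric percentile bootstrap over meals is one common choice for MAPE intervals and is sketched below purely for illustration.

```python
import numpy as np

def bootstrap_mape_ci(estimated: np.ndarray, reference: np.ndarray,
                      n_boot: int = 10_000, seed: int = 0):
    """Percentile bootstrap 95% CI for MAPE, resampling meals with replacement.

    Illustrative only: the interval method is an assumption, not a
    description of the analysis behind the figures reported above.
    """
    rng = np.random.default_rng(seed)
    n = len(reference)
    ape = np.abs(estimated - reference) / reference * 100.0  # per-meal APE, %
    boots = np.array([
        np.mean(ape[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    return float(np.percentile(boots, 2.5)), float(np.percentile(boots, 97.5))
```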

3.3 Within-application gaps

App A within-application gap: 5.5 percentage points (photo worse than manual). App B within-application gap: 10.4 percentage points (photo worse than manual).

App A’s within-application gap was smaller than App B’s by a factor of approximately 1.9.
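These gap figures follow directly from the MAPEs reported in sections 3.1 and 3.2; the short check below reproduces the arithmetic.

```python
# Reproducing the gap arithmetic from the reported MAPEs (sections 3.1-3.2).
app_a_gap = 14.6 - 9.1    # 5.5 percentage points
app_b_gap = 19.2 - 8.8    # 10.4 percentage points
ratio = app_b_gap / app_a_gap

assert round(app_a_gap, 1) == 5.5
assert round(app_b_gap, 1) == 10.4
assert round(ratio, 1) == 1.9   # App B's gap is ~1.9x App A's
```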

3.4 By cuisine

Stratified by cuisine, the within-application gap was consistent in direction (photo worse than manual) for both applications across all three buckets. Magnitudes varied: for both applications the gap was smallest on Western meals and largest on East Asian meals.
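The stratified breakdown is a group-by over per-entry errors. The sketch below shows one way to compute it from an assumed long-format results table (one row per meal × application × mode, with a per-meal absolute percentage error column); the column names are illustrative, not our actual schema.

```python
import pandas as pd

# Assumed long-format results table: one row per meal x app x mode, with the
# per-meal absolute percentage error already computed. Column names are
# illustrative placeholders.
def gap_by_cuisine(results: pd.DataFrame) -> pd.DataFrame:
    """Within-application manual-vs-photo MAPE gap, stratified by cuisine."""
    mape = (results
            .groupby(["app", "cuisine", "mode"])["abs_pct_error"]
            .mean()
            .unstack("mode"))                 # columns: manual, photo
    mape["gap_pp"] = (mape["photo"] - mape["manual"]).abs()
    return mape
```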

4. Discussion

The methodologically important finding of this replication is that the within-application manual-vs-photo gap was, for both applications studied, larger than the between-application gap within either mode (within-application gaps of 5.5 and 10.4 percentage points, versus between-application gaps of 0.3 percentage points in manual mode and 4.6 percentage points in photo mode). If a reader of the validation literature wants to know how accurate a given application is in the hands of a given user, knowing the application alone is insufficient; the mode must also be specified.

We offer the within-app gap comparison (App A's gap being smaller than App B's by a factor of approximately 1.9) as an observation rather than a conclusion. The sample is small, the evaluation was conducted at a single site, and the capture conditions were standardised rather than naturalistic. A more useful takeaway is the general one: mode matters, and future validation work should report mode explicitly.

Limitations: N = 96 is modest; the meal set is restricted to the three cuisines the reference set currently covers; photo capture conditions were standardised, which likely favours the photo mode relative to real-world use; and software versions drift, so results are time-bound to February 2026.

Keywords

manual entry; photo-based; dietary tracking; mode comparison; within-app variation; PlateLens; Lose It; methods

License

This piece is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).