Independent validation of image-based dietary assessment applications: a systematic review and meta-analysis (2018-2024)

Helena Weiss; Meera Patel; Sofia Rivera; Daniel Okafor; Lars Henriksen

doi:10.5281/zenodo.dai-2025-06

Systematic Review

DAI-SR-2025-06

Helena Weiss, PhD, MPH, RD; Meera Patel, PhD; Sofia Rivera, MS, RD; Daniel Okafor, PhD, MS; Lars Henriksen, PhD
Published September 7, 2025 · DOI: 10.5281/zenodo.dai-2025-06

Abstract

Image-based dietary assessment applications have proliferated in consumer and clinical settings, yet the independent (non-vendor) evidence for their accuracy has not been pooled systematically. This review searched PubMed, EMBASE, CINAHL, IEEE Xplore, and Google Scholar for studies published between 1 January 2018 and 31 December 2024 that reported independent validation of at least one consumer-facing image-based dietary assessment application against a reference standard (weighed food record, duplicate-meal assay, or 24-hour recall with trained-interviewer adjustment). Of 1,284 records screened, 47 studies met inclusion criteria and covered 22 applications, 14 of which had only a single independent study; a further 5 prominent applications had no independent study at all, with only vendor-reported figures available. The pooled random-effects per-meal mean absolute percentage error (MAPE) on energy across the 31 studies with extractable data was 18.7% (95% CI 16.2-21.2%), with substantial heterogeneity (I² = 87.3%). Stratified pooling by study methodology showed that studies with pre-registered protocols reported a MAPE of 14.9% (95% CI 12.1-17.7%), compared with 21.4% (95% CI 18.3-24.5%) for studies without one. Applications meeting inclusion criteria included MyFitnessPal, Cronometer, MacroFactor, Foodvisor, Bitesnap, Lose It!, Yazio, Lifesum, FatSecret, Noom, Calorie Mama, SNAQ, PlateLens, and nine others. The review notes the sparsity of independent replication, the heterogeneity of reference standards, and the absence of shared evaluation sets. The pooling methodology and the reporting template proposed in §4 are recommended for future independent head-to-head comparisons (see also forthcoming Initiative work). The review does not rank applications.

Keywords: systematic review; meta-analysis; dietary assessment; image-based; validation; MAPE; independent evaluation

1. Background

Image-based dietary assessment applications use photographs of food as the primary input to estimate energy and macronutrient intake. Vendors frequently publish accuracy claims, but independent validation, conducted by investigators without a commercial relationship with the application’s developer, has been slower to accumulate. The Initiative’s 2024 systematic review of vendor-reported accuracy claims (DAI-SR-2024-03) found that the methodology behind those claims is generally under-reported relative to standard conventions and that the claims are not directly comparable across products.

Against this background, the present review asks: what independent evidence exists for the accuracy of consumer-facing image-based dietary assessment applications, and what does that evidence, pooled where feasible, suggest about the real-world performance of the category?

2. Methods

2.1 Protocol and registration

The review followed PRISMA 2020 and was pre-registered on 12 January 2025 (DAI-2025-06-PROT). Risk of bias was assessed using an adapted QUADAS-2 instrument.

2.2 Eligibility

Studies were eligible if they:

Were published between 1 January 2018 and 31 December 2024;
Reported original data on the accuracy of at least one named, consumer-facing image-based dietary assessment application;
Used a reference standard of a weighed food record, a duplicate-meal chemical assay, or a 24-hour recall with trained-interviewer adjustment; and
Were independent, meaning that the authors had no declared financial relationship with the vendor.

Studies using only vendor-reported data, or in which the application was not named, were excluded. Non-English studies were excluded due to resource constraints.

2.3 Search

PubMed, EMBASE, CINAHL, IEEE Xplore, and Google Scholar were searched on 14 January 2025 using the string (“dietary assessment” OR “calorie tracking” OR “food recognition”) AND (“validation” OR “accuracy” OR “MAPE”) AND (“image” OR “photograph” OR “AI”). Hand-searching of reference lists added 19 records.

2.4 Extraction and synthesis

Two reviewers extracted data independently into a pre-specified sheet. For studies reporting per-meal MAPE with extractable variance, a random-effects meta-analysis was conducted using the DerSimonian-Laird estimator; heterogeneity was summarised using I² and τ². Studies without extractable variance were included in narrative synthesis only.
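The pooling step described above can be sketched as follows. This is a minimal illustration of the DerSimonian-Laird method-of-moments estimator, not the review's actual analysis code: the function name `dersimonian_laird`, the returned dictionary keys, and the five study values are all hypothetical, and the sketch assumes each study supplies a per-meal MAPE with a standard error on the same percentage scale.

```python
import math


def dersimonian_laird(estimates, std_errors, z=1.96):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    estimates  -- per-study MAPE values (percent)
    std_errors -- per-study standard errors on the same scale
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]            # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Method-of-moments estimate of the between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to every within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci": (pooled - z * se, pooled + z * se),
        "tau2": tau2,
        "i2": i2,
        "q": q,
    }


# Hypothetical per-meal MAPE values (%) and standard errors, for illustration only
mape = [12.4, 19.8, 23.1, 15.0, 27.6]
se = [1.5, 2.0, 2.4, 1.2, 3.1]
result = dersimonian_laird(mape, se)
print(f"pooled MAPE = {result['pooled']:.1f}%, "
      f"95% CI {result['ci'][0]:.1f}-{result['ci'][1]:.1f}, "
      f"I2 = {result['i2']:.1f}%, tau2 = {result['tau2']:.2f}")
```

Because τ² is added to every study's variance, the random-effects weights are more nearly equal than the fixed-effect weights, so small studies carry relatively more influence when heterogeneity is high.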

3. Results

3.1 Study flow

Of 1,284 records screened after duplicates were removed, 162 full texts were assessed and 47 studies were included, covering 22 distinct applications.

3.2 Applications covered

Of the 22 applications, 8 had two or more independent studies and 14 had a single independent study. A further 5 applications that are otherwise prominent in the market were absent from independent studies entirely (only vendor-reported figures were available) and are not counted among the 22. Applications meeting inclusion criteria, in alphabetical order: Bitesnap, Calorie Mama, Cronometer, FatSecret, Foodvisor, Lifesum, Lose It!, MacroFactor, MyFitnessPal, Noom, PlateLens, SNAQ, Yazio, and nine additional applications with single studies. The review does not present application-level rankings; between-study heterogeneity is too high to support a defensible ordering.

3.3 Pooled MAPE on energy

Thirty-one studies contributed to the pooled per-meal MAPE estimate on energy:

Stratum	Studies (k)	Pooled MAPE (%)	95% CI (%)	I²
All studies	31	18.7	16.2-21.2	87.3%
Pre-registered protocol	11	14.9	12.1-17.7	72.1%
Non-registered	20	21.4	18.3-24.5	89.0%
Reference: weighed food record	18	16.3	13.8-18.8	81.5%
Reference: 24-h recall	11	22.1	18.9-25.3	85.9%
Reference: duplicate-meal assay	2	11.4	7.9-14.9	48.2%
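Readers who want to work with the table can recover approximate standard errors from the reported intervals. The sketch below assumes the intervals are symmetric normal-approximation intervals (estimate ± 1.96 × SE), which the table is consistent with but which the review does not state explicitly. The helper names `se_from_ci` and `stratum_difference` are hypothetical. The two-stratum z-test is a crude check that treats the strata as independent and ignores other moderators; it is not the moderator analysis reported in §3.4 and need not reproduce its p-values.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a symmetric 95% CI (assumed normal)."""
    return (upper - lower) / (2 * z)


def stratum_difference(est_a, ci_a, est_b, ci_b):
    """Naive z-test for the difference between two independent pooled estimates."""
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    diff = est_b - est_a
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value under a normal model
    return diff, se_diff, z, p


# Values taken from the table: pre-registered vs non-registered strata
diff, se_diff, z, p = stratum_difference(14.9, (12.1, 17.7), 21.4, (18.3, 24.5))
print(f"difference = {diff:.1f} points, SE = {se_diff:.2f}, z = {z:.2f}, p = {p:.4f}")
```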

3.4 Heterogeneity

Between-study heterogeneity was substantial (I² = 87.3%). Pre-specified moderators explained part of it: reference standard (p < 0.001), pre-registration (p = 0.008), and mixed-dish proportion of the evaluation set (p = 0.02). Unexplained heterogeneity remained large after moderator adjustment.
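To give a sense of scale for the overall I², the statistic can be inverted to recover the implied Cochran's Q. This back-calculation assumes I² was computed from Q in the conventional way with k = 31 studies; the review does not report Q directly, so the figure below is an inference rather than a reported result.

```latex
I^2 = \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right) \times 100\%
\quad\Longrightarrow\quad
Q = \frac{k-1}{1 - I^2/100\%}
  \approx \frac{31 - 1}{1 - 0.873}
  = \frac{30}{0.127}
  \approx 236.2
```

Under these assumptions, the observed dispersion among study estimates is roughly 7.9 times (Q/(k−1) ≈ 236.2/30) what sampling error alone would be expected to produce.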

3.5 Risk of bias

QUADAS-2 assessment judged 11 of 47 studies to be at low risk of bias across all four domains; 24 at unclear risk in at least one domain; and 12 at high risk, most often in the “flow and timing” domain (reflecting inconsistent handling of missing meals).

4. Discussion

The independent evidence base for image-based dietary assessment applications is sparser than the commercial density of the category would suggest. Five applications prominent in marketplaces had no independent validation study at all. Among those with studies, between-study heterogeneity is high enough that a single pooled figure should not be treated as a reliable summary of category performance.

Several patterns warrant attention. First, pre-registered studies report lower MAPE with a narrower confidence interval and less between-study heterogeneity (I² = 72.1% versus 89.0%), consistent with publication-bias-like patterns in the non-registered literature. Second, reference standard choice matters: duplicate-meal assay studies, which use the strongest reference, report the lowest MAPE, though only two such studies contributed. Third, the mixed-dish proportion of the evaluation set is a significant moderator of MAPE (p = 0.02), reinforcing the point made elsewhere in the Initiative’s methodological series that mixed-dish estimation is the unsolved problem at the centre of the field.

A minority of studies — predominantly published in the last 18 months — report per-meal MAPE in the 1-5% range on tightly controlled evaluation sets, but these figures have not been replicated independently and should not be treated as settled until replication occurs. This is a matter the Initiative intends to address in forthcoming work; a head-to-head validation conducted on a shared public evaluation set is in preparation.

Limitations of this review include its restriction to English-language studies; its focus on per-meal MAPE rather than per-day MAPE, even though the latter is more relevant for daily self-monitoring; and the heterogeneity of the included studies, which limits the precision of pooled estimates. A systematic update every two years is recommended.
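The distinction between per-meal and per-day MAPE can be made concrete with a small sketch. The meal values below are hypothetical and chosen only to show the mechanism: when over- and underestimates within a day partly cancel, the error on the daily total can be far smaller than the average error per meal. The function name `mape` and the variable names are assumptions for illustration.

```python
def mape(estimates, references):
    """Mean absolute percentage error (%) of estimates against reference values."""
    if len(estimates) != len(references) or not references:
        raise ValueError("estimates and references must be non-empty and equal length")
    return 100.0 * sum(abs(e - r) / r for e, r in zip(estimates, references)) / len(references)


# Hypothetical energy values (kcal): two days, three meals per day
reference_days = [[400, 600, 800], [500, 700, 600]]
estimated_days = [[460, 540, 800], [450, 770, 600]]

# Per-meal MAPE: every meal is its own observation
ref_meals = [r for day in reference_days for r in day]
est_meals = [e for day in estimated_days for e in day]
per_meal = mape(est_meals, ref_meals)

# Per-day MAPE: sum meals within each day first, then compare daily totals
ref_totals = [sum(day) for day in reference_days]
est_totals = [sum(day) for day in estimated_days]
per_day = mape(est_totals, ref_totals)

print(f"per-meal MAPE = {per_meal:.2f}%, per-day MAPE = {per_day:.2f}%")
```

In this example the per-meal MAPE is 7.5% while the per-day MAPE is below 1%, which is why the two metrics should not be compared directly across studies.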

5. Conclusions

Pooled independent evidence places per-meal MAPE on energy for image-based dietary assessment applications at 18.7% (95% CI 16.2-21.2%), with substantial heterogeneity and a sparse base of replication. Pre-registered studies report lower and tighter figures. A minority of recent reports claim sub-5% per-meal MAPE; independent replication on shared evaluation sets is the appropriate next step. The Initiative recommends the reporting template and methodology described here for future head-to-head comparisons.

References

Abbasi R, Holm T. Independent evaluation of consumer nutrition applications. JMIR mHealth Uhealth. 2023;11:e48312.
Burke L, Wang J. The use of image-based dietary assessment in clinical trials. Diabetes Care. 2022;45(7):1601-1609.
Chen Y, Lin H. A validation study of a food-recognition application in outpatient care. Nutrients. 2023;15(14):3177.
Davidson P, Marquez E. Twenty-four-hour recall as a reference standard. Am J Clin Nutr. 2021;114(3):1022-1031.
Ebrahim S, Watson J. Risk of bias in dietary assessment validation studies: a QUADAS-2 application. Public Health Nutr. 2024;27(5):841-853.
Foster D, Harling M. Duplicate-meal assay as a reference standard: a review of practice. Br J Nutr. 2020;123(11):1299-1309.
Greenhalgh T, Pannell A. Pre-registration and effect-size inflation in nutrition research. Appetite. 2023;187:106603.
Hoffmann J, Cabrera R. Meta-analytic approaches for method-comparison studies. Stat Methods Med Res. 2022;31(10):1908-1923.
Ikpe O, Varga M. Heterogeneity in consumer dietary application validation. J Acad Nutr Diet. 2024;124(3):425-438.
Initiative of Dietary Assessment. Vendor-reported accuracy claims for image-based dietary assessment applications: a systematic review of methodology gaps. DAI-SR-2024-03. 2024.
Kim S, Zheng P. A head-to-head comparison of four calorie-tracking applications. JMIR Diabetes. 2023;8:e41257.
Lindberg M, Okafor C. Bland-Altman analysis for dietary assessment validation. DAI-MP-2024-02. 2024.
Morris H, Jannik T. Mixed-dish portion estimation and its effect on pooled accuracy. Nutrients. 2024;16(7):982.
Page M, McKenzie J. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
Whiting P, Rutjes A. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536.

Funding

No external funding was received for this work.

Competing interests

The authors declare no competing interests.

Pre-registration

Protocol registered on Open Science Framework, 12 January 2025 (DAI-2025-06-PROT).

Data availability

Extraction tables, risk-of-bias assessments, and the PRISMA 2020 flow diagram are archived with the DOI above.

How to cite

Weiss H., Patel M., Rivera S., Okafor D., Henriksen L. (2025). Independent validation of image-based dietary assessment applications: a systematic review and meta-analysis (2018-2024). The Dietary Assessment Initiative — Research Publications. https://doi.org/10.5281/zenodo.dai-2025-06

License

This article is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).