Methodology Brief
Reporting MAPE in dietary assessment: rounding, thresholds, and confidence intervals
A methodology brief
Background
Mean Absolute Percentage Error (MAPE) has become a de facto summary statistic in image-based dietary assessment, in part because it is intuitive to non-statistical readers and in part because it scales across foods with heterogeneous magnitudes. However, MAPE has well-known flaws: it is undefined when a reference value is zero, it is asymmetric (a prediction of twice the reference scores 100% error, whereas a prediction of half the reference scores only 50%), and it is sensitive to a small number of near-zero reference values. Consequently, a bare MAPE figure reported without its rounding rule, near-zero threshold, or uncertainty interval is often uninterpretable.
The Initiative’s editorial convention is to treat MAPE as one of several accuracy summaries, to round it consistently, to disclose the handling of near-zero references, and to accompany every reported MAPE with a bootstrap confidence interval.
The Method
Given paired predicted values $\hat{y}_i$ and reference values $y_i$ for $i = 1, \dots, n$:
$$\text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$$
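The formula translates directly into a few lines of code. The following is a minimal sketch in Python, not an Initiative reference implementation; the function name `mape` and its argument names are assumptions made for illustration.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent, for paired sequences."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(reference) == 0:
        raise ValueError("at least one observation is required")
    if any(y == 0 for y in reference):
        # The ratio |(y_hat - y) / y| is undefined when y is zero.
        raise ValueError("MAPE is undefined when a reference value is zero")
    total = sum(abs((y_hat - y) / y) for y_hat, y in zip(predicted, reference))
    return 100.0 * total / len(reference)


# Asymmetry in ratio terms: doubling the reference costs more than halving it.
print(mape([200.0], [100.0]))  # over-estimate by a factor of 2
print(mape([50.0], [100.0]))   # under-estimate by a factor of 2
```

The two calls at the end return 100% and 50% respectively, which is the asymmetry noted in the Background.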
Initiative conventions:
- Rounding. MAPE is reported to one decimal place when the value is below 20%, and to the nearest integer when it is 20% or greater. Intermediate calculations are not rounded.
- Threshold floor on $y_i$. Observations where $y_i$ falls below a pre-specified floor (for example, 5 kcal for an energy-per-item outcome) are excluded from MAPE and counted separately. The floor must be declared in the methods section.
- Confidence interval. A bias-corrected and accelerated (BCa) bootstrap with 5,000 resamples is used. Resampling is at the unit of independence (typically eating occasion, not food item, to avoid overstating precision).
- Stratified reporting. MAPE is reported overall and by food category (for example, mixed dish, beverage, fruit, protein portion) when $n_{\text{stratum}} \geq 30$.
- Companion metrics. MAPE is always accompanied by median APE and by the proportion of observations within 20% and within 40% of the reference, which are less sensitive to tail behaviour.
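The floor, rounding, and companion-metric conventions can be sketched together as follows. This is an illustrative sketch under stated assumptions: the names `format_mape` and `summarise` are hypothetical, "within 20%" is read as an absolute percentage error of at most 20 (inclusive), and the rounding rule is applied to the unrounded value, so a value such as 19.96 would print as 20.0%. The confidence interval and stratification are outside the scope of this sketch.

```python
from statistics import median


def format_mape(value):
    """One decimal place below 20%, nearest integer at or above 20%."""
    # Assumption: the 20% cut-off is tested on the unrounded value.
    return f"{value:.1f}%" if value < 20 else f"{value:.0f}%"


def summarise(predicted, reference, floor):
    """Apply the reference floor, then compute MAPE and companion metrics."""
    if floor <= 0:
        raise ValueError("floor must be positive so that no reference is zero")
    # Observations below the floor are excluded and counted separately.
    kept = [(p, y) for p, y in zip(predicted, reference) if y >= floor]
    n_excluded = len(reference) - len(kept)
    if not kept:
        raise ValueError("no observations at or above the floor")
    # Absolute percentage errors, left unrounded for intermediate steps.
    ape = [100.0 * abs((p - y) / y) for p, y in kept]
    return {
        "n": len(kept),
        "n_excluded": n_excluded,
        "mape": sum(ape) / len(ape),
        "median_ape": median(ape),
        "pct_within_20": 100.0 * sum(a <= 20 for a in ape) / len(ape),
        "pct_within_40": 100.0 * sum(a <= 40 for a in ape) / len(ape),
    }


# Hypothetical energy estimates (kcal) with a 5 kcal floor.
result = summarise([110, 150, 3, 240], [100, 100, 2, 200], floor=5)
print(result["n"], result["n_excluded"], format_mape(result["mape"]))
```

In this toy input the 2 kcal reference is excluded, and the remaining three errors (10%, 50%, 20%) give a MAPE of about 26.7%, which the rounding rule reports as 27%.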
Worked example
Suppose a system estimates energy content for $n = 180$ eating occasions against weighed-food reference values.
| Metric | Value | 95% BCa CI (%) |
|---|---|---|
| MAPE, overall | 14.8% | 13.1 to 16.9 |
| Median APE | 11.2% | 10.0 to 12.6 |
| % within 20% of reference | 71.1% | 64.2 to 77.4 |
| % within 40% of reference | 92.8% | 88.3 to 95.9 |
| MAPE, mixed dishes ($n=62$) | 18.4% | 15.2 to 22.1 |
| MAPE, beverages ($n=41$) | 7.3% | 5.6 to 9.4 |
Because the overall value is below 20%, it is reported to one decimal place; the mixed-dish stratum, at 18.4%, is also below 20% and is likewise reported to one decimal place. Observations with $y_i < 5$ kcal (for example, a splash of lemon juice) were excluded: 6 of the 186 original observations, leaving $n = 180$.
Common pitfalls
- Computing MAPE within each participant from food-item errors and then averaging across participants. When some participants have more items than others, this equal-weight average of participant values can differ from the pooled item-level MAPE and is hard to interpret; MAPE should be computed at the same granularity as the scientific question.
- Reporting MAPE without stating how zeros or near-zeros were handled. In practice a single reference value of 0.1 g can swing MAPE by several percentage points.
- Presenting MAPE as the sole accuracy metric. A MAPE of 15% over a dataset in which 5% of predictions exceed 100% error is, for most applications, a misleading summary.
- Computing the bootstrap CI at the food-item level when eating occasions cluster items. Because errors for items within the same occasion tend to be correlated, this typically understates the CI width.
- Quoting a vendor-reported MAPE, obtained against a different reference standard, in the same sentence as an independently replicated MAPE without labelling the provenance of each.
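The near-zero pitfall is easy to reproduce numerically. The sketch below uses entirely hypothetical data (99 items with a 50 g reference, each over-estimated by 10%, plus one garnish with a 0.1 g reference estimated at 1.0 g) to show how a single small reference value moves the summary.

```python
def mape(pairs):
    """MAPE in percent over (predicted, reference) pairs."""
    return 100.0 * sum(abs((p - y) / y) for p, y in pairs) / len(pairs)


# 99 hypothetical items, each predicted 10% above a 50 g reference.
typical = [(55.0, 50.0)] * 99
# One hypothetical garnish: reference 0.1 g, predicted 1.0 g (900% error).
near_zero = [(1.0, 0.1)]

without_item = mape(typical)
with_item = mape(typical + near_zero)
print(f"without near-zero item: {without_item:.1f}%")
print(f"with near-zero item:    {with_item:.1f}%")
```

In this constructed case one observation out of 100 moves MAPE from about 10.0% to about 18.9%, even though the absolute error on that item is under one gram.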
Recommended reporting
- Report MAPE with a BCa 95% CI based on resampling at the unit of independence.
- State the $y_i$ floor and the number of excluded observations.
- Round to one decimal below 20%, to integer at or above 20%.
- Report median APE, percent-within-20%, and percent-within-40% as companion metrics.
- Stratify by food category when stratum $n \geq 30$.
- Label vendor-reported MAPE distinctly from independently replicated MAPE.
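Resampling at the unit of independence can be sketched as below. Note the substitution: to stay short and dependency-free, this sketch computes a plain percentile interval, not the BCa interval the convention calls for; BCa additionally applies bias-correction and acceleration adjustments to the percentile endpoints. The function names, the data layout (one list of item pairs per eating occasion), and the toy values are all assumptions for illustration.

```python
import random


def pooled_mape(items):
    """MAPE in percent over (predicted, reference) pairs."""
    return 100.0 * sum(abs((p - y) / y) for p, y in items) / len(items)


def cluster_bootstrap_ci(occasions, n_resamples=5000, level=0.95, seed=0):
    """Percentile CI for pooled MAPE, resampling whole eating occasions.

    `occasions` is a list of non-empty lists of (predicted, reference)
    pairs; each inner list holds the items of one eating occasion.
    """
    rng = random.Random(seed)
    k = len(occasions)
    estimates = []
    for _ in range(n_resamples):
        # Draw k occasions with replacement; items travel with their occasion.
        drawn = rng.choices(occasions, k=k)
        items = [pair for occasion in drawn for pair in occasion]
        estimates.append(pooled_mape(items))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[int((1.0 - alpha) * n_resamples) - 1]
    point = pooled_mape([pair for occasion in occasions for pair in occasion])
    return point, lower, upper


# Four hypothetical eating occasions with one to three items each.
occasions = [
    [(110, 100), (95, 100)],
    [(300, 200)],
    [(45, 50), (52, 50), (60, 50)],
    [(130, 100)],
]
point, lower, upper = cluster_bootstrap_ci(occasions, n_resamples=500, seed=1)
print(f"MAPE {point:.1f}% (percentile CI {lower:.1f} to {upper:.1f})")
```

The design point is that `rng.choices` draws from the list of occasions, not from the flattened list of items, so correlated items within an occasion are kept together in every resample.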
Keywords
MAPE; mean absolute percentage error; validation; bootstrap CI; reporting conventions; dietary assessment; accuracy
License
This piece is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).