Methodology Paper

Mean absolute percentage error versus absolute kilocalorie error in dietary assessment validation: when does normalisation matter?

DAI-MP-2025-02

Abstract

Mean absolute percentage error (MAPE) is the most commonly reported summary of dietary assessment accuracy, yet its use is not always defensible. MAPE normalises each error by the reference value, which amplifies errors at low intake and compresses them at high intake, and becomes undefined when the reference is zero. Absolute error — in kilocalories per meal or per day — carries different biases: it over-weights high-intake observations and is scale-dependent. This methodology paper sets out the conditions under which each metric is the appropriate summary, with worked examples drawn from a 200-meal simulated dataset spanning 80-1,600 kcal per meal. Four patterns are identified: (i) where the distribution of intakes is narrow and symmetric, MAPE and absolute error rank tools similarly; (ii) where intakes are heavy-tailed, MAPE rewards tools that perform well on low-intake items and penalises those that do well only at high intake; (iii) where clinical decisions depend on absolute thresholds (for example, a ±100 kcal/meal window relevant for insulin dosing), absolute error is the more interpretable metric; and (iv) where comparability across populations with differing intake distributions is the goal, symmetric mean absolute percentage error (sMAPE) or median absolute error (MedAE) may be preferable to either. The paper recommends paired reporting of MAPE and absolute error, with a pre-specified primary metric tied to the intended use of the tool, and with careful handling of low-intake observations.

Keywords: MAPE; absolute error; dietary assessment; validation; error metrics; methodology; sMAPE

1. Introduction

Dietary assessment validation papers almost universally report an error metric, and the metric most often reported is mean absolute percentage error (MAPE). Its appeal is obvious: a single dimensionless number, easily compared across studies with differing units, and intuitively interpreted as “how wrong, on average, in percentage terms.” Its costs are less widely appreciated. MAPE amplifies errors at low reference values — a 50-kcal error on a 100-kcal snack is 50% — and compresses them at high ones. It is undefined at zero, and unstable near zero. It is not symmetric in over- and under-estimation on the ratio scale.
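A minimal numerical illustration of the amplification effect, using hypothetical values rather than data from the worked example in Section 3: the same 50-kcal absolute error produces very different percentage errors across the intake range.

```python
# Illustration: a fixed 50-kcal absolute error at different reference intakes.
# Values are hypothetical, chosen only to show MAPE's low-intake amplification.
references = [100, 400, 1200]                # kcal: snack, light meal, large dinner
predictions = [r + 50 for r in references]   # constant +50 kcal over-estimate

for y, y_hat in zip(references, predictions):
    ape = abs(y_hat - y) / y * 100
    print(f"reference {y:>4} kcal -> absolute error 50 kcal, percentage error {ape:.1f}%")
# Output: 100 kcal -> 50.0%; 400 kcal -> 12.5%; 1200 kcal -> 4.2%
```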

Absolute error, by contrast, is dimensional — kcal per meal or per day — and dominated by high-intake observations. It is more directly interpretable where clinical thresholds are absolute (insulin dosing; bariatric post-operative targets), and more stable across the intake distribution. It is, however, difficult to compare across studies with different intake distributions.

This paper asks: for dietary assessment validation, when is each metric defensible, and what is the effect of choosing one over the other on the apparent ranking of tools?

2. The Method

2.1 Definitions

For paired predictions \hat{y}_i and references y_i, i = 1 … n, with errors e_i = \hat{y}_i - y_i:

\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{|e_i|}{|y_i|}

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i|

\mathrm{sMAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{|e_i|}{\left(|y_i| + |\hat{y}_i|\right)/2}

\mathrm{MedAE} = \operatorname{median}_{i} \, |e_i|
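A direct implementation of the four summaries, for readers who wish to reproduce the worked example in Section 3; the function names are ours rather than drawn from any established library.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error (%). Undefined where y == 0."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * float(np.mean(np.abs(y_hat - y) / np.abs(y)))

def mae(y, y_hat):
    """Mean absolute error, in the units of y (here kcal)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y_hat - y)))

def smape(y, y_hat):
    """Symmetric MAPE (%), bounded 0-200; undefined only where y == y_hat == 0."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * float(np.mean(np.abs(y_hat - y) / ((np.abs(y) + np.abs(y_hat)) / 2)))

def medae(y, y_hat):
    """Median absolute error, robust to heavy-tailed intake distributions."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.median(np.abs(y_hat - y)))
```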

2.2 Properties

MAPE is bounded below by zero but not above; a 100% error is possible (when prediction is zero) but 500% is also possible (when the reference is very small). MAE has no upper bound in principle and scales with the units of measurement. sMAPE is bounded between 0 and 200%, symmetric in over- and under-estimation, and defined except where both prediction and reference are zero. MedAE is robust to extreme observations and useful when the intake distribution is heavy-tailed.
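The 200% ceiling on sMAPE follows directly from the definition: for a zero prediction against a positive reference, the per-observation term is

\frac{100 \, |0 - y_i|}{(|y_i| + 0)/2} = \frac{100 \, y_i}{y_i / 2} = 200\%,

and symmetrically for a positive prediction against a zero reference; no single observation can contribute more.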

2.3 Weighting

MAPE gives equal weight to each observation after ratio-normalisation. MAE gives equal weight to each observation on the kcal scale, which means large meals dominate. Weighted variants — for example, MAPE weighted by reference value, which reduces to a scaled MAE — make the implicit weighting explicit.
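To make that reduction explicit: weighting observation i's percentage error by y_i / \sum_j y_j gives

\sum_{i=1}^{n} \frac{y_i}{\sum_j y_j} \cdot \frac{100 \, |e_i|}{y_i} = \frac{100 \sum_i |e_i|}{\sum_j y_j} = \frac{100 \, \mathrm{MAE}}{\bar{y}},

where \bar{y} is the mean reference intake. That is, total absolute error expressed as a percentage of total intake (the quantity sometimes called WAPE), which is simply MAE rescaled by mean intake.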

3. Worked Example

A 200-meal simulated dataset was drawn from an intake distribution matched to NHANES 2017-18 dinner occasions (mean 720 kcal, SD 340 kcal, 5th-95th percentile 280-1,420 kcal). Four synthetic tools, A-D, were compared, each with a different error-generating mechanism; as the results show, Tool B carries a constant additive error and Tool C fails specifically at low intake.
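A sketch of how such a dataset could be generated. The lognormal parameterisation and the error mechanisms shown for Tools B and C are assumptions for illustration, matched to the stated moments and to the qualitative descriptions above, not the exact generating process used here.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Lognormal intake roughly matching the stated moments (mean ~720 kcal, SD ~340 kcal);
# this parameterisation is an assumption, not the paper's exact procedure.
sigma2 = np.log(1 + (340 / 720) ** 2)   # moment-matching: sigma^2 = ln(1 + CV^2)
mu = np.log(720) - sigma2 / 2
y = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)
y = np.clip(y, 80, 1600)                # keep meals within the reported 80-1,600 kcal range

# Two of the four mechanisms described in the text (details assumed):
# Tool B: constant additive error regardless of meal size.
y_hat_b = y + rng.normal(0, 55, size=n)
# Tool C: proportional error that worsens sharply below ~300 kcal.
rel_sd = np.where(y < 300, 0.25, 0.08)
y_hat_c = y * (1 + rng.normal(0, rel_sd))
```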

Results by metric, bootstrapped 95% CIs in parentheses:

Tool    MAPE (%)            MAE (kcal)     sMAPE (%)           MedAE (kcal)
A       8.1 (7.4-8.8)       63 (58-68)     7.9 (7.2-8.6)       52 (47-57)
B       12.4 (11.3-13.6)    64 (59-69)     11.1 (10.2-12.0)    54 (49-59)
C       15.8 (14.4-17.3)    72 (66-78)     13.5 (12.4-14.7)    58 (52-64)
D       7.9 (7.2-8.6)       58 (53-63)     7.8 (7.2-8.5)       50 (45-55)

Tools A and D are ranked nearly identically on MAPE and MAE. Tool B, with constant additive error, looks much worse on MAPE than on MAE because its error is amplified at low intake: it is statistically indistinguishable from Tool A on MAE (64 vs 63 kcal, overlapping CIs) yet roughly 50% worse on MAPE. Tool C, which explicitly fails at low intake, is penalised most on MAPE. The point-estimate rank order happens to coincide across metrics here, but the margins do not; the apparent separation between A and B depends entirely on the metric chosen.

4. Common Errors

Error 1: Reporting MAPE without acknowledging low-intake instability. Including a large number of low-kcal observations in a MAPE summary can produce headline numbers that are arithmetically correct but misleading about a tool’s typical-use performance.
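One practical safeguard is to report MAPE stratified by intake rather than as a single pooled number. A sketch, with an assumed 200-kcal cut-point; the threshold should be pre-specified in the analysis plan, not tuned after seeing the data.

```python
import numpy as np

def stratified_mape(y, y_hat, threshold_kcal=200):
    """Report pooled MAPE alongside MAPE above and below a low-intake cut-point.

    The 200-kcal default is an illustrative assumption.
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ape = 100.0 * np.abs(y_hat - y) / np.abs(y)
    low = y < threshold_kcal
    return {
        "pooled": float(np.mean(ape)),
        "low_intake": float(np.mean(ape[low])) if low.any() else None,
        "typical": float(np.mean(ape[~low])) if (~low).any() else None,
        "n_low": int(low.sum()),
    }
```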

Error 2: Treating MAE as universal. In cross-study comparison across populations with different mean intakes, MAE is not stable; a tool applied to higher-intake meals will appear worse on MAE than the same tool applied to lower-intake meals.

Error 3: Silently excluding zero-reference observations. MAPE is undefined when the reference is zero. Investigators sometimes drop these observations without noting the exclusion, which biases the summary where zero references are themselves informative (e.g., cases where the tool hallucinates a meal from a non-food image).
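A minimal pattern for making the exclusion explicit rather than silent; the reporting format is our suggestion.

```python
import numpy as np

def mape_with_exclusions(y, y_hat):
    """Compute MAPE on nonzero references, reporting how many zeros were excluded."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    zero = y == 0
    value = 100.0 * float(np.mean(np.abs(y_hat[~zero] - y[~zero]) / y[~zero]))
    # Zero-reference cases (e.g. a tool hallucinating a meal from a non-food
    # image) should be counted and reported alongside the headline number.
    return value, int(zero.sum())
```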

Error 4: Pairing MAPE with 95% CIs from normal-theory formulas. The sampling distribution of MAPE is skewed, particularly when low-intake observations are present; bootstrap percentile intervals or other distribution-free intervals should be used instead.
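A percentile-bootstrap interval of the kind used for the Section 3 table, resampling meals with replacement; the 2,000-resample default is an assumed choice.

```python
import numpy as np

def bootstrap_ci(y, y_hat, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for any paired error metric (resampling meals)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample meals with replacement
        stats.append(metric(y[idx], y_hat[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Usage with the metric functions from Section 2.1, e.g. bootstrap_ci(y, y_hat, mape)
```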

Error 5: Ranking tools on a single metric. Two tools with the same MAPE can have very different MAE, and vice versa. Single-metric ranking is only defensible if the metric is tied to the intended use.

5. Recommendations

Dietary assessment validation should report, at minimum: (i) both MAPE and MAE, each with bootstrapped 95% CIs; (ii) a pre-specified primary metric tied to the intended use of the tool; and (iii) an explicit account of how zero- and low-intake observations were handled, including any exclusions.

Paired reporting of MAPE and MAE, with a pre-specified primary metric, resolves the majority of the interpretive problems documented in the dietary assessment literature.


Funding

No external funding was received for this work.

Competing interests

The authors declare no competing interests.

How to cite

Okafor D. (2025). Mean absolute percentage error versus absolute kilocalorie error in dietary assessment validation: when does normalisation matter? The Dietary Assessment Initiative — Research Publications. https://doi.org/10.5281/zenodo.dai-2025-02

License

This article is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).