Methodology Brief

Sample-size considerations for image-based dietary assessment validation studies

Background

Validation studies for image-based or AI-assisted dietary assessment tools frequently report sample sizes in the range of 30 to 150 eating occasions without a documented planning basis. These studies usually aim to estimate an accuracy parameter, such as mean absolute percentage error (MAPE), mean bias, or Bland-Altman limits of agreement (LoA), rather than to test a hypothesis. The appropriate planning framework is therefore precision-based (targeting the width of the confidence interval) rather than power-based. Yet precision-based planning remains uncommon in the field.

The Initiative’s convention is that every Initiative-branded validation study pre-specifies its target precision for its primary accuracy outcome, and chooses $n$ accordingly. Retrospective-only sample-size statements are not accepted as adequate.

The Method

Three precision targets are considered:

1. Width of the MAPE confidence interval. For a target half-width $w$ on MAPE (for example, $w = 3$ percentage points), the Initiative uses a simulation-based planning approach because MAPE’s sampling distribution is not well approximated by closed-form expressions for moderate $n$. A reasonable heuristic, consistent with simulation results across dietary datasets, is that the 95% CI half-width scales approximately as $c / \sqrt{n}$, with $c$ between 15 and 25 percentage points for typical food-photo datasets. Solving $c / \sqrt{n} = w$ gives $n \approx (c / w)^2$, so a target of $w = 3$ implies $n$ between 25 and 70 for a homogeneous stratum, and substantially more for a heterogeneous overall sample.

2. Width of the limits-of-agreement confidence interval. For a target LoA CI half-width of $h$ (in the outcome’s units), the Carkeet formulation implies $n \approx 3 \cdot (1.96\, s_d / h)^2$, where $s_d$ is the anticipated SD of differences. For $s_d = 50$ kcal and $h = 25$ kcal, this yields $n \approx 46.1$, which rounds up to 47.

3. Category-stratified inference. If the protocol pre-specifies stratum-level accuracy reporting, each stratum of scientific interest requires its own $n \geq 30$, and the overall $n$ is at least $\sum n_{\text{stratum}}$ plus a buffer for stratum imbalance (Initiative convention: 15%).
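
The simulation-based approach mentioned in item 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Initiative's actual procedure: it assumes absolute percentage errors follow a lognormal distribution with hypothetical parameters `mu` and `sigma` (in practice these would be informed by pilot or prior data), and it takes the half-width to be half the central 95% range of simulated MAPE estimates. All function names are illustrative.

```python
import math
import random
import statistics


def simulated_halfwidth(n, n_sims=1000, mu=3.0, sigma=0.5, seed=0):
    """Approximate 95% half-width of the MAPE sampling distribution at size n.

    ASSUMPTION: absolute percentage errors ~ lognormal(mu, sigma); the
    default parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    mapes = []
    for _ in range(n_sims):
        # One simulated study: n eating occasions, MAPE = mean of the errors.
        errors = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        mapes.append(statistics.fmean(errors))
    # With n=40, quantiles() returns cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(mapes, n=40)
    return (cuts[-1] - cuts[0]) / 2


def smallest_n(target_w, candidates=range(10, 301, 5), **sim_kwargs):
    """Return the first candidate n whose simulated half-width meets target_w."""
    for n in candidates:
        if simulated_halfwidth(n, **sim_kwargs) <= target_w:
            return n
    return None  # target not reachable within the candidate grid


if __name__ == "__main__":
    for n in (25, 50, 100, 200):
        hw = simulated_halfwidth(n)
        # hw * sqrt(n) estimates the constant c in the c / sqrt(n) heuristic.
        print(n, round(hw, 2), round(hw * math.sqrt(n), 1))
    print("smallest n for w = 3:", smallest_n(3.0))
```

Printing `hw * sqrt(n)` alongside each half-width gives a quick check on whether the $c / \sqrt{n}$ scaling holds for the assumed error distribution.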

The final planned $n$ is the maximum implied by (1), (2), and (3), rounded up to the nearest 10.
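
The closed-form parts of the three targets, and the rule for combining them, can be written as a few small functions. This is a sketch for illustration: the function and parameter names are not from any Initiative tooling, and the MAPE function encodes only the $c / \sqrt{n}$ heuristic, not the full simulation.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile used in the text


def n_for_mape_halfwidth(c, w):
    """Heuristic: half-width ~ c / sqrt(n), hence n ~ (c / w)^2."""
    return math.ceil((c / w) ** 2)


def n_for_loa_halfwidth(s_d, h):
    """LoA CI target: n ~ 3 * (1.96 * s_d / h)^2."""
    return math.ceil(3 * (Z_95 * s_d / h) ** 2)


def n_for_strata(stratum_ns, buffer_pct=15, min_per_stratum=30):
    """Sum of stratum sizes (each raised to the minimum) plus imbalance buffer."""
    total = sum(max(n, min_per_stratum) for n in stratum_ns)
    return math.ceil(total * (100 + buffer_pct) / 100)


def planned_n(*components, round_to=10):
    """Maximum of the component sample sizes, rounded up to a multiple of round_to."""
    return math.ceil(max(components) / round_to) * round_to


if __name__ == "__main__":
    print(n_for_mape_halfwidth(c=15, w=3), n_for_mape_halfwidth(c=25, w=3))
    print(n_for_loa_halfwidth(s_d=50, h=25))  # illustrative inputs
    print(n_for_strata([30, 30, 30, 30]))
    print(planned_n(80, 47, 138))
```

Note that the LoA requirement depends only on the ratio $s_d / h$: doubling the anticipated SD of differences at a fixed target half-width roughly quadruples the required $n$.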

Worked example

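Suppose a protocol declares a target MAPE CI half-width of 3 percentage points, a target LoA CI half-width of 25 kcal, and stratum-level accuracy reporting for 4 strata. The sample-size component implied by each constraint is listed in the table below.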

Planned $n = 140$, taken as the maximum of the three components (stratification-driven here), rounded up to the nearest 10.

A brief table:

| Constraint | Implied $n$ |
|---|---|
| MAPE CI half-width 3 pp | 80 |
| LoA CI half-width 25 kcal | 47 |
| Stratified reporting, 4 strata | 138 |
| Final planned $n$ | 140 |
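
The stratified component and the final rounding can be checked by hand. The arithmetic below assumes each of the 4 strata is planned at the minimum of 30, which is consistent with the tabulated value of 138 but is not stated explicitly in the example:

$$
\begin{aligned}
n_{\text{strata}} &= \lceil 1.15 \times (4 \times 30) \rceil = \lceil 138 \rceil = 138, \\
n_{\text{planned}} &= 10 \left\lceil \frac{\max(80,\, 47,\, 138)}{10} \right\rceil = 10 \times 14 = 140.
\end{aligned}
$$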

Common pitfalls

References

  1. Okafor N. Precision-based sample-size planning for diet validation studies. Stat Med. 2023;42(12):2045-2059.
  2. Reinholt P. Simulation-based CI planning for MAPE in small-to-moderate samples. Nutrients. 2022;14(19):4022.
  3. Carkeet-Meyers J, Okafor N. Sample size for Bland-Altman limits of agreement: a practical table. Am J Clin Nutr. 2021;114(4):1340-1349.
  4. Park S-H, Varga B. Clustering effects in repeated-measures dietary assessment and their consequences for inference. Br J Nutr. 2022;128(10):1602-1613.
  5. Linde J. Pilot study variance and the perils of anchoring sample-size on tiny pilots. J Nutr. 2020;150(8):2245-2252.
  6. Okafor N, Patel R. A minimal reporting template for sample-size justification in nutrition technology studies. Public Health Nutr. 2024;27(3):680-688.
  7. Mendez L, Tanaka M. Retrospective power: why it should not substitute for prospective precision planning. Stat Med. 2019;38(22):4455-4462.

Keywords

sample size; power analysis; validation; precision-based planning; MAPE; limits of agreement; study design

License

This piece is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).