Bland-Altman analysis for dietary assessment validation: conventions, common errors, and recommended reporting

Daniel Okafor; Lars Henriksen

doi:10.5281/zenodo.dai-2024-02

Methodology Paper

DAI-MP-2024-02

Daniel Okafor, PhD, MS; Lars Henriksen, PhD
Published June 11, 2024 · DOI: 10.5281/zenodo.dai-2024-02

Abstract

Bland-Altman analysis remains the most widely used graphical approach for comparing two methods of continuous measurement, yet its application to dietary assessment is frequently incomplete or methodologically unsound. This methodology paper reviews the conventions of Bland-Altman analysis in the specific context of dietary assessment validation, where one measurement (typically a photograph-based or self-report estimate) is compared with a reference such as a weighed food record or a laboratory-assayed duplicate meal. The paper describes the assumptions underlying 95% limits of agreement (LoA), the distinction between repeatability coefficient and LoA, the treatment of proportional bias, and the handling of skewed residuals common in energy-intake data. Five common errors are documented with worked examples: (i) reporting only the mean bias without LoA, (ii) confusing standard error of the mean difference with the LoA half-width, (iii) pooling across meals when individuals contribute multiple observations without accounting for clustering, (iv) failing to log-transform or otherwise address heteroscedasticity, and (v) reporting LoA without clinical interpretation. A reporting template is proposed that includes the mean difference with its 95% CI, the LoA with their 95% CIs (via the method of Carkeet), a regression check for proportional bias, a variance check for heteroscedasticity, a clustering-aware variance estimator where applicable, and a pre-specified clinically acceptable range against which the LoA are interpreted. The template is intended for use in both vendor-reported and independent validation studies of image-based and manual-entry dietary assessment tools. Adoption of a consistent reporting template would materially improve the interpretability of validation results across the field.

Keywords: Bland-Altman; limits of agreement; dietary assessment; validation; methodology; reporting standards; heteroscedasticity

1. Introduction

Method-comparison studies occupy an awkward position in nutritional measurement: a new method is almost never calibrated against a true gold standard — the true intake of a free-living human is unobservable in principle — and so the analysis must treat both measurements as imperfect and ask how they agree. Bland-Altman analysis, introduced in 1983 and refined repeatedly since, is the dominant graphical and numerical tool for that question. Its strengths are interpretability and minimal assumptions; its weakness, in applied practice, is that it is so easy to produce a plot that investigators often stop before they have computed the quantities that matter.

This paper is addressed to investigators conducting validation studies of dietary assessment tools, and to reviewers and readers of such studies. The objective is not to restate the original Bland-Altman framework — comprehensive references exist — but to draw together the conventions that are specific to dietary assessment, where the variable of interest is usually energy (kcal) or a macronutrient (g), where within-subject clustering is routine, and where heteroscedasticity is the rule rather than the exception.

2. The Method

2.1 Core quantities

For paired measurements (x_i, y_i) where y is the test method (for example, a photograph-based estimate) and x is the reference (for example, a weighed food record), the following quantities form the core of a Bland-Altman analysis:

Difference: d_i = y_i − x_i
Mean of the pair: m_i = (y_i + x_i) / 2
Mean bias: \bar{d}, with 95% CI \bar{d} ± t_{0.975,n-1} · s_d / √n
Standard deviation of differences: s_d
95% limits of agreement (LoA): \bar{d} ± 1.96 · s_d, with CIs for each LoA as given by Carkeet (1998)

The plot is then the scatter of d_i against m_i, with horizontal lines at \bar{d} and at the two LoA.
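The quantities above can be computed with the standard library alone. The following is a minimal sketch, not a reference implementation: the function name `bland_altman` and the `t_crit` parameter are hypothetical, the t critical value is assumed to be supplied by the caller (from a table or a statistics package), and the five data pairs are illustrative values rather than study data. Confidence intervals for the LoA themselves are not computed here.

```python
import math
from statistics import mean, stdev

def bland_altman(test, reference, t_crit):
    """Core Bland-Altman quantities for paired measurements.

    t_crit is the two-sided 95% t critical value for n - 1 degrees of
    freedom, supplied by the caller (an assumption of this sketch).
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must be paired")
    d = [y - x for y, x in zip(test, reference)]        # differences d_i
    m = [(y + x) / 2 for y, x in zip(test, reference)]  # pair means m_i
    n = len(d)
    bias = mean(d)                                      # mean bias
    s_d = stdev(d)                                      # sample SD (n - 1 denominator)
    half_ci = t_crit * s_d / math.sqrt(n)               # half-width of CI for the bias
    return {
        "d": d, "m": m, "n": n,
        "bias": bias,
        "bias_ci": (bias - half_ci, bias + half_ci),
        "s_d": s_d,
        "loa": (bias - 1.96 * s_d, bias + 1.96 * s_d),
    }

# Illustrative values only (kcal); not data from any study.
test = [520.0, 610.0, 480.0, 700.0, 555.0]
reference = [500.0, 600.0, 500.0, 660.0, 540.0]
result = bland_altman(test, reference, t_crit=2.776)  # t(0.975, 4 df)
```

The returned `d` and `m` lists are the coordinates of the plot; `bias` and `loa` give the positions of the three horizontal lines.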

2.2 Repeatability coefficient versus limits of agreement

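A common confusion is between the repeatability coefficient and the LoA. The repeatability coefficient describes how far two measurements made by the same method on the same subject are expected to differ, so it applies when both measurements come from one method. The LoA describe the differences between two different methods. In validation studies the LoA are almost always what is wanted.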
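For orientation, the two quantities can be written side by side. The symbol s_w, the within-subject standard deviation of replicate measurements made by a single method, is introduced here for illustration and is not used elsewhere in this paper.

```latex
\mathrm{RC} = 1.96 \cdot \sqrt{2} \cdot s_w \approx 2.77\, s_w
\qquad \text{versus} \qquad
\mathrm{LoA} = \bar{d} \pm 1.96 \cdot s_d
```

The repeatability coefficient is a single non-negative number centred on zero, whereas the LoA are a pair of limits centred on the mean bias between methods.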

2.3 Proportional bias

A regression of d on m with the slope tested against zero detects proportional bias — the tendency for the difference to grow (or shrink) with the magnitude of intake. In dietary assessment, proportional bias is common at the extremes of intake, where both over-reporting at low intakes and under-reporting at high intakes have been documented.
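The regression check can be sketched as an ordinary least-squares fit of d on m. This is a minimal illustration under stated assumptions: the function name `proportional_bias_slope` is hypothetical, the t critical value for n − 2 degrees of freedom is assumed to be supplied by the caller, and the data are invented to show an upward drift.

```python
import math
from statistics import mean

def proportional_bias_slope(d, m, t_crit):
    """OLS slope of d on m with a 95% CI.

    t_crit: two-sided 95% t critical value for n - 2 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    n = len(d)
    m_bar, d_bar = mean(m), mean(d)
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (di - d_bar) for mi, di in zip(m, d))
    slope = sxy / sxx
    intercept = d_bar - slope * m_bar
    # Residual sum of squares; residual variance uses n - 2 degrees of freedom
    rss = sum((di - (intercept + slope * mi)) ** 2 for mi, di in zip(m, d))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, (slope - t_crit * se_slope, slope + t_crit * se_slope)

# Illustrative values only: differences drift upward with intake.
m = [300.0, 450.0, 600.0, 750.0, 900.0]
d = [-10.0, 5.0, 12.0, 30.0, 41.0]
slope, ci = proportional_bias_slope(d, m, t_crit=3.182)  # t(0.975, 3 df)
```

A confidence interval for the slope that excludes zero, as in this invented data set, would indicate proportional bias. With clustered data the standard error from this simple fit is likely to be too small, so the sketch applies only to independent pairs.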

2.4 Heteroscedasticity

The variance of d often grows with m in nutritional data. Three approaches are available: log-transformation of both measurements before differencing, modelling the LoA as a function of m, or presenting LoA on the ratio rather than difference scale. Which to choose depends on the distributional shape; all are preferable to ignoring the issue.
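The first and third approaches are closely related: differences of natural logs, once back-transformed, give limits on the ratio scale. A minimal sketch, assuming strictly positive measurements and using a hypothetical function name `ratio_scale_loa` with illustrative data:

```python
import math
from statistics import mean, stdev

def ratio_scale_loa(test, reference):
    """95% LoA on the ratio scale via natural-log differences.

    Requires strictly positive values (math.log raises ValueError otherwise).
    """
    log_d = [math.log(y) - math.log(x) for y, x in zip(test, reference)]
    bias, s = mean(log_d), stdev(log_d)
    # Back-transform: results are multiplicative factors (test / reference)
    return (math.exp(bias),
            math.exp(bias - 1.96 * s),
            math.exp(bias + 1.96 * s))

# Illustrative values only (kcal); not data from any study.
test = [520.0, 610.0, 480.0, 700.0, 555.0]
reference = [500.0, 600.0, 500.0, 660.0, 540.0]
ratio, lower, upper = ratio_scale_loa(test, reference)
```

The returned limits read as "the test method gives between `lower` and `upper` times the reference", which remains meaningful across small and large meals.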

2.5 Clustered data

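When participants contribute multiple meals, observations within a participant are correlated. Naïve LoA computed on pooled data will tend to be too narrow, and their confidence intervals narrower still, because the effective sample size is smaller than the number of meals. A clustering-aware estimator, for example the method of Zou (2013), should be used.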
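The idea behind a clustering-aware estimate can be illustrated with a one-way ANOVA decomposition of the differences into between- and within-participant components. This sketch is not Zou's procedure: it gives point estimates only, assumes a balanced design (equal meals per participant), and uses a hypothetical function name `variance_components` with invented data.

```python
import math
from statistics import mean

def variance_components(groups):
    """One-way ANOVA variance components for a balanced design.

    groups: list of per-participant lists of differences d, all of
    equal length k. Returns (between_sd, within_sd, total_sd).
    """
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    n_subj = len(groups)
    grand = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]
    ss_within = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    ms_within = ss_within / (n_subj * (k - 1))
    ms_between = k * sum((gm - grand) ** 2 for gm in group_means) / (n_subj - 1)
    var_within = ms_within
    var_between = max((ms_between - ms_within) / k, 0.0)  # truncate at zero
    total_sd = math.sqrt(var_between + var_within)
    return math.sqrt(var_between), math.sqrt(var_within), total_sd

# Illustrative differences (kcal) for three participants, three meals each.
groups = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0], [0.0, 10.0, 20.0]]
between_sd, within_sd, total_sd = variance_components(groups)
```

Point-estimate LoA then follow as the mean bias ± 1.96 × `total_sd`. Confidence intervals for those limits require the additional steps described by Zou (2013), which are not reproduced here.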

3. Worked Example

Consider a study in which 40 participants each contribute 5 lunch meals, giving 200 paired observations. The photograph-based estimate is compared against a weighed food record with USDA FoodData Central lookup.

Naïve analysis (ignoring clustering):

Mean bias: +14 kcal (95% CI +6 to +22)
s_d: 58 kcal
95% LoA: −100 to +128 kcal

Clustering-aware analysis:

Mean bias: +14 kcal (95% CI +3 to +25)
s_d (between + within decomposition): between = 22 kcal, within = 54 kcal
95% LoA (wider): −112 to +140 kcal

Regression of d on m yields slope +0.03 (95% CI −0.02 to +0.08), suggesting no material proportional bias. Residuals show modest heteroscedasticity; after log transformation, the back-transformed LoA correspond to 0.82× to 1.23× the reference on the ratio scale. The pre-specified clinically acceptable range for per-meal kcal differences was ±150 kcal; the clustering-aware LoA (−112 to +140 kcal) lie just inside this range, supporting acceptable agreement for the population studied.

4. Common Errors

Error 1: Reporting only the mean bias. A mean bias of +10 kcal with LoA of −400 to +420 kcal describes a method that is nearly unbiased on average but useless for individual estimates. Reporting the bias alone is misleading.

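Error 2: Confusing standard error with LoA half-width. The 95% CI of the mean bias has half-width t · s_d / √n and shrinks as n grows; the LoA have half-width 1.96 · s_d and do not. The CI is therefore far narrower than the LoA in any study of reasonable size, and one should never be substituted for the other.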
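The size of the gap can be checked with the naïve figures from Section 3 (s_d = 58 kcal, n = 200). The t critical value below is an approximate, assumed value for 199 degrees of freedom.

```python
import math

s_d, n = 58.0, 200                      # naive analysis in Section 3
t_crit = 1.972                          # approx. t(0.975, 199 df); assumed value
ci_half = t_crit * s_d / math.sqrt(n)   # half-width of the CI for the mean bias
loa_half = 1.96 * s_d                   # half-width of the LoA
ratio = loa_half / ci_half              # close to sqrt(n)
```

Here `ci_half` is about 8 kcal and `loa_half` about 114 kcal, a ratio of roughly 14, which is close to √200. Reporting the former in place of the latter would overstate agreement by that factor.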

Error 3: Ignoring clustering. When each participant contributes multiple observations, pooled LoA underestimate the spread that a new user will experience. Between- and within-subject variance should be reported separately.

Error 4: Ignoring heteroscedasticity. A fan-shaped plot indicates that the LoA are not constant across the range of intake. Log transformation or ratio-scale LoA address this.

Error 5: Reporting LoA without clinical interpretation. An LoA of ±150 kcal is wide relative to a small breakfast and tight relative to a large dinner. The clinically acceptable range should be pre-specified, and the LoA should be interpreted against it.
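The comparison against a pre-specified range can be made explicit in a few lines. This is a sketch only: the function name `interpret_loa` is hypothetical, a symmetric acceptable range is assumed, and the inputs are the clustering-aware LoA and the ±150 kcal range from Section 3.

```python
def interpret_loa(loa_lower, loa_upper, acceptable):
    """Compare LoA with a pre-specified symmetric range +/- acceptable.

    Returns (within, margin): whether both limits fall inside the range,
    and the smaller distance from a limit to the range boundary
    (negative if a limit falls outside).
    """
    within = -acceptable <= loa_lower and loa_upper <= acceptable
    margin = min(loa_lower + acceptable, acceptable - loa_upper)
    return within, margin

# Clustering-aware LoA and the +/-150 kcal range from Section 3
within, margin = interpret_loa(-112.0, 140.0, 150.0)
```

For these inputs the limits fall inside the range with a margin of 10 kcal at the upper limit. A stricter check would compare the outer confidence bounds of the LoA, rather than the point estimates, with the acceptable range.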

5. Recommended Reporting

A validation study should report at minimum:

Element	Required form
Mean bias	Value, 95% CI, units
LoA	Two values, each with 95% CI (Carkeet)
Proportional bias test	Regression slope, 95% CI, p-value
Heteroscedasticity check	Variance-on-mean test or visual description
Clustering treatment	Method (e.g., Zou 2013), between/within SD
Clinical acceptability range	Pre-specified value with rationale
Interpretation	Relation of LoA to clinical range

Where possible, the plot should display the raw data together with the regression line, the LoA as dashed lines with their CIs as shaded bands, and an indication of the clinical range.

References

Altman D, Bland J. Measurement in medicine: the analysis of method comparison studies. Statistician. 1983;32:307-317.
Bland J, Altman D. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135-160.
Carkeet A. Exact parametric confidence intervals for Bland-Altman limits of agreement. Optom Vis Sci. 1998;75(12):890-897.
Dewitte K, Fierens C. Application of the Bland-Altman plot for interpretation of method-comparison studies. Clin Chem. 2002;48(5):799-801.
Euser A, Dekker F. A practical approach to Bland-Altman plots with multiple observations per subject. J Clin Epidemiol. 2008;61(10):978-982.
Gerke O. Reporting standards for a Bland-Altman agreement analysis. Diagnostics. 2020;10(5):334.
Hanneman S. Design, analysis, and interpretation of method-comparison studies. AACN Adv Crit Care. 2008;19(2):223-234.
Ludbrook J. Confidence in Altman-Bland plots: a critical review of the method of differences. Clin Exp Pharmacol Physiol. 2010;37(2):143-149.
Myles P, Cui J. Using the Bland-Altman method to measure agreement with repeated measures. Br J Anaesth. 2007;99(3):309-311.
Zou G. Confidence interval estimation for the Bland-Altman limits of agreement with multiple observations per individual. Stat Methods Med Res. 2013;22(6):630-642.

Funding

No external funding was received for this work.

Competing interests

The authors declare no competing interests.

How to cite

Okafor D., Henriksen L. (2024). Bland-Altman analysis for dietary assessment validation: conventions, common errors, and recommended reporting. The Dietary Assessment Initiative, Research Publications. https://doi.org/10.5281/zenodo.dai-2024-02

License

This article is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).