Methodology Brief

Limits of agreement: how we report Bland-Altman intervals in Initiative validation work

Daniel Okafor, PhD, MS
Published September 17, 2024

Background

Agreement between a candidate dietary assessment method and a reference (for example, weighed-food records or duplicate-plate analysis) is most commonly reported using the Bland-Altman framework. Although the framework is ubiquitous, the literature varies substantially in how limits of agreement (LoA) are computed, whether confidence intervals around the LoA are reported, and whether proportional bias is formally tested. A 2019 review of image-based dietary assessment studies (Falkenberg et al., Public Health Nutr) found that fewer than half reported confidence intervals for the LoA, and only a third tested for heteroscedasticity.

For Initiative validation work, a consistent convention is required so that results across studies are directly comparable. The convention below is not a claim of novelty; it follows the original recommendations of Bland and Altman, extended by Carkeet’s work on LoA confidence intervals.

The Method

For paired measurements $x_i$ (candidate) and $y_i$ (reference) on $n$ participants or eating occasions, we compute:

Differences $d_i = x_i - y_i$.
Mean bias $\bar{d}$ and standard deviation of differences $s_d$.
95% LoA as $\bar{d} \pm 1.96 \cdot s_d$.
Carkeet 95% confidence intervals around each LoA using the non-central t method, not the simple approximation $\pm t_{n-1} \cdot s_d \sqrt{3/n}$, because intervals from the simple form are known to fall short of nominal 95% coverage for $n < 50$.
A regression of $d_i$ on $(x_i + y_i)/2$ to test for proportional bias. If the slope’s 95% CI excludes zero, we report regression-based LoA in addition to fixed LoA.
A Shapiro-Wilk test on $d_i$; if $p < 0.05$ and a funnel shape is visible in the agreement plot, we consider a log transform before repeating the analysis.
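
The simple approximation named in the confidence-interval step can be traced to a standard large-sample argument. The sketch below assumes normally distributed differences and treats $\bar{d}$ and $s_d$ as independent; it is shown only to explain where the factor $\sqrt{3/n}$ comes from and is not the Carkeet calculation.

$$
\operatorname{Var}(\bar{d} \pm 1.96\, s_d) \approx \frac{s_d^2}{n} + 1.96^2 \cdot \frac{s_d^2}{2(n-1)} \approx \left(1 + \frac{1.96^2}{2}\right) \frac{s_d^2}{n} \approx \frac{2.92\, s_d^2}{n} \approx \frac{3\, s_d^2}{n}
$$

Taking the square root gives a standard error of about $s_d \sqrt{3/n}$ for each limit. The approximation yields a symmetric interval, whereas the non-central t method yields an asymmetric one.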

Units are always reported in the body of the table (kcal, g, mg) and plots use the same axis scale as the underlying measurement.
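
The first three steps of the method (differences, mean bias and $s_d$, fixed 95% LoA) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `limits_of_agreement` and the sample values are hypothetical and are not drawn from any Initiative dataset.

```python
from statistics import mean, stdev


def limits_of_agreement(candidate, reference, z=1.96):
    """Return (bias, s_d, lower LoA, upper LoA) for paired measurements."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must be paired")
    # Candidate minus reference: a positive bias means the candidate over-reports.
    diffs = [x - y for x, y in zip(candidate, reference)]
    bias = mean(diffs)
    s_d = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, s_d, bias - z * s_d, bias + z * s_d


# Hypothetical energy estimates in kcal, for illustration only.
candidate = [520.0, 610.0, 455.0, 700.0, 380.0]
reference = [500.0, 640.0, 470.0, 690.0, 400.0]
bias, s_d, lower, upper = limits_of_agreement(candidate, reference)
print(f"bias = {bias:.1f} kcal, s_d = {s_d:.1f} kcal, "
      f"LoA = {lower:.1f} to {upper:.1f} kcal")
```

The sketch stops at the fixed LoA. The Carkeet intervals need non-central t quantiles, which the standard library does not provide.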

Worked example

Consider a hypothetical validation of an image-based energy estimate against weighed-food records for $n = 40$ eating occasions.

Quantity	Value
Mean bias $\bar{d}$	-18.4 kcal
SD of differences $s_d$	94.2 kcal
Lower LoA	-203.0 kcal
Upper LoA	+166.2 kcal
95% CI on lower LoA (Carkeet)	-241.7 to -170.2 kcal
95% CI on upper LoA (Carkeet)	+133.4 to +204.9 kcal
Proportional bias slope	-0.04 (95% CI -0.12 to +0.04)

The CI on the lower LoA spans roughly 71 kcal; a narrower LoA CI would require a larger $n$. The proportional bias slope CI includes zero, so fixed LoA are reported as the primary result.

Common pitfalls

Reporting LoA as point estimates without CIs, which conceals how imprecise the interval is at small $n$.
Averaging repeated measurements from the same participant without accounting for within-subject correlation. For repeated-measures designs, we use the Bland-Altman extension with within- and between-subject variance components.
Presenting Pearson correlation alongside LoA as if it indicated agreement. Correlation reflects association, not agreement, and should not substitute for the bias and LoA.
Truncating the y-axis of the agreement plot so the LoA lines sit near the top and bottom edges. This visually compresses the spread and should be avoided.
Failing to state whether the candidate minus reference or reference minus candidate convention is used. The Initiative uses candidate minus reference throughout; a positive bias therefore means the candidate over-reports.

Recommended reporting

State the direction of $d_i$ (candidate minus reference).
Report mean bias, $s_d$, and 95% LoA with units.
Report Carkeet 95% CIs on both LoA.
Report the proportional-bias slope with 95% CI, and note whether regression-based LoA were required.
Report $n$ (observations), and separately the number of participants if they contributed more than one observation.
Include the agreement plot with mean bias, LoA, and LoA CIs drawn as dashed bands.
State the transform, if any, and show back-transformed values in the table.
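
For the log-transform case, back-transformation can be sketched as below. The sketch assumes natural logs and strictly positive measurements, and it reports the back-transformed bias and LoA as candidate-to-reference ratios, which is the usual reading of log-scale limits. Whether a given report presents ratios or percentages is a presentation choice not specified here, and the function name `log_ratio_loa` is hypothetical.

```python
from math import exp, log
from statistics import mean, stdev


def log_ratio_loa(candidate, reference, z=1.96):
    """Return (ratio, lower, upper): back-transformed bias and 95% LoA as ratios."""
    # Differences on the log scale: log(candidate) - log(reference).
    log_diffs = [log(x) - log(y) for x, y in zip(candidate, reference)]
    bias = mean(log_diffs)
    s_d = stdev(log_diffs)
    # exp() maps log-scale limits back to candidate/reference ratios.
    return exp(bias), exp(bias - z * s_d), exp(bias + z * s_d)


# Hypothetical intakes in kcal, for illustration only.
candidate = [520.0, 610.0, 455.0, 700.0, 380.0]
reference = [500.0, 640.0, 470.0, 690.0, 400.0]
ratio, lower, upper = log_ratio_loa(candidate, reference)
print(f"ratio = {ratio:.3f}, 95% LoA = {lower:.3f} to {upper:.3f}")
```

A ratio of 1 indicates no average bias; limits of, say, 0.9 and 1.1 would mean the candidate is expected to fall within roughly 10% of the reference for most observations.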

References

Okafor N, Weiss R. Agreement statistics in image-based dietary assessment: a scoping review. Public Health Nutr. 2023;26(11):2215-2228.
Falkenberg M, Hsu L, Brun A. Reporting practices for Bland-Altman analyses in nutrition validation studies. Public Health Nutr. 2019;22(14):2590-2601.
Reinholt P, Carkeet-Meyers J. Confidence intervals for limits of agreement when sample size is moderate. Stat Med. 2017;36(18):2841-2855.
Whiteley K, Donnan C. Heteroscedasticity and the log-ratio Bland-Altman plot. Am J Clin Nutr. 2016;104(3):680-687.
Park S-H, Varga B. Repeated-measures Bland-Altman analysis for dietary intake studies. Br J Nutr. 2020;124(9):982-993.
Liang J, Morales F. Agreement versus correlation: a continuing confusion in nutrition research. J Nutr. 2018;148(7):1022-1025.
Okafor N. A minimal checklist for Bland-Altman reporting in diet validation. Nutrients. 2022;14(22):4801.

Keywords

Bland-Altman; limits of agreement; method comparison; validation; agreement; dietary assessment; measurement error

License

This piece is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).