Methodology Brief

Re-rating concordance protocol: blinded sub-sample re-evaluation for single-rater validation studies

Background

Dietary assessment validation frequently depends on a single trained rater who codes reference data (for example, matching photographed foods to database entries, assigning portion sizes, or classifying stratum membership). While a single rater is often the only practical choice, it leaves the measurement reliability of the reference undocumented, and offers no way to detect rater drift over the course of data collection.

A pragmatic alternative to full double-coding is a blinded re-rating protocol: the original rater (or, better, a second trained rater) re-codes a random sub-sample of items under conditions that prevent recall of the original coding, and agreement metrics are computed on the sub-sample. The Initiative’s convention is that single-rater validation studies must include such a re-rating sub-sample and must report the concordance results.

Method

Sub-sample size. A minimum of 10% of items or 30 items, whichever is greater, is drawn at random from the full evaluation set. The random draw is performed after initial coding is complete, with a documented random seed.
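
The draw rule above can be sketched as follows; the function and parameter names are illustrative, and the seed shown is a placeholder for whatever documented seed a study actually records.

```python
import math
import random

def draw_rerating_subsample(item_ids, fraction=0.1, minimum=30, seed=20240601):
    """Draw the re-rating sub-sample: max(30, 10% of items), with a
    documented seed, performed only after initial coding is complete.

    `seed` is a placeholder; record the seed actually used.
    """
    items = list(item_ids)
    n = max(minimum, math.ceil(fraction * len(items)))
    rng = random.Random(seed)  # fixed, documented seed for reproducibility
    return sorted(rng.sample(items, n))

# 400-item study: 10% rule gives n = 40
subsample = draw_rerating_subsample(range(1, 401))
```

Note that the `max(minimum, …)` guard means small studies (under 300 items) still re-rate at least 30 items.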

Blinding. Re-rating is performed at least 30 days after initial coding for the same rater, or by a different trained rater. The re-rater works from the same source material (photograph, diary entry) with all prior coding stripped from their view. The re-rater is not told which items are from the re-rating sub-sample.
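
Stripping prior coding from the re-rater's view can be sketched as below; the field names (`energy_kcal`, `stratum`, `food_id`) are hypothetical stand-ins for whatever fields the original rater filled in.

```python
def blind_for_rerating(records, coding_fields=("energy_kcal", "stratum", "food_id")):
    """Return copies of source records with all prior coding removed.

    `coding_fields` is illustrative; list every field the original rater
    completed. The blinded records should then be mixed into the re-rater's
    normal queue so sub-sample membership is not apparent.
    """
    return [{k: v for k, v in r.items() if k not in coding_fields}
            for r in records]

# source material (photo reference) survives; prior coding does not
blinded = blind_for_rerating([{"photo": "p1.jpg", "energy_kcal": 512, "stratum": 3}])
```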

Concordance metrics. For continuous outcomes (energy estimate, portion weight), the two-way random-effects, absolute-agreement intraclass correlation coefficient ICC(2,1) with 95% CI is computed. For categorical outcomes (stratum assignment, food identity), Cohen’s kappa with 95% CI is computed; for ordered categorical outcomes, weighted kappa with linear or quadratic weights as pre-specified.
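
A minimal numpy sketch of the two point estimates follows: ICC(2,1) in the Shrout–Fleiss sense (two-way random effects, absolute agreement, single measure) and unweighted Cohen's kappa. Confidence intervals are omitted for brevity; in practice, pingouin's `intraclass_corr` reports ICC(2,1) with a 95% CI, and scikit-learn's `cohen_kappa_score` accepts `weights='linear'` or `'quadratic'` for the weighted variant.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` is an (n_items, k_raters) array; here k = 2
    (original coding vs. re-rating).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-item means
    col_means = x.mean(axis=0)  # per-rater means
    # Two-way ANOVA mean squares (Shrout & Fleiss notation)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between items
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two categorical codings of the same items."""
    a, b = np.asarray(a), np.asarray(b)
    cats = np.union1d(a, b)
    po = np.mean(a == b)                                       # observed agreement
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in cats)  # chance agreement
    return (po - pe) / (1 - pe)
```

Because ICC(2,1) measures absolute agreement, a constant offset between the two codings lowers the coefficient, which is the desired behaviour for a reference-coding check; a consistency-type coefficient would mask such an offset.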

Acceptance thresholds. Initiative default thresholds are ICC $\geq$ 0.80 for continuous outcomes and kappa $\geq$ 0.70 for categorical outcomes. Results below threshold trigger a full double-coding of the dataset before the primary analysis proceeds.
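
The gating rule is simple enough to state directly in code; this sketch applies the defaults above to a point estimate (the function name is illustrative).

```python
# Initiative default thresholds (from this brief)
THRESHOLDS = {"continuous": 0.80, "categorical": 0.70}

def concordance_decision(point_estimate, outcome_type):
    """Gate the primary analysis on the re-rating concordance result."""
    if point_estimate >= THRESHOLDS[outcome_type]:
        return "proceed"
    return "full double-coding required"
```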

Drift check. The re-rating sub-sample is stratified in time across the data-collection window so that within-study drift can be detected. A formal test for drift compares the first third of re-rated items against the last third using Fisher’s exact test or a suitable continuous analogue.
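
For categorical outcomes, the drift test reduces to Fisher's exact test on a 2×2 table of agree/disagree counts in the first versus last third of re-rated items, ordered by original coding date. A minimal sketch, leaving the continuous analogue (e.g. comparing absolute rating differences between thirds) aside:

```python
import numpy as np
from scipy.stats import fisher_exact

def drift_pvalue(agreement_flags):
    """Fisher's exact test for drift: first vs. last third of re-rated items.

    `agreement_flags` is a boolean sequence, in original coding order,
    indicating whether the original coding and the re-rating agreed.
    """
    flags = np.asarray(agreement_flags, dtype=bool)
    third = len(flags) // 3
    first, last = flags[:third], flags[-third:]
    table = [[first.sum(), len(first) - first.sum()],
             [last.sum(), len(last) - last.sum()]]
    _, p = fisher_exact(table)
    return p
```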

Worked example

Consider a study in which a single rater coded 400 eating-occasion photographs for (a) estimated energy and (b) stratum assignment. A 10% re-rating sub-sample ($n = 40$) was drawn and re-coded by a second rater after an appropriate delay.

| Outcome | Metric | Value | 95% CI | Threshold | Pass? |
|---|---|---|---|---|---|
| Energy estimate (kcal) | ICC(2,1) | 0.89 | 0.81 to 0.94 | $\geq$ 0.80 | yes |
| Stratum assignment (6-level) | Cohen's kappa | 0.82 | 0.68 to 0.93 | $\geq$ 0.70 | yes |
| Food identity (top-1 match) | Cohen's kappa | 0.74 | 0.60 to 0.86 | $\geq$ 0.70 | marginal |
| Drift (first vs. last third, energy) | - | ns | - | - | no drift |

Food-identity concordance passes on the point estimate (0.74) but its confidence interval dips below the 0.70 threshold; the Initiative convention is to proceed but to flag this in the results and discussion, and to consider a supplementary analysis restricted to items on which both raters agreed.

Common pitfalls

Incomplete blinding. Same-rater re-rating with a washout shorter than 30 days, or with the prior coding visible, lets the rater reproduce rather than re-derive the original codes; a second trained rater is the safer design.

Non-random or premature draws. Drawing the sub-sample before initial coding is complete, or hand-picking items, biases the concordance estimate; the draw must be random, made after coding finishes, and performed with a documented seed.

Wrong ICC variant. Consistency-type coefficients such as ICC(3,1) ignore systematic offsets between ratings and can overstate agreement; the absolute-agreement ICC(2,1) is the required variant.

Ignoring interval width. A point estimate above threshold whose 95% CI dips below it warrants flagging and discussion rather than an unqualified pass.

Keywords

re-rating; inter-rater reliability; concordance; ICC; blinding; validation; reference coding; quality control

License

This piece is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).