Methodology Brief

Cuisine stratification in evaluation sets: definitions, allocations, and minimum N for inference

Background

Image-based dietary assessment tools frequently show non-uniform accuracy across cuisines: a system trained primarily on Western plated meals may perform noticeably worse on mixed rice dishes, communal stews, or composed street-food items. Evaluation sets that do not stratify by cuisine, or that stratify inconsistently, can produce misleading overall accuracy estimates and obscure real gaps in tool readiness.

However, “cuisine” is itself a contested category. The Initiative’s position is that, for evaluation purposes, a pragmatic, structurally defined taxonomy is preferable to one based on geographic or cultural labels alone, and that stratum definitions should be specified operationally, in enough detail to permit replication.

The Method

Stratum definition. The Initiative uses a six-stratum operational taxonomy for evaluation sets, defined primarily by structural characteristics of the meal rather than national origin:

  1. Plated single-component - an identifiable protein plus sides on a single plate.
  2. Mixed rice or grain dishes - dishes in which a grain matrix incorporates multiple other components (for example, pilafs, biryanis, fried rice, paella).
  3. Composed bowls and stews - liquid- or semi-liquid-matrix dishes with multiple components partially submerged.
  4. Layered or stacked items - sandwiches, wraps, burgers, tacos, and analogous items where components are stacked and partly occluded.
  5. Beverages and soups - predominantly liquid items.
  6. Discrete-piece items - fruits, baked goods, confectionery served as identifiable whole units.

Each evaluation item is assigned to exactly one stratum. The taxonomy is intentionally structural because these categories correspond to visually distinct estimation problems (occlusion, ingredient overlap, portion cue availability), which is the relevant axis for image-based evaluation.
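
The exactly-one-stratum rule can be enforced mechanically when an evaluation set is assembled. The following is a minimal Python sketch, assuming items arrive as (item ID, stratum) label pairs alongside a list of all item IDs; the enum member names and the `assign_strata` helper are hypothetical illustrations, not part of the Initiative's protocol.

```python
from enum import Enum


class Stratum(Enum):
    """The six structural strata of the taxonomy (member names are hypothetical)."""
    PLATED_SINGLE_COMPONENT = 1
    MIXED_RICE_GRAIN = 2
    COMPOSED_BOWLS_STEWS = 3
    LAYERED_STACKED = 4
    BEVERAGES_SOUPS = 5
    DISCRETE_PIECE = 6


def assign_strata(item_ids: list[str], labels: list[tuple[str, Stratum]]) -> dict[str, Stratum]:
    """Build an item -> stratum map (hypothetical helper).

    Raises if an item is labelled with two different strata or is left
    unlabelled, so every item ends up in exactly one stratum.
    """
    assignment: dict[str, Stratum] = {}
    for item_id, stratum in labels:
        previous = assignment.get(item_id)
        if previous is not None and previous is not stratum:
            raise ValueError(
                f"item {item_id!r} assigned to both {previous.name} and {stratum.name}"
            )
        assignment[item_id] = stratum
    missing = set(item_ids).difference(assignment)
    if missing:
        raise ValueError(f"unassigned items: {sorted(missing)}")
    return assignment


if __name__ == "__main__":
    labels = [("img-001", Stratum.MIXED_RICE_GRAIN), ("img-002", Stratum.LAYERED_STACKED)]
    print(assign_strata(["img-001", "img-002"], labels))
```

Repeated identical labels are tolerated here; only conflicting or missing assignments are rejected.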

Allocation. For an evaluation set of total size $N$, the Initiative’s default allocation is proportional to the deployment population’s consumption mix, with a floor of 10% for each stratum judged scientifically relevant to the claim being made. Strata judged non-relevant may be excluded with justification, and any exclusion is reported.
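
Proportional allocation with a floor can be sketched as follows. The brief does not say how the floor is enforced or how fractional counts are rounded, so this sketch makes two labelled assumptions: strata whose proportional share falls below the floor are raised to it and the remaining share is redistributed proportionally among the other strata (repeating until none falls below), and integer counts are obtained by largest-remainder rounding so that they sum exactly to $N$. The function name `allocate` and its parameters are hypothetical; excluded strata are simply left out of the weights.

```python
import math
from fractions import Fraction


def allocate(weights: dict[str, int], total_n: int, floor_share: Fraction) -> dict[str, int]:
    """Proportional allocation with a per-stratum floor (hypothetical sketch).

    weights: positive relative consumption weights for the relevant strata.
    floor_share: minimum share per stratum, e.g. Fraction(1, 10).
    """
    if floor_share * len(weights) > 1:
        raise ValueError("floor too high for the number of strata")
    floored: set[str] = set()
    while True:
        free = [s for s in weights if s not in floored]
        remaining = 1 - floor_share * len(floored)
        free_weight = sum(weights[s] for s in free)
        # Floored strata get exactly the floor; the rest split what is left.
        shares = {s: floor_share for s in floored}
        shares.update({s: remaining * Fraction(weights[s], free_weight) for s in free})
        below = {s for s in free if shares[s] < floor_share}
        if not below:
            break
        floored |= below
    # Largest-remainder (Hamilton) rounding: an assumption, not stated in the brief.
    quotas = {s: total_n * share for s, share in shares.items()}
    counts = {s: math.floor(q) for s, q in quotas.items()}
    leftover = total_n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    return counts


if __name__ == "__main__":
    # Illustrative weights only, not data from the brief.
    print(allocate({"A": 70, "B": 25, "C": 5}, 100, Fraction(1, 10)))
```

Exact `Fraction` arithmetic avoids floating-point ties at the floor boundary. With a floor of 10% (the level at which the worked example's "Minimum-floor adjusted" rows sit), feeding the worked example's planned shares (25, 20, 15, 20, 10, 10) back in with $N = 180$ reproduces its planned counts of 45, 36, 27, 36, 18 and 18.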

Minimum stratum size for inference. The Initiative requires $n_{\text{stratum}} \geq 30$ before stratum-level accuracy is reported quantitatively. Strata with $15 \leq n_{\text{stratum}} < 30$ are reported descriptively only, and strata with $n_{\text{stratum}} < 15$ are pooled into a “miscellaneous” category.
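
The three-tier reporting rule amounts to a small classifier. This Python sketch applies the thresholds to whatever stratum count is passed in (planned or achieved) and, as an assumption, pools by summing the small strata into a single "miscellaneous" count. The brief does not state how the pooled category is itself reported, so the sketch leaves that open. `reporting_mode` and `pool_small` are hypothetical names.

```python
QUANT_MIN = 30        # stratum n needed for quantitative reporting
DESCRIPTIVE_MIN = 15  # below this, a stratum is pooled into "miscellaneous"


def reporting_mode(n: int) -> str:
    """Classify a stratum count under the three-tier rule."""
    if n >= QUANT_MIN:
        return "quantitative"
    if n >= DESCRIPTIVE_MIN:
        return "descriptive"
    return "pooled"


def pool_small(stratum_sizes: dict[str, int]) -> dict[str, int]:
    """Keep strata with n >= 15; sum the rest into 'miscellaneous' (hypothetical helper)."""
    kept = {s: n for s, n in stratum_sizes.items() if n >= DESCRIPTIVE_MIN}
    pooled = sum(n for n in stratum_sizes.values() if n < DESCRIPTIVE_MIN)
    if pooled:
        kept["miscellaneous"] = pooled
    return kept


if __name__ == "__main__":
    # Illustrative counts only, not data from the brief.
    sizes = {"Mixed rice/grain": 36, "Beverages/soups": 18, "Discrete-piece": 12}
    print({name: reporting_mode(n) for name, n in sizes.items()})
    print(pool_small(sizes))
```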

Worked example

A validation study of a general-purpose tool intended for deployment among urban US adults might set $N = 180$ with the following allocation.

| Stratum | Planned share | Planned n | Rationale |
|---|---|---|---|
| Plated single-component | 25% | 45 | Common in target population |
| Mixed rice/grain | 20% | 36 | Common; known to be difficult for image methods |
| Composed bowls/stews | 15% | 27 | Relevant, diverse |
| Layered/stacked | 20% | 36 | Very common (sandwiches, wraps) |
| Beverages/soups | 10% | 18 | Minimum-floor adjusted |
| Discrete-piece | 10% | 18 | Minimum-floor adjusted |

In this plan, Composed bowls/stews ($n = 27$), Beverages/soups ($n = 18$) and Discrete-piece ($n = 18$) would be reported descriptively only, since each planned $n$ falls below the 30-item threshold for quantitative stratum-level inference (all three are at or above 15, so none is pooled). If stratum-level inference is required for these strata, the overall $N$ must be raised.
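
The arithmetic of the worked example can be checked directly. This Python sketch takes the table's planned shares as integer percentages, computes each planned $n$ for $N = 180$, flags strata below the 30-item threshold, and computes the smallest $N$ at which each flagged stratum would reach 30, i.e. the smallest $N$ with $\lfloor N s / 100 \rfloor \geq 30$, which is $N = \lceil 3000 / s \rceil$ for a share of $s$ percent. Holding the shares fixed while raising $N$ is an assumption; the brief only says that $N$ must be raised. The helper names are hypothetical.

```python
# Planned shares (%) from the worked-example table; N = 180 as in the example.
PLANNED_SHARE_PCT = {
    "Plated single-component": 25,
    "Mixed rice/grain": 20,
    "Composed bowls/stews": 15,
    "Layered/stacked": 20,
    "Beverages/soups": 10,
    "Discrete-piece": 10,
}
TOTAL_N = 180
QUANT_MIN = 30  # minimum stratum n for quantitative reporting


def planned_n(total_n: int, pct: int) -> int:
    """Planned stratum size floor(total_n * pct / 100), in exact integer arithmetic."""
    return total_n * pct // 100


def min_total_for_quantitative(pct: int, threshold: int = QUANT_MIN) -> int:
    """Smallest N with floor(N * pct / 100) >= threshold, i.e. ceil(100 * threshold / pct)."""
    return (100 * threshold + pct - 1) // pct


def below_threshold(total_n: int) -> dict[str, int]:
    """Strata whose planned n falls short of QUANT_MIN at the given total N."""
    sizes = {s: planned_n(total_n, pct) for s, pct in PLANNED_SHARE_PCT.items()}
    return {s: n for s, n in sizes.items() if n < QUANT_MIN}


if __name__ == "__main__":
    assert sum(PLANNED_SHARE_PCT.values()) == 100
    for stratum, n in below_threshold(TOTAL_N).items():
        pct = PLANNED_SHARE_PCT[stratum]
        print(f"{stratum}: planned n = {n}; reaches {QUANT_MIN} at N >= {min_total_for_quantitative(pct)}")
```

Run as shown, it flags Composed bowls/stews (27), Beverages/soups (18) and Discrete-piece (18), and indicates that with shares held fixed the two 10% strata reach 30 only at $N = 300$ (Composed bowls/stews at $N = 200$).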

Common pitfalls

Keywords

cuisine; stratification; evaluation set; dietary assessment; sampling design; generalisability; image-based

License

This piece is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).