Cuisine Classification Codebook v1.1

Published February 19, 2026 · License: CC BY 4.0

Summary

A coding manual for assigning meals to cuisine buckets in dietary assessment research. Defines six buckets, provides worked examples and decision rules, specifies a two-rater plus adjudicator procedure, and reports inter-rater reliability (Cohen's kappa) from the Initiative's internal calibration.

Description

The Cuisine Classification Codebook documents the procedure by which the Initiative assigns individual meals to cuisine buckets for stratified analysis. Consistent cuisine coding is a prerequisite for stratified validation reporting and for the construction of cuisine-balanced reference sets. This codebook is the authoritative document for any Initiative product that references a cuisine bucket.

Version 1.1 (February 2026) supersedes v1.0 (October 2025). Changes from v1.0 are summarised in the Versioning section below.

Six buckets are defined:

Western — meals rooted in Northern/Western European or North American everyday traditions (e.g., roast chicken with potatoes; burger with fries; pasta with cream sauce).
Mediterranean — meals rooted in Southern European / Eastern Mediterranean traditions where olive oil, legumes, and fresh vegetables are dominant (e.g., Greek salad with grilled fish; tabbouleh with hummus).
East Asian — meals rooted in Chinese, Japanese, Korean, or Southeast Asian traditions (e.g., stir-fried vegetables with rice; miso soup with grilled fish; pho).
South Asian — meals rooted in Indian subcontinental traditions (e.g., dal with roti; biryani; dosa with chutney).
Latin American — meals rooted in Mexican, Central American, or South American traditions (e.g., tacos al pastor; feijoada; arepas).
Other / Mixed — meals that do not map cleanly to any of the above five buckets, including fusion dishes and meals from regions not yet covered as explicit buckets.

Schema

The codebook itself is a written document, but each coded meal carries a structured record:

Field name	Type	Description
`meal_id`	string	Meal identifier (matches source reference set)
`primary_bucket`	enum	Final bucket, one of the six above (the adjudicator's assignment when adjudication was invoked)
`secondary_bucket`	enum (nullable)	Secondary bucket if dish spans two traditions
`rater_1_assignment`	enum	Bucket assigned by primary rater
`rater_2_assignment`	enum	Bucket assigned by independent secondary rater
`adjudication_required`	boolean	True if raters disagreed
`adjudicator_assignment`	enum (nullable)	Bucket from adjudicator, if invoked
`notes`	string	Optional free-text justification
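As an illustration only, a coded-meal record can be checked for internal consistency before analysis. The sketch below assumes the bucket labels are spelled exactly as in the bucket list above and that `adjudicator_assignment` is null whenever adjudication was not invoked. The class `CodedMeal` and the function `validation_errors` are hypothetical names, not part of the published JSON.

```python
from dataclasses import dataclass
from typing import Optional

# The six buckets; the exact label spellings are an assumption.
BUCKETS = frozenset({
    "Western",
    "Mediterranean",
    "East Asian",
    "South Asian",
    "Latin American",
    "Other / Mixed",
})


@dataclass
class CodedMeal:
    """One coded-meal record, mirroring the schema table field by field."""
    meal_id: str
    primary_bucket: str
    secondary_bucket: Optional[str]
    rater_1_assignment: str
    rater_2_assignment: str
    adjudication_required: bool
    adjudicator_assignment: Optional[str]
    notes: str = ""


def validation_errors(rec: CodedMeal) -> list[str]:
    """Return consistency problems in a record; an empty list means it passes."""
    errors = []
    # Non-nullable enum fields must hold one of the six buckets.
    for name in ("primary_bucket", "rater_1_assignment", "rater_2_assignment"):
        value = getattr(rec, name)
        if value not in BUCKETS:
            errors.append(f"{name}: unknown bucket {value!r}")
    # Nullable enum fields may be None; otherwise they must hold a valid bucket.
    for name in ("secondary_bucket", "adjudicator_assignment"):
        value = getattr(rec, name)
        if value is not None and value not in BUCKETS:
            errors.append(f"{name}: unknown bucket {value!r}")
    # adjudication_required should be True exactly when the raters disagreed.
    disagreed = rec.rater_1_assignment != rec.rater_2_assignment
    if rec.adjudication_required != disagreed:
        errors.append("adjudication_required does not match rater disagreement")
    # Assumption: the adjudicator field is filled only when adjudication was invoked.
    if rec.adjudication_required and rec.adjudicator_assignment is None:
        errors.append("adjudication required but adjudicator_assignment is null")
    if not rec.adjudication_required and rec.adjudicator_assignment is not None:
        errors.append("adjudicator_assignment present without adjudication")
    return errors


ok = CodedMeal("m001", "Mediterranean", None, "Western", "Mediterranean", True, "Mediterranean")
print(validation_errors(ok))  # []
```

A record whose raters agree but which is still flagged for adjudication would be reported, which catches a common data-entry slip.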

Provenance / collection methodology

Meals are coded by two independent raters, each of whom receives the meal descriptor (dish name, ingredient list) without access to the other rater’s assignment. Raters work from the written codebook and the accompanying worked-examples appendix. When the two raters disagree, a third rater (the adjudicator) receives both assignments and the meal descriptor and issues a final assignment. The adjudicator’s assignment is binding.
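The resolution step of this procedure reduces to a small function. This is a sketch: `final_bucket` is a hypothetical name, and the code does not restrict the adjudicator to the two raters' choices because the codebook does not say whether it should.

```python
from typing import Optional


def final_bucket(rater_1: str, rater_2: str, adjudicator: Optional[str] = None) -> str:
    """Resolve a meal's bucket under the two-rater plus adjudicator procedure."""
    # Agreement between the two independent raters settles the meal directly.
    if rater_1 == rater_2:
        return rater_1
    # Disagreement must go to the adjudicator, whose assignment is binding.
    if adjudicator is None:
        raise ValueError("raters disagree; an adjudicator assignment is required")
    # Not restricted to rater_1 or rater_2: the codebook does not say either way.
    return adjudicator


print(final_bucket("South Asian", "South Asian"))                 # South Asian
print(final_bucket("Western", "Mediterranean", "Mediterranean"))  # Mediterranean
```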

The Initiative calibrates raters against a training set of 60 labelled meals prior to deploying them on a production set. A rater who achieves kappa below 0.70 against the training set is retrained before production work.
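The calibration gate can be sketched with a from-scratch Cohen's kappa: observed agreement corrected for the chance agreement implied by each source's marginal label frequencies. The sketch assumes each rater's assignments are compared meal by meal against the training set's reference labels. The function names are hypothetical; the 0.70 threshold is the only value taken from the codebook.

```python
from collections import Counter

RETRAIN_THRESHOLD = 0.70  # from the codebook: kappa below 0.70 triggers retraining


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: share of meals given the same bucket by both sources.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over buckets of the product of marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def needs_retraining(rater_labels: list[str], reference_labels: list[str]) -> bool:
    """True if the rater's kappa against the reference labels is below 0.70."""
    return cohens_kappa(rater_labels, reference_labels) < RETRAIN_THRESHOLD


reference = ["Western", "Western", "Mediterranean", "Mediterranean"]
rater = ["Western", "Mediterranean", "Mediterranean", "Mediterranean"]
print(cohens_kappa(rater, reference))      # 0.5
print(needs_retraining(rater, reference))  # True
```

In the toy example observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5. In practice `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic; the hand-rolled version keeps the sketch dependency-free.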

Inter-rater reliability

During the v1.1 calibration (January-February 2026), the Initiative coded the training set of 60 meals using three rater pairs. Cohen’s kappa values for each pair:

Rater pair	Kappa	95% CI
Pair A	0.84	0.75 - 0.92
Pair B	0.79	0.70 - 0.88
Pair C	0.81	0.72 - 0.89
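A quick arithmetic check on the table: the sketch below averages the three pair-level kappas and labels each one with the Landis and Koch (1977) benchmark bands. It assumes that scale is the one behind the "substantial to near-perfect" wording in this section; the codebook does not name it.

```python
# Pair-level kappas copied from the calibration table.
PAIR_KAPPAS = {"Pair A": 0.84, "Pair B": 0.79, "Pair C": 0.81}


def landis_koch_band(kappa: float) -> str:
    """Verbal label for a kappa value on the Landis and Koch benchmark scale."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


mean_kappa = sum(PAIR_KAPPAS.values()) / len(PAIR_KAPPAS)
for pair, k in PAIR_KAPPAS.items():
    print(pair, k, landis_koch_band(k))
print(f"mean kappa = {mean_kappa:.2f}")  # mean kappa = 0.81
```

The unweighted mean, 0.81, matches the v1.1 average reported under Versioning. A simple mean of pair-level kappas is a descriptive summary, not a pooled kappa computed over all ratings.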

Across all pairs, inter-rater agreement was substantial to near-perfect (kappa 0.79 to 0.84). The most common disagreement (approximately 40% of discordant cases) was Western versus Mediterranean for pasta-based dishes with olive oil and vegetables; the decision rules in v1.1 were tightened for this case (see Versioning).
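Finding the most common disagreement amounts to tallying discordant rater pairs by the unordered pair of buckets involved, so that Western/Mediterranean and Mediterranean/Western count together. The codings below are invented for illustration and are not the calibration data; `disagreement_shares` is a hypothetical helper.

```python
from collections import Counter


def disagreement_shares(pairs: list[tuple[str, str]]) -> list[tuple[frozenset[str], float]]:
    """Share of discordant cases per unordered bucket pair, most common first."""
    # Keep only discordant codings; frozenset makes the pair order-insensitive.
    discordant = [frozenset(p) for p in pairs if p[0] != p[1]]
    if not discordant:
        return []
    counts = Counter(discordant)
    total = len(discordant)
    return [(pair, n / total) for pair, n in counts.most_common()]


# Hypothetical (rater_1, rater_2) codings for six meals.
codings = [
    ("Western", "Mediterranean"),
    ("Mediterranean", "Western"),
    ("East Asian", "East Asian"),
    ("South Asian", "Other / Mixed"),
    ("Latin American", "Other / Mixed"),
    ("Western", "Western"),
]
for pair, share in disagreement_shares(codings):
    print(" vs ".join(sorted(pair)), f"{share:.0%}")
```

Here four of the six meals are discordant and the first line printed is "Mediterranean vs Western 50%".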

Known limitations

Six-bucket coverage is not exhaustive. The five named buckets do not cover every cuisine, so the Other / Mixed bucket absorbs heterogeneous cases. Users whose primary interest is a cuisine not explicitly listed should not treat the Other / Mixed bucket as a suitable proxy.
Descriptor-only coding. The codebook assumes rater access to a textual descriptor only. Imagery-based coding would likely improve agreement for ambiguous cases but is not the default procedure.
Tradition-vs-occasion conflation. The buckets are culinary-tradition buckets, not occasion buckets. “Quick weekday dinner” is not a bucket; “Western” can span occasions from weekday to formal.
Fusion dishes. Dishes that deliberately fuse two traditions (e.g., Korean tacos) are coded Other / Mixed. Users who want a more granular fusion taxonomy should treat this as a known limitation.

Versioning

v1.0 (October 2025): initial release. Six buckets. kappa = 0.76 average across pairs.

v1.1 (February 2026): current release. Changes from v1.0:

Tightened decision rule for pasta-with-olive-oil dishes (default to Mediterranean unless a Western protein staple dominates).
Added a decision flowchart appendix.
Added three worked examples per bucket.
Recalibrated raters, achieving a mean kappa of 0.81 across the three pairs.
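The tightened pasta-with-olive-oil rule can be expressed as a small function. This is a sketch under stated assumptions: the three boolean fields are hypothetical stand-ins for judgements a rater makes from the descriptor, the codebook does not define when a protein staple "dominates", and the rule implies, but does not spell out, that such meals go to Western.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MealDescriptor:
    # Hypothetical rater judgements derived from the dish name and ingredient list.
    pasta_based: bool
    has_olive_oil: bool
    western_protein_dominates: bool


def pasta_olive_oil_rule(meal: MealDescriptor) -> Optional[str]:
    """Apply the v1.1 pasta-with-olive-oil rule, or return None if it does not apply.

    None leaves the meal to the codebook's other decision rules.
    """
    if not (meal.pasta_based and meal.has_olive_oil):
        return None
    # Default to Mediterranean; assumed Western when a Western protein staple dominates.
    return "Western" if meal.western_protein_dominates else "Mediterranean"


print(pasta_olive_oil_rule(MealDescriptor(True, True, False)))  # Mediterranean
print(pasta_olive_oil_rule(MealDescriptor(True, True, True)))   # Western
```

Returning None rather than a default bucket keeps the rule composable with the other rules in the decision flowchart.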

Future versions will be released when additional buckets are defined; a Sub-Saharan African bucket is under consideration for v2.0.

How to access

The codebook is released as a PDF and as an accompanying structured JSON file containing the decision rules in machine-readable form; both are on the Initiative’s datasets page. Direct download; no access request is needed.

How to cite

Patel M, Rivera S. (2026). Cuisine Classification Codebook v1.1. The Dietary Assessment Initiative.

License

Creative Commons Attribution 4.0 International (CC BY 4.0).

Keywords

cuisine classification; codebook; inter-rater reliability; kappa; dietary assessment; stratification; coding manual