Dataset
Cuisine Classification Codebook v1.1
Summary
A coding manual for assigning meals to cuisine buckets in dietary assessment research. Defines six buckets, provides worked examples and decision rules, specifies a two-rater plus adjudicator procedure, and reports inter-rater reliability (Cohen's kappa) from the Initiative's internal calibration.
Description
The Cuisine Classification Codebook documents the procedure by which the Initiative assigns individual meals to cuisine buckets for stratified analysis. Consistent cuisine coding is a prerequisite for stratified validation reporting and for the construction of cuisine-balanced reference sets. This codebook is the authoritative document for any Initiative product that references a cuisine bucket.
Version 1.1 (February 2026) supersedes v1.0 (October 2025). Changes from v1.0 are summarised in the Versioning section below.
Six buckets are defined:
- Western — meals rooted in Northern/Western European or North American everyday traditions (e.g., roast chicken with potatoes; burger with fries; pasta with cream sauce).
- Mediterranean — meals rooted in Southern European / Eastern Mediterranean traditions where olive oil, legumes, and fresh vegetables are dominant (e.g., Greek salad with grilled fish; tabbouleh with hummus).
- East Asian — meals rooted in Chinese, Japanese, Korean, or Southeast Asian traditions (e.g., stir-fried vegetables with rice; miso soup with grilled fish; pho).
- South Asian — meals rooted in Indian subcontinental traditions (e.g., dal with roti; biryani; dosa with chutney).
- Latin American — meals rooted in Central/South American traditions (e.g., tacos al pastor; feijoada; arepas).
- Other / Mixed — meals that do not map cleanly to any of the above five buckets, including fusion dishes and meals from regions not yet covered as explicit buckets.
Schema
The codebook itself is a written document, but each coded meal carries a structured record:
| Field name | Type | Description |
|---|---|---|
| meal_id | string | Meal identifier (matches source reference set) |
| primary_bucket | enum | One of the six buckets above |
| secondary_bucket | enum (nullable) | Secondary bucket if dish spans two traditions |
| rater_1_assignment | enum | Bucket assigned by primary rater |
| rater_2_assignment | enum | Bucket assigned by independent secondary rater |
| adjudication_required | boolean | True if raters disagreed |
| adjudicator_assignment | enum (nullable) | Bucket from adjudicator, if invoked |
| notes | string | Optional free-text justification |
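As an illustration only, the record described above could be modelled in Python roughly as follows. The names `Bucket`, `CodedMeal`, and `check`, and the string values chosen for the enum, are assumptions of this sketch and are not taken from the released JSON. The sketch also assumes that an adjudicator assignment is present exactly when adjudication is required.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Bucket(str, Enum):
    # String values are an assumed encoding, not the released one.
    WESTERN = "Western"
    MEDITERRANEAN = "Mediterranean"
    EAST_ASIAN = "East Asian"
    SOUTH_ASIAN = "South Asian"
    LATIN_AMERICAN = "Latin American"
    OTHER_MIXED = "Other / Mixed"


@dataclass
class CodedMeal:
    meal_id: str
    primary_bucket: Bucket
    rater_1_assignment: Bucket
    rater_2_assignment: Bucket
    adjudication_required: bool
    secondary_bucket: Optional[Bucket] = None
    adjudicator_assignment: Optional[Bucket] = None
    notes: str = ""

    def check(self) -> List[str]:
        """Return a list of consistency problems (empty if none found)."""
        problems = []
        disagreed = self.rater_1_assignment != self.rater_2_assignment
        # Schema: adjudication_required is True if raters disagreed.
        if self.adjudication_required != disagreed:
            problems.append("adjudication_required does not match rater disagreement")
        # Assumption: the adjudicator field is filled exactly when adjudication is required.
        if self.adjudication_required and self.adjudicator_assignment is None:
            problems.append("adjudication required but adjudicator_assignment is missing")
        if not self.adjudication_required and self.adjudicator_assignment is not None:
            problems.append("adjudicator_assignment set although raters agreed")
        return problems
```

A record in which both raters chose the same bucket and no adjudicator was invoked passes `check()` with an empty list.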
Provenance / collection methodology
Meals are coded by two independent raters, each of whom receives the meal descriptor (dish name, ingredient list) without access to the other rater’s assignment. Raters work from the written codebook and the accompanying worked-examples appendix. When the two raters disagree, a third rater (the adjudicator) receives both assignments and the meal descriptor and issues a final assignment. The adjudicator’s assignment is binding.
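The two-rater plus adjudicator procedure can be sketched as a small function. This is a minimal illustration: `final_assignment` is a hypothetical helper name, buckets are represented as plain strings, and raising an error for an unresolved disagreement is a choice of this sketch rather than something the codebook specifies.

```python
from typing import Optional


def final_assignment(rater_1: str, rater_2: str,
                     adjudicator: Optional[str] = None) -> str:
    """Return the final bucket for one meal.

    If the two independent raters agree, their shared bucket stands.
    If they disagree, the adjudicator's assignment is binding.
    """
    if rater_1 == rater_2:
        return rater_1
    if adjudicator is None:
        # Disagreement without an adjudicator decision cannot be resolved here.
        raise ValueError("raters disagree; an adjudicator assignment is required")
    return adjudicator


# Agreement: no adjudicator needed.
print(final_assignment("South Asian", "South Asian"))
# Disagreement: the adjudicator's bucket is returned.
print(final_assignment("Western", "Mediterranean", adjudicator="Mediterranean"))
```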
The Initiative calibrates raters against a training set of 60 labelled meals before deploying them on a production set. A rater whose kappa against the training-set labels falls below 0.70 is retrained before starting production work.
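For readers who want to reproduce the calibration check, the following is a minimal sketch of unweighted Cohen's kappa between a rater's assignments and the training-set labels, followed by the 0.70 gate. The function names `cohens_kappa` and `needs_retraining` are hypothetical, and the handling of the degenerate case (both sequences use one identical label) is a choice made for this sketch.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of meals given the same bucket by both sources.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of marginal proportions, summed over buckets.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: kappa is undefined; treated here as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def needs_retraining(kappa: float, threshold: float = 0.70) -> bool:
    """True if kappa falls below the retraining threshold."""
    return kappa < threshold


rater = ["Western", "Western", "Mediterranean", "Mediterranean"]
reference = ["Western", "Western", "Mediterranean", "Western"]
k = cohens_kappa(rater, reference)
print(k, needs_retraining(k))
```

In this toy example, observed agreement is 0.75 and chance agreement is 0.50, giving kappa = 0.5, which would trigger retraining.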
Inter-rater reliability
During the v1.1 calibration (January-February 2026), the Initiative coded the training set of 60 meals using three rater pairs. Cohen’s kappa values for each pair:
| Rater pair | Kappa | 95% CI |
|---|---|---|
| Pair A | 0.84 | 0.75 - 0.92 |
| Pair B | 0.79 | 0.70 - 0.88 |
| Pair C | 0.81 | 0.72 - 0.89 |
Kappa ranged from 0.79 to 0.84 across the three pairs, indicating substantial to near-perfect agreement. The most common disagreement (approximately 40% of discordant cases) was Western vs Mediterranean for pasta-based dishes with olive oil and vegetables; the v1.1 decision rule for this case was tightened (see Versioning).
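The "substantial to near-perfect" wording is consistent with the commonly used Landis and Koch (1977) interpretation bands, although the codebook does not state which scale it follows. Assuming those bands, the sketch below labels each reported pair and recomputes the average kappa quoted in the Versioning section. `landis_koch_band` is a hypothetical helper name.

```python
from statistics import mean

# Kappa values as reported in the table above.
PAIR_KAPPA = {"Pair A": 0.84, "Pair B": 0.79, "Pair C": 0.81}


def landis_koch_band(kappa: float) -> str:
    """Map kappa to a Landis and Koch (1977) descriptive band (assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


for pair, kappa in PAIR_KAPPA.items():
    print(pair, kappa, landis_koch_band(kappa))

average_kappa = round(mean(PAIR_KAPPA.values()), 2)
print("average kappa:", average_kappa)
```

Under this scale, Pair B falls in the "substantial" band and Pairs A and C in the "almost perfect" band; the rounded mean of the three values is 0.81.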
Known limitations
- Six-bucket coverage is not exhaustive. The Other / Mixed bucket absorbs heterogeneous cases. Users whose primary interest is a cuisine not explicitly listed should not treat the Other / Mixed bucket as a suitable proxy.
- Descriptor-only coding. The codebook assumes rater access to a textual descriptor only. Imagery-based coding would likely improve agreement for ambiguous cases but is not the default procedure.
- Tradition-vs-occasion conflation. The buckets are culinary-tradition buckets, not occasion buckets. “Quick weekday dinner” is not a bucket; “Western” can span occasions from weekday to formal.
- Fusion dishes. Dishes that deliberately fuse two traditions (e.g., Korean tacos) are coded Other / Mixed; the codebook does not provide a more granular fusion taxonomy.
Versioning
v1.0 (October 2025): initial release. Six buckets. kappa = 0.76 average across pairs.
v1.1 (February 2026): current release. Changes from v1.0:
- Tightened decision rule for pasta-with-olive-oil dishes (default to Mediterranean unless a Western protein staple dominates).
- Added a decision flowchart appendix.
- Added three worked examples per bucket.
- Re-calibrated raters, achieving average kappa = 0.81.
Future versions will be released when additional buckets are defined (the Sub-Saharan African bucket is under consideration for v2.0).
How to access
The codebook is released as a PDF and as an accompanying structured JSON (decision rules in machine-readable form) on the Initiative’s datasets page. Direct download; no access request needed.
How to cite
Patel M, Rivera S. (2026). Cuisine Classification Codebook v1.1. The Dietary Assessment Initiative.
License
Creative Commons Attribution 4.0 International (CC BY 4.0).
Keywords
cuisine classification; codebook; inter-rater reliability; kappa; dietary assessment; stratification; coding manual