Mixed-dish portion estimation: the unsolved problem at the centre of consumer dietary assessment

Meera Patel

doi:10.5281/zenodo.dai-2025-08

Narrative Review

DAI-NR-2025-08

Meera Patel, PhD
Published December 3, 2025 · DOI: 10.5281/zenodo.dai-2025-08

Abstract

Portion estimation for mixed dishes (stews, curries, casseroles, stir-fries, composite salads, and similar preparations whose ingredients are not separable on visual inspection) is the error source that most consistently dominates accuracy estimates in image-based dietary assessment. Across the independent validation literature reviewed in the Initiative's 2025 systematic review, the mean absolute percentage error (MAPE) for mixed dishes was typically 1.5 to 3 times the single-item MAPE within the same study. This narrative review synthesises the evidence on mixed-dish portion error, distinguishes the three principal sources of error (ingredient identification, ingredient proportion estimation, and total volume estimation), and describes the methodological approaches that have been tried (multi-view photography, depth sensing, reference-object scaling, user-confirmed ingredient lists, recipe matching to menu corpora, and hybrid approaches combining image-based and manual-entry data). The review concludes that mixed-dish estimation is not a solved problem and is unlikely to be solved by image analysis alone. Approaches that integrate image inputs with structured manual confirmation of ingredient identity and proportion, while preserving the user-experience advantages of image-based capture, appear the most promising direction. The review calls for a shared mixed-dish benchmark with per-ingredient ground truth, for validation studies to report mixed-dish MAPE separately from single-item MAPE, and for clinical applications relying on mixed-dish estimates to be designed around the error budgets the field currently supports.

Keywords: mixed dish; portion estimation; image-based dietary assessment; narrative review; error decomposition; ingredient identification

1. Background

A lentil stew on a plate is not a plate of lentils. From a photograph of the stew alone, a dietary assessment tool must infer which ingredients are present (lentils, onions, tomato, oil, spice, possibly meat, possibly coconut milk), in what proportions, and at what total volume. Each inference is difficult in isolation; together they compound. In the independent validation literature, the mean absolute percentage error (MAPE) for mixed dishes is typically 1.5 to 3 times the single-item MAPE in the same study, and in two studies mixed-dish MAPE exceeded 40% while single-item MAPE in the same studies was under 15%.

The purpose of this review is to state plainly that mixed-dish estimation is not a solved problem, to decompose the sources of error, and to summarise the methodological approaches that have been tried and their limits.

2. The Argument

Mixed-dish portion estimation is the binding constraint on consumer image-based dietary assessment accuracy for a substantial fraction of real meals. A tool that performs well on single-item images (a whole apple, a steak, a portion of rice) will see its pooled accuracy degrade as the proportion of mixed dishes in real use rises. Real consumer diets, particularly in South Asian, Middle Eastern, Latin American, West African, and several European culinary traditions, are mixed-dish-dominant. A tool that claims category-leading accuracy on single-item benchmarks has therefore not shown category-leading accuracy in those use contexts.

Two related points follow. First, mixed-dish MAPE should be reported separately from single-item MAPE in any validation study. Second, consumer tools aiming at global use cannot rely on image inputs alone; a structured mechanism for user confirmation of ingredient identity and proportion is essential, with image inputs serving to reduce rather than eliminate the user’s cognitive load.
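The dependence of pooled accuracy on meal mix is simple arithmetic. Because MAPE is a mean of per-meal absolute percentage errors, the pooled figure is the share-weighted average of the two stratum figures. The sketch below illustrates this; the function name `pooled_mape` is hypothetical, and the 15% and 40% inputs are illustrative values that echo the bounds quoted in Section 1 rather than results for any particular tool.

```python
def pooled_mape(single_mape: float, mixed_mape: float, mixed_share: float) -> float:
    """Share-weighted average of the two stratum MAPEs (all values in percent)."""
    if not 0.0 <= mixed_share <= 1.0:
        raise ValueError("mixed_share must lie in [0, 1]")
    return (1.0 - mixed_share) * single_mape + mixed_share * mixed_mape


# Illustrative inputs only: 15% single-item MAPE, 40% mixed-dish MAPE.
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"mixed-dish share {share:.2f}: pooled MAPE {pooled_mape(15.0, 40.0, share):.1f}%")
```

Under these assumed inputs, a tool whose meals are half mixed dishes has a pooled MAPE nearly double its single-item figure, which is why a single-item benchmark result says little about a mixed-dish-dominant diet.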

3. Evidence Considered

3.1 Decomposition of error

Mixed-dish error decomposes into three principal sources, which can be analysed separately:

Source	Typical contribution (95% CI, where reported)
Ingredient identification (wrong ingredient identified)	6-14% of MAPE
Ingredient proportion estimation (right ingredient, wrong share)	8-22% of MAPE
Total volume estimation (correct identification and proportions, wrong total)	5-17% of MAPE
Residual (interaction effects)	3-9% of MAPE

The contributions are not additive; they interact, and the decomposition is approximate. Across the studies reviewed, ingredient-proportion estimation is the single largest source, and the one least tractable by image analysis alone.
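One way to see why the contributions do not add up is an oracle-substitution exercise on a single dish: correct one component of the estimate at a time and record how far the absolute percentage error falls. The sketch below is an illustration under stated assumptions, not the procedure used in the reviewed studies. It assumes the estimate and the ground truth have the same number of ingredient slots, it models identification error as a wrong energy density in a slot, and every name and number in it (`Dish`, `oracle_reductions`, the densities, shares, and masses) is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Dish:
    kcal_per_g: tuple  # energy density of the ingredient identified in each slot
    shares: tuple      # mass fraction of each slot (sums to 1)
    total_g: float     # total mass of the dish in grams


def energy(d: Dish) -> float:
    """Total energy in kcal: total mass times the share-weighted energy density."""
    return d.total_g * sum(s * k for s, k in zip(d.shares, d.kcal_per_g))


def ape(est: Dish, truth: Dish) -> float:
    """Absolute percentage error of the estimated energy."""
    return abs(energy(est) - energy(truth)) / energy(truth) * 100.0


def oracle_reductions(est: Dish, truth: Dish):
    """Error removed when one component at a time is replaced by ground truth."""
    base = ape(est, truth)
    reductions = {}
    for field in ("kcal_per_g", "shares", "total_g"):
        corrected = replace(est, **{field: getattr(truth, field)})
        reductions[field] = base - ape(corrected, truth)
    # Whatever the three single corrections do not explain is the interaction term.
    reductions["residual"] = base - sum(reductions.values())
    return base, reductions


# Made-up dish: identification correct, shares and total mass both wrong.
truth = Dish(kcal_per_g=(1.0, 9.0, 0.2), shares=(0.70, 0.05, 0.25), total_g=400.0)
estimate = Dish(kcal_per_g=(1.0, 9.0, 0.2), shares=(0.75, 0.02, 0.23), total_g=350.0)
base, parts = oracle_reductions(estimate, truth)
print(f"APE {base:.1f}%", {k: round(v, 1) for k, v in parts.items()})
```

The residual is non-zero here because energy is a product of total mass and the share-weighted density, so correcting the two together removes more error than the two single corrections summed. In other cases the residual can be negative, when separate errors partly cancel.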

3.2 Methodological approaches tried

Multi-view photography. Asking the user for multiple views (top, side, 45°) improves volume estimation modestly but does little for ingredient proportion in opaque dishes.

Depth sensing. LiDAR and structured-light depth sensing (available on some smartphones) improve volume estimation when the dish is on a flat plate; bowls and composite dishes remain difficult.
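Why the flat plate is the favourable case can be seen in a minimal volume calculation. The sketch below assumes an overhead depth map in millimetres, a known camera-to-plate distance, and a constant ground area per pixel; `volume_ml` and its parameters are hypothetical names and do not correspond to any device API.

```python
def volume_ml(depth_mm, plate_depth_mm: float, pixel_area_mm2: float) -> float:
    """Integrate food height above a flat plate plane over all pixels.

    depth_mm: rows of camera-to-surface distances in millimetres.
    plate_depth_mm: camera-to-plate distance (assumed known and constant).
    pixel_area_mm2: ground area covered by one pixel (assumed constant).
    """
    total_height_mm = 0.0
    for row in depth_mm:
        for d in row:
            # Height of food above the plate; readings below the plane count as zero.
            total_height_mm += max(plate_depth_mm - d, 0.0)
    return total_height_mm * pixel_area_mm2 / 1000.0  # mm^3 to millilitres


# Toy 2x2 depth map: food surface 20 mm above a plate that is 500 mm from the camera.
print(volume_ml([[480.0, 480.0], [480.0, 480.0]], 500.0, 100.0))
```

The calculation relies on a single reference plane under the food. In a bowl the lower boundary of the food is the hidden inner surface of the vessel, so height above one plane no longer equals food thickness, and the depth map alone cannot supply the missing geometry.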

Reference-object scaling. Including a known-size object (coin, credit card) in the frame reduces scale ambiguity. The approach is under-used by consumers and awkward in real meal contexts.
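The scaling step itself is a one-line conversion. The sketch below assumes the reference object is a standard ID-1 payment card (85.60 mm wide under ISO/IEC 7810), that it lies in the same plane as the food, and that the camera is roughly perpendicular to that plane; the function names are hypothetical.

```python
ID1_CARD_WIDTH_MM = 85.60  # width of an ISO/IEC 7810 ID-1 card


def mm_per_pixel(card_width_px: float, card_width_mm: float = ID1_CARD_WIDTH_MM) -> float:
    """Image scale implied by the measured pixel width of the reference card."""
    if card_width_px <= 0:
        raise ValueError("card_width_px must be positive")
    return card_width_mm / card_width_px


def region_area_cm2(region_px: int, scale_mm_per_px: float) -> float:
    """Convert a segmented region's pixel count to square centimetres."""
    return region_px * scale_mm_per_px ** 2 / 100.0


scale = mm_per_pixel(428.0)            # card measured at 428 px wide
print(region_area_cm2(10_000, scale))  # area of a 10,000-pixel food region
```

Because area scales with the square of the linear scale, a small error in measuring the card is roughly doubled in relative terms in the area estimate, which is one reason a tilted or partly occluded card degrades the result.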

User-confirmed ingredient lists. A tool proposes an ingredient list from the image; the user confirms or edits. This approach reliably reduces error but transfers cognitive load to the user.
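A confirmation step of this kind can be sketched as a merge of user edits into the proposed list. The example below is a minimal illustration, assuming the tool represents a dish as a mapping from ingredient name to mass share and that shares are renormalised after editing; `apply_user_edits` and the ingredient values are hypothetical.

```python
def apply_user_edits(proposed: dict, edits: dict) -> dict:
    """Merge user edits into a proposed ingredient-to-share mapping.

    edits maps an ingredient name to a new share, or to None to remove it.
    Shares are renormalised so that the confirmed list sums to 1.
    """
    merged = dict(proposed)
    for name, share in edits.items():
        if share is None:
            merged.pop(name, None)       # user says the ingredient is absent
        elif share < 0:
            raise ValueError(f"negative share for {name!r}")
        else:
            merged[name] = share         # user adds or corrects an ingredient
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("confirmed list has no positive shares")
    return {name: share / total for name, share in merged.items()}


proposed = {"lentils": 0.6, "onion": 0.2, "chicken": 0.2}
confirmed = apply_user_edits(proposed, {"chicken": None, "coconut milk": 0.2})
print(confirmed)
```

Renormalising is one design choice among several. It keeps the interaction to a few taps, at the price of treating the user's shares as relative rather than absolute amounts.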

Recipe matching to menu corpora. For restaurant meals, matching the image to a known menu item (with known ingredients and portion norms) can materially reduce error. Coverage is the limit: menu corpora exist for major chains but not for independent restaurants or home-cooked meals.
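The coverage limit shows up directly in even a toy matcher. The sketch below assumes an upstream classifier has already turned the image into a text label, and matches that label against a small corpus of menu item names using the standard-library `difflib.get_close_matches`. The corpus entries, the 0.8 cutoff, and the name `match_menu_item` are illustrative assumptions.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical corpus of known menu item names (lower case).
MENU_CORPUS = ["chicken tikka masala", "beef burrito bowl", "margherita pizza"]


def match_menu_item(predicted_label: str, corpus=MENU_CORPUS, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known menu item, or None when nothing is close enough."""
    matches = get_close_matches(predicted_label.lower(), corpus, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_menu_item("Chikn tikka masala"))  # close to a corpus entry
print(match_menu_item("home lentil stew"))    # no corpus coverage
```

Returning `None` rather than the nearest weak match matters: a home-cooked dish forced onto a chain menu item would inherit ingredients and portion norms that do not apply to it.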

Hybrid approaches. Combining image-based identification with structured manual confirmation (barcode scan of an ingredient, database entry for a chain item, text entry for home recipes) consistently performs best in the small number of head-to-head comparisons published. The user-experience cost is the key design challenge.

3.3 The benchmark gap

No publicly shared benchmark for mixed-dish estimation with per-ingredient ground truth is available to the field. The Initiative has advocated the construction of such a benchmark; at the time of writing none exists. This absence is itself a reason why mixed-dish MAPE estimates vary widely across studies — the underlying difficulty of the evaluation set is not standardised.

4. Implications

4.1 For validation studies

Mixed-dish MAPE should be reported separately from single-item MAPE. Pooled MAPE obscures the fact that the two sub-populations of meals have materially different error characteristics.
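Separate reporting requires little more than a stratum flag on each validation record. The sketch below is a minimal illustration, assuming each record carries a mixed-dish flag, a weighed reference value, and the tool's estimate; the function names and the four records are made up for the example.

```python
from statistics import mean


def ape(true_value: float, estimate: float) -> float:
    """Absolute percentage error for one meal."""
    return abs(estimate - true_value) / true_value * 100.0


def stratified_mape(records) -> dict:
    """MAPE for single-item meals, mixed dishes, and the pooled set.

    records: iterable of (is_mixed, true_kcal, estimated_kcal) tuples.
    Raises statistics.StatisticsError if records is empty.
    """
    errors = {"single": [], "mixed": []}
    for is_mixed, true_kcal, est_kcal in records:
        errors["mixed" if is_mixed else "single"].append(ape(true_kcal, est_kcal))
    report = {stratum: mean(vals) for stratum, vals in errors.items() if vals}
    report["pooled"] = mean(errors["single"] + errors["mixed"])
    return report


# Made-up validation records: (is_mixed, true kcal, estimated kcal).
records = [(False, 100, 110), (False, 200, 180), (True, 500, 300), (True, 400, 520)]
print(stratified_mape(records))
```

In this toy set the pooled figure sits between the two strata and describes neither, which is the masking effect that separate reporting avoids.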

4.2 For tool design

Consumer tools intended for use in mixed-dish-dominant diets require a user-confirmation pathway. The protocol paper on weighed-food reference meal construction (DAI-MP-2025-07) provides a reference for the level of ingredient decomposition required to establish ground truth; no image-only method currently reaches that level of fidelity.

4.3 For clinical applications

Clinical use cases that depend on mixed-dish estimates (diabetes self-management, post-bariatric follow-up, inflammatory bowel disease elimination protocols) must be designed around the error budgets the field currently supports — typically wide intervals on mixed-dish estimates. Tools should disclose their mixed-dish MAPE explicitly when marketed for clinical use.
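How wide such intervals become can be illustrated with a simple bound. If a single estimate is assumed to lie within a relative error r of the true value, that is |estimate - true| / true <= r, then the true value lies between estimate / (1 + r) and estimate / (1 - r). The sketch below computes that range. It is an illustration only: r is an assumed tolerance for one meal, not a MAPE, since MAPE is an average and individual errors can exceed it. The function name is hypothetical.

```python
def plausible_true_range(estimate: float, rel_error: float) -> tuple:
    """True-value range consistent with |estimate - true| / true <= rel_error."""
    if estimate < 0:
        raise ValueError("estimate must be non-negative")
    if not 0.0 <= rel_error < 1.0:
        raise ValueError("rel_error must lie in [0, 1)")
    return estimate / (1.0 + rel_error), estimate / (1.0 - rel_error)


# A 600 kcal estimate under an assumed 40% relative-error tolerance.
low, high = plausible_true_range(600.0, 0.40)
print(f"{low:.0f} to {high:.0f} kcal")
```

The range is asymmetric and widens quickly as the tolerance grows, so a clinical workflow built on a point estimate alone would be ignoring most of the uncertainty.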

5. Limits of this Position

The review is restricted to the published independent validation literature, which is sparse and heterogeneous. The three-way error decomposition is approximate and has been applied consistently in only a handful of studies. The review does not consider assessment modalities outside image-based methods — text-based recall, voice-based assessment, wearable-based proxies — which may have different mixed-dish error structures and which may be complementary rather than substitutive. The review takes no position on any specific commercial product.

References

Anand V, Qureshi S. Ingredient-level ground truth for mixed-dish dietary assessment. Nutrients. 2024;16(14):2215.
Barrera L, Ikeda H. Depth sensing in smartphone-based dietary assessment. IEEE Pervasive Comput. 2023;22(3):34-42.
Chen L, Hwang M. Multi-view photography for portion estimation: limits and extensions. Comput Biol Med. 2022;147:105714.
Dickerson N, Patel R. Error decomposition in mixed-dish estimation. Br J Nutr. 2024;132(4):612-624.
Ebrahimi S, Weiss W. User-confirmed ingredient lists: a hybrid design for dietary assessment. JMIR Mhealth Uhealth. 2023;11:e50212.
Frey M, Okafor C. Menu-corpus matching for restaurant dietary assessment. Nutrients. 2024;16(4):578.
Guevara R, Thomas F. Reference-object scaling in consumer food photography. Appetite. 2022;175:106064.
Initiative of Dietary Assessment. Independent validation of image-based dietary assessment applications. DAI-SR-2025-06. 2025.
Initiative of Dietary Assessment. A protocol for weighed-food reference meal construction. DAI-MP-2025-07. 2025.
Jansen K, Okonkwo A. Mixed-dish portion estimation across five cuisine families. Public Health Nutr. 2024;27(9):1922-1933.
Kwon H, Rivera M. Cognitive load in user-confirmation workflows for dietary apps. JMIR Hum Factors. 2023;10(2):e42876.
Li Q, Stephens T. Volume estimation from single and multiple photographs: a benchmark. Nutrients. 2023;15(18):4003.
Morgan A, Sherry T. The persistence of ingredient-proportion error across image-analysis pipelines. J Acad Nutr Diet. 2024;124(7):1210-1221.

Funding

No external funding was received for this work.

Competing interests

The author declares no competing interests.

How to cite

Patel M. (2025). Mixed-dish portion estimation: the unsolved problem at the centre of consumer dietary assessment. The Dietary Assessment Initiative — Research Publications. https://doi.org/10.5281/zenodo.dai-2025-08

License

This article is distributed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).