LB018 - LARGE LANGUAGE MODELS ASSESSMENT OF CALORIC PROVISION OF HOSPITAL MEALS – A FEASIBILITY STUDY.

P. Kutnik1,*, A. Dąbek2, M. Pasternak3

1Faculty of Medicine, Institute of Medical Sciences, The John Paul II Catholic University of Lublin, Lublin, 2Department of Metabolic Diseases, Jagiellonian University Medical College, Krakow, 3II Department of Anesthesiology and Intensive Care, Medical University of Lublin, Lublin, Poland

 

Rationale: Proper nutrition is an integral component of hospital care and significantly influences the course of treatment, recovery, and overall health outcomes. Hospitals often delegate meal preparation to external contractors, and the hospital dietitian must ensure that the prepared meals meet nutritional requirements. The large number of distinct dietary plans makes this nutritional assessment a heavy workload. Large language models (LLMs) are increasingly used for routine data analysis across various sectors. This study aims to determine whether LLMs can accurately assess caloric provision across various hospital dietary plans.

Methods: This was a database study assessing 35 different hospital diet recipes. Four publicly available LLMs, DeepSeek-Chat (DSC), Claude 3.7 Sonnet (CS), DeepSeek-R1 (DSR), and GPT 4.1 (GPT), were used to assess the caloric content of the recipes. An assessment was considered accurate if it fell within a 15% error range of the recipe's original value.
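The accuracy criterion in the Methods can be written out as a short calculation. The sketch below is illustrative only: the abstract does not state how the error was signed or whether the 15% boundary was inclusive, so the signed formula (estimate minus reference, divided by reference) and the inclusive bound are assumptions, as are the function names and the example calorie values.

```python
TOLERANCE_PCT = 15.0  # error range stated in the Methods


def percent_error(estimated_kcal: float, reference_kcal: float) -> float:
    """Signed error of an estimate relative to the recipe's original value, in percent.

    Assumption: negative values mean the model underestimated the recipe.
    """
    if reference_kcal <= 0:
        raise ValueError("reference_kcal must be positive")
    return 100.0 * (estimated_kcal - reference_kcal) / reference_kcal


def is_accurate(estimated_kcal: float, reference_kcal: float,
                tolerance_pct: float = TOLERANCE_PCT) -> bool:
    """True when the absolute percent error is within the tolerance (inclusive, by assumption)."""
    return abs(percent_error(estimated_kcal, reference_kcal)) <= tolerance_pct


# Hypothetical values, not taken from the study data.
print(percent_error(450, 500))   # underestimate of a 500 kcal recipe
print(is_accurate(540, 500))     # 8% above the reference
print(is_accurate(600, 500))     # 20% above the reference
```

Under these assumptions, an estimate counts as accurate when its absolute percent error does not exceed 15, regardless of the direction of the error.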

Results: On average, each meal contained 5 ingredients (2-10). No LLM was superior in accuracy, which was 91.4% for DSC, 85.7% for CS, 85.7% for DSR, and 91.4% for GPT (p = 0.77). The mean inaccuracy was -1.8% (±9.56) for DSC, -0.8% (±10.11) for CS, -0.07% (±11.54) for DSR, and 3.47% (±9.22) for GPT. There was a statistically significant difference between the inaccuracies of DSC and GPT (p < 0.001) and between those of GPT and CS (p = 0.003). The average output contained 766 characters for DSC, 413 for CS, 4059 for DSR, and 518 for GPT (p < 0.001). All LLMs failed to accurately assess the caloric content of the full liquid diet.
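The reported accuracy percentages can be related back to the 35 recipes with simple arithmetic. The snippet below is a sketch of that back-calculation; the recipe counts it derives are inferred from the percentages and are not reported in the abstract, and the helper name `implied_count` is hypothetical.

```python
N_RECIPES = 35  # number of recipes stated in the Methods

# Accuracy percentages as reported in the Results.
reported_accuracy_pct = {"DSC": 91.4, "CS": 85.7, "DSR": 85.7, "GPT": 91.4}


def implied_count(accuracy_pct: float, n: int = N_RECIPES) -> int:
    """Nearest whole number of recipes consistent with a reported percentage."""
    return round(accuracy_pct / 100 * n)


for model, pct in reported_accuracy_pct.items():
    k = implied_count(pct)
    # Recompute the percentage from the inferred count as a consistency check.
    print(f"{model}: {k}/{N_RECIPES} = {100 * k / N_RECIPES:.1f}%")
```

On this reading, 91.4% corresponds to 32 of 35 recipes and 85.7% to 30 of 35, so the models differ by about two recipes, which is consistent with the non-significant comparison (p = 0.77).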

Conclusion: All evaluated LLMs proved effective in accurately estimating the caloric content of different hospital meals. The mean inaccuracies were within the acceptable range. The length of the final meal reports varied between LLMs, which raises the question of their usefulness in daily practice. A larger study is needed to examine cost-effectiveness, time savings, and accuracy in the assessment of macronutrients.

Disclosure of Interest: P. Kutnik: Consultant for Nutricia, Nestle Health Science, Fresenius Kabi; A. Dąbek: Consultant for Activlab, Fresenius Kabi, Nutricia, Nestle Health Science; M. Pasternak: None declared