Culinary Culture: A Global Exploration of Health and Diversity in Cuisine
May 6, 2026
Mubaswira Ibnat Zidney
Anik Kumar Sannyashi
Raihan Tanvir
Faisal Muhammad Shah
Abstract
Accurate classification of food by cuisine and dietary category is pivotal for personalized nutrition and intelligent recommendation systems, yet unimodal approaches often struggle with label inconsistencies and the cultural diversity of recipes. This study presents a multimodal deep learning methodology that integrates textual ingredient semantics with visual food image features to jointly predict cuisine and diet, offering a robust solution to these limitations. We refine a dataset of 4,986 recipes by consolidating over 76 regional cuisine labels into 30 country-level classes, improving semantic coherence and class balance. Our framework employs transformer-based encoders to distill contextual ingredient information and advanced visual encoders to extract image representations, which are fused via an average projection with dropout to maximize predictive accuracy. Evaluated on the refined dataset, this multimodal approach achieves 81% accuracy for cuisine and 79% for diet, surpassing text-only baselines (up to 40% cuisine accuracy) and image-only baselines (up to 25% cuisine accuracy) by 15–40%. Ablation studies underscore the efficacy of our fusion strategy in handling noisy labels, positioning this methodology as a scalable foundation for dietary assessment, smart kitchen systems, and food informatics.
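The fusion strategy the abstract describes can be sketched minimally as follows: each modality's embedding is projected into a shared space, the projections are averaged, dropout is applied, and the fused vector feeds two classification heads. All dimensions, weight names, the dropout rate, and the number of diet classes below are illustrative assumptions, not values taken from the paper (only the 30 cuisine classes are stated).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 30 cuisine classes per the paper; everything else is assumed.
TEXT_DIM, IMAGE_DIM, FUSED_DIM = 768, 2048, 512
N_CUISINES, N_DIETS = 30, 5

# Linear projections mapping each modality into the shared fused space,
# plus one classification head per task.
W_text = rng.normal(0, 0.02, (TEXT_DIM, FUSED_DIM))
W_image = rng.normal(0, 0.02, (IMAGE_DIM, FUSED_DIM))
W_cuisine = rng.normal(0, 0.02, (FUSED_DIM, N_CUISINES))
W_diet = rng.normal(0, 0.02, (FUSED_DIM, N_DIETS))

def dropout(x, p=0.3, train=True):
    """Inverted dropout; identity at inference time."""
    if not train:
        return x
    mask = rng.random(x.shape) > p
    return x * mask / (1 - p)

def fuse_and_predict(text_emb, image_emb, train=False):
    # Project each modality, average the projections, apply dropout,
    # then feed the fused vector to both classification heads.
    fused = (text_emb @ W_text + image_emb @ W_image) / 2
    fused = dropout(fused, train=train)
    return fused @ W_cuisine, fused @ W_diet

text_emb = rng.normal(size=(4, TEXT_DIM))    # e.g. transformer pooled ingredient embeddings
image_emb = rng.normal(size=(4, IMAGE_DIM))  # e.g. visual encoder pooled features
cuisine_logits, diet_logits = fuse_and_predict(text_emb, image_emb)
print(cuisine_logits.shape, diet_logits.shape)  # (4, 30) (4, 5)
```

Averaging projected embeddings (rather than concatenating them) keeps the fused dimension fixed regardless of how many modalities are present, which is one common motivation for this kind of fusion.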
Type: Publication
Published in: 2025 28th International Conference on Computer and Information Technology (ICCIT)