Culinary Culture: A Global Exploration of Health and Diversity in Cuisine

December 1, 2025

Mubaswira Ibnat Zidney

Anik Kumar Sannyashi

Raihan Tanvir

Faisal Muhammad Shah

Abstract

Accurate classification of food by cuisine and dietary category is pivotal for personalized nutrition and intelligent recommendation systems, yet unimodal approaches often struggle with label inconsistencies and the cultural diversity of recipes. This study presents a multimodal deep learning methodology that integrates textual ingredient semantics with visual food image features to jointly predict cuisine and diet. We refine a dataset of 4,986 recipes by consolidating over 76 regional cuisine labels into 30 country-level classes, improving semantic coherence and class balance. The proposed framework uses transformer-based encoders to capture contextual ingredient information and visual encoders to extract image representations, which are fused via an average projection and dropout mechanism. Evaluated on the refined dataset, the multimodal approach achieves 81% accuracy for cuisine and 79% for diet, significantly surpassing text-only baselines (up to 40% cuisine accuracy) and image-only baselines (up to 25% cuisine accuracy) by 15-40%. Ablation studies underscore the efficacy of the fusion strategy in addressing noisy labels, positioning the methodology as a scalable foundation for applications in dietary assessment, smart kitchen systems, and food informatics.
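
The fusion step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the order of operations (project each modality to a shared size, average, apply dropout, then two linear heads), all function names, the toy dimensions, the random weights, and the number of diet classes are assumptions made for illustration. Only the 30 cuisine classes come from the abstract, and the random vectors stand in for real encoder outputs.

```python
import random

def project(vec, weights):
    """Linear projection; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def average_fuse(text_vec, image_vec):
    """Element-wise mean of two vectors that share a dimension."""
    if len(text_vec) != len(image_vec):
        raise ValueError("projected vectors must have the same length")
    return [(t + i) / 2.0 for t, i in zip(text_vec, image_vec)]

def dropout(vec, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)

def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def predict(text_emb, image_emb, params, rng, training=False):
    """Fuse both modalities, then predict a cuisine id and a diet id."""
    shared_text = project(text_emb, params["text_proj"])
    shared_image = project(image_emb, params["image_proj"])
    fused = average_fuse(shared_text, shared_image)
    fused = dropout(fused, params["dropout"], rng, training)
    cuisine_id = argmax(project(fused, params["cuisine_head"]))
    diet_id = argmax(project(fused, params["diet_head"]))
    return cuisine_id, diet_id

# Toy sizes (hypothetical); real encoder outputs would be much larger.
TEXT_DIM, IMAGE_DIM, SHARED_DIM = 8, 6, 4
N_CUISINES = 30  # country-level classes reported in the abstract
N_DIETS = 3      # placeholder; the abstract does not state the number of diet classes

rng = random.Random(0)
params = {
    "text_proj": random_matrix(SHARED_DIM, TEXT_DIM, rng),
    "image_proj": random_matrix(SHARED_DIM, IMAGE_DIM, rng),
    "cuisine_head": random_matrix(N_CUISINES, SHARED_DIM, rng),
    "diet_head": random_matrix(N_DIETS, SHARED_DIM, rng),
    "dropout": 0.3,  # hypothetical rate
}

# Random vectors stand in for the text and image encoder outputs.
text_emb = [rng.random() for _ in range(TEXT_DIM)]
image_emb = [rng.random() for _ in range(IMAGE_DIM)]

cuisine_id, diet_id = predict(text_emb, image_emb, params, rng)
print("cuisine class:", cuisine_id, "diet class:", diet_id)
```

Averaging requires both modalities to be projected to the same size first, which is why each encoder output passes through its own projection before fusion. Dropout is active only when `training=True`, so inference on the same inputs is deterministic.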

Type

Conference paper

Publication

28th International Conference on Computer and Information Technology (ICCIT)

Last updated on December 1, 2025

Multimodal Learning, Computer Vision, Natural Language Processing, Food Informatics

Authors

Raihan Tanvir

Senior Lecturer

