Multimodal Learning

Retrieval Augmented Enhanced Dual Co-Attention Framework for Target Aware Multimodal Bengali Hateful Meme Detection

We propose xDORA, an enhanced dual co-attention framework that integrates vision and multilingual text encoders for robust cross-modal representation learning, achieving strong …

avatar
Raihan Tanvir

Culinary Culture: A Global Exploration of Health and Diversity in Cuisine

The paper introduces a multimodal deep learning framework combining textual ingredient semantics and visual food image features to classify cuisine and diet, achieving 81% cuisine …

mubaswira-ibnat-zidney