VLM

Retrieval Augmented Enhanced Dual Co-Attention Framework for Target Aware Multimodal Bengali Hateful Meme Detection

We propose xDORA, an enhanced dual co-attention framework that integrates vision and multilingual text encoders for robust cross-modal representation learning, achieving strong …

Raihan Tanvir