Do Memes Speak Hate? A Residual-Adapter Approach for Bengali Memes

Nath, Tanusree, Gupta, Vedika ORCID: https://orcid.org/0000-0002-8109-498X and Gupta, Manjari (2026) Do Memes Speak Hate? A Residual-Adapter Approach for Bengali Memes. In: Intelligent Human Computer Interaction: 17th International Conference, IHCI 2025, Jaipur, India, November 14–16, 2025, Revised Selected Papers, Part I Conference proceedings. Lecture Notes in Computer Science. Springer Science and Business Media, Berlin, pp. 225-237. ISBN 9783032263483 Available at: https://doi.org/10.1007/978-3-032-26349-0_19

Full text not available from this repository.

Abstract

The widespread presence of social media in everyday life has led to a surge in online interactions. While it offers numerous benefits, social media is frequently misused as a platform to propagate hate and negativity, which can harm users’ mental well-being. Although several tools exist to detect harmful content, low-resource languages like Bengali remain poorly served by them. Because contemporary social media content often spans multiple modalities, such as text and images, a multimodal approach offers a more robust solution for automatic content moderation. This study presents a multimodal hateful meme detection model tailored for Bengali that processes textual and visual information jointly. The Res-AFFNet (Residual-Adapter Feature-level Fusion Network) architecture employs MuRIL (Multilingual Representations for Indian Languages) to encode text and ViT (Vision Transformer) to encode images. Each modality’s embeddings are passed through lightweight adapter units, and a residual fusion (80% original + 20% adapted) is applied. The resulting outputs are fused and passed through a linear projection and a classifier that predicts whether the input is ‘hate’ or ‘not-hate’. Trained and evaluated on the MUTE dataset, the model achieves an improvement of 1.27% over the existing baseline, and the multimodal approach outperforms unimodal approaches in terms of accuracy. Ablation studies further highlight the effectiveness of the multimodal framework compared to unimodal and reduced-component variants. Multimodal Bengali hate speech detection can be used for automatic content moderation on social networking websites and can help support low-resource communities. In future work, the framework could be extended to other low-resource languages, and fine-grained classification of hate speech could be incorporated by expanding the dataset annotations.
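
The data flow described in the abstract (per-modality adapter, 80/20 residual mix, fusion, linear projection, classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the bottleneck adapter layout (down-projection, ReLU, up-projection), fusion by concatenation, the toy dimensions, the random untrained weights, the label order, and every function name below are assumptions made for illustration. Random vectors stand in for the MuRIL and ViT embeddings, and only the Python standard library is used.

```python
import math
import random

KEEP = 0.8  # share of the original embedding kept in the residual mix (80/20, from the abstract)


def init_linear(rng, in_dim, out_dim):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, layer):
    """Apply a dense layer to vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def adapter(x, down, up):
    """Assumed adapter layout: down-project, ReLU, up-project back to len(x)."""
    hidden = [max(0.0, h) for h in linear(x, down)]
    return linear(hidden, up)


def residual_mix(original, adapted, keep=KEEP):
    """Weighted residual: keep * original + (1 - keep) * adapted."""
    return [keep * o + (1.0 - keep) * a for o, a in zip(original, adapted)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(text_emb, image_emb, params):
    """Adapter + residual mix per modality, then fuse, project, and classify."""
    text = residual_mix(text_emb, adapter(text_emb, *params["text_adapter"]))
    image = residual_mix(image_emb, adapter(image_emb, *params["image_adapter"]))
    fused = text + image  # assumption: feature-level fusion by concatenation
    projected = linear(fused, params["projection"])
    return softmax(linear(projected, params["classifier"]))


# Toy sizes; real encoder embeddings are much larger.
rng = random.Random(0)
DIM, BOTTLENECK, PROJ = 8, 2, 4
params = {
    "text_adapter": (init_linear(rng, DIM, BOTTLENECK), init_linear(rng, BOTTLENECK, DIM)),
    "image_adapter": (init_linear(rng, DIM, BOTTLENECK), init_linear(rng, BOTTLENECK, DIM)),
    "projection": init_linear(rng, 2 * DIM, PROJ),
    "classifier": init_linear(rng, PROJ, 2),
}

# Random stand-ins for the text and image embeddings.
text_emb = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
image_emb = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

probs = forward(text_emb, image_emb, params)
print(dict(zip(("not-hate", "hate"), probs)))  # label order is an assumption
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how the 80/20 residual mix keeps each modality's representation close to its original embedding while letting a small adapter adjust it.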

Item Type:	Book Section
Uncontrolled Keywords:	Bengali Hateful Memes | Low-Resource Language NLP | Multimodal Hate Speech Detection
Subjects:	Physical, Life and Health Sciences > Computer Science
Depositing User:	Mr. Syed Anas Ali
Date Deposited:	01 Jul 2026 04:47
Last Modified:	01 Jul 2026 04:47
Official URL:	https://doi.org/10.1007/978-3-032-26349-0_19
URI:	https://pure.jgu.edu.in/id/eprint/11880
