Nath, Tanusree, Gupta, Vedika (ORCID: https://orcid.org/0000-0002-8109-498X), Gupta, Manjari and Sharma, Rajesh
(2026)
Decoding multimodal text analytics: tasks, datasets, fusion models, and future frontiers.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 16 (2): e70083.
ISSN 1942-4787
Decoding multimodel text analysis.pdf - Published Version
Restricted to Repository staff only
Abstract
It is estimated that the volume of data on digital fronts will grow exponentially to reach 180 zettabytes by 2025, and that more than 90% of this data will be unstructured. This phenomenon has triggered a shift from unimodal to multimodal text analytics (MTA). Multimodal text analytics first appeared in scholarly literature and industrial use‐cases in the early 2010s, and it has since expanded into other sectors such as healthcare, e‐commerce, education and public safety. This survey presents a task‐oriented, modality‐inclusive, and dataset‐aware synthesis of recent advancements in MTA, offering an in‐depth review of 10 core text analytics tasks through a multimodal lens. We systematically analyze over 160 research studies and categorize more than 120 state‐of‐the‐art models, spanning fusion strategies, representation learning, transformer architectures, and pretrained vision‐language frameworks (e.g., CLIP, ViLBERT). Across datasets including CMU‐MOSI, CMU‐MOSEI, IEMOCAP, and MAViT‐Bangla, multimodal models achieve F1‐score improvements of up to 18%–25% over text‐only baselines, as captured in the standardized task‐wise comparison tables included in this survey. Moreover, this survey discusses seven under‐explored tasks, including personality detection, satire detection, and author profiling, and identifies research gaps in modality fusion, dataset diversity, and social inclusivity for these tasks. It not only fills gaps in the current literature by unifying knowledge across different fields but also charts a future path for researchers working on MTA. It is the first survey to bring all the key tasks within multimodal text analytics together in a contiguous and consistent overview, whereas other surveys either address multimodal computing at an administrative level or concentrate on a specific task.
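The abstract mentions fusion strategies without defining them. As an illustrative sketch only (not code from the survey), the two most commonly contrasted strategies can be written as follows: early (feature-level) fusion concatenates per-modality feature vectors before a single classifier, while late (decision-level) fusion combines the class probabilities of separate per-modality classifiers. The function names, the equal default weight `w_text = 0.5`, and all numbers are hypothetical assumptions chosen for illustration.

```python
from typing import List, Sequence

# Hypothetical illustration: feature vectors and probabilities are made up,
# not taken from the survey or from any dataset it reviews.


def early_fusion(text_feats: Sequence[float], image_feats: Sequence[float]) -> List[float]:
    """Feature-level (early) fusion: concatenate modality features into one
    vector that a single downstream classifier would consume."""
    return list(text_feats) + list(image_feats)


def late_fusion(prob_text: Sequence[float], prob_image: Sequence[float],
                w_text: float = 0.5) -> List[float]:
    """Decision-level (late) fusion: each modality has its own classifier,
    and their class probabilities are combined by a weighted average."""
    if len(prob_text) != len(prob_image):
        raise ValueError("both classifiers must score the same set of classes")
    w_image = 1.0 - w_text
    return [w_text * pt + w_image * pi for pt, pi in zip(prob_text, prob_image)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(xs)), key=lambda i: xs[i])


if __name__ == "__main__":
    fused_vec = early_fusion([0.1, 0.7], [0.3, 0.2, 0.9])
    print(len(fused_vec))  # 5: two text features plus three image features
    probs = late_fusion([0.6, 0.4], [0.2, 0.8])
    print(probs, argmax(probs))
```

The weighted average in `late_fusion` is only one simple decision-level rule; the models surveyed may instead learn fusion weights, use attention, or fuse at intermediate transformer layers.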
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Algorithmics; Emotion recognition; Fake news; Hate speech; Language model; Multi-modal; Sentiment analysis; Text analytics; Transformer architecture; Vision-language model |
| Subjects: | Physical, Life and Health Sciences > Computer Science |
| Vol/Issue no. published date: | 7 June 2026 |
| Depositing User: | Mr. Syed Anas |
| Date Deposited: | 16 Apr 2026 10:09 |
| Last Modified: | 18 Apr 2026 11:07 |
| Official URL: | https://doi.org/10.1002/widm.70083 |
| URI: | https://pure.jgu.edu.in/id/eprint/11209 |