Publication Type : Conference Paper
Publisher : IEEE
Source : International Conference on Sentiment Analysis and Deep Learning (ICSADL)
Url : https://ieeexplore.ieee.org/abstract/document/10601465
Campus : Amritapuri
School : School of Computing
Year : 2024
Abstract : Visual Question Answering (VQA) is one of the most important tasks that combine computer vision and natural language understanding. This paper addresses VQA in Malayalam, a language known for its distinctive Dravidian linguistic characteristics. To achieve this goal, it uses LXMERT, a cross-modality Transformer encoder that processes image and text inputs jointly. Extensive testing on an independent set of images paired with Malayalam questions shows that a reliable VQA model can be built on a Malayalam dataset. The method adapts well to answering Malayalam questions about images. This opens opportunities for image retrieval, image annotation, and possibly image-based machine translation, and points to real-world applications across a variety of fields.
Cite this Research Publication : Joseph, Jose, V. Anand Ram, P. YadhuKrishna, and T. Anjali. "Visual Question Answering in Malayalam Text." In 2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL), pp. 225-232. IEEE, 2024.