Image Captioning: Analyzing CNN-LSTM and Vision-GPT Models

Publication Type : Conference Paper

Publisher : IEEE

Source : IEEE 9th International Conference for Convergence in Technology (I2CT)

Url : https://ieeexplore.ieee.org/abstract/document/10543514

Campus : Amritapuri

School : School of Computing

Year : 2024

Abstract : Image captioning, which lies at the intersection of computer vision and natural language processing, is essential for enhancing image comprehension and enables applications such as content discovery and visual aids for the blind. As technology develops quickly, the search for more precise and reliable image captioning models remains an important research goal. This study thoroughly compares two prominent image captioning techniques: Image Captioning Using LSTM+CNN and Image Captioning Using VisionGPT2. We examine the models' internal workings, assess their effectiveness, and offer insights into their advantages and disadvantages for diverse application scenarios. The LSTM+CNN model, a tried-and-true methodology, combines convolutional neural networks (CNNs) for extracting visual features with long short-term memory (LSTM) networks for producing sequential language. It has proven adept at creating insightful descriptions for a variety of photographs. VisionGPT2, by contrast, is an extension of the GPT-2 architecture that uses transformers and pretrained language models to deliver cutting-edge results in a range of natural language processing applications. We analyze the viability of each technique by considering factors such as model complexity, training data requirements, and ease of deployment. This comprehensive comparison informs academics, programmers, and businesses about the most suitable image captioning solution for their particular requirements, fostering development in this area and its numerous uses.

Cite this Research Publication : Karthik, Abburi Sai, Maddala HSM Krishna Karthik, Samudrala Yashwanth, and T. Anjali. "Image Captioning: Analyzing CNN-LSTM and Vision-GPT Models." In 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), pp. 1-6. IEEE, 2024.