
Transformer based ensemble deep learning approach for remote sensing natural scene classification

Publication Type : Journal Article

Source : International Journal of Remote Sensing 45, no. 10 (2024): 3289-3309

Url : https://www.tandfonline.com/doi/full/10.1080/01431161.2024.2343141

Campus : Coimbatore

School : School of Artificial Intelligence

Year : 2024

Abstract : Very high resolution (VHR) remote sensing (RS) image classification is paramount for detailed analysis of the Earth's surface. Feature extraction from VHR natural scenes is crucial, but it is challenging because of the overlapping edges present in the images. Multiple open-source datasets exist in the literature for training robust models, and they have been benchmarked using deep learning models. However, different datasets contain different numbers of classes; some classes could be absent in other datasets because they are independent of the other classes, and the classes are sometimes not mutually exclusive within the same dataset. It is therefore very challenging to generalize a model trained on a single dataset to perform scene classification on unknown classes of multiple benchmark datasets and real-time images. To address this, this work introduces the Remote Sensing Natural Scenes 92 (RS_NS92) dataset, consisting of 36,785 images belonging to 92 classes, curated by selectively taking the union of all subclasses from five benchmark datasets. This class count is significantly higher than that of publicly available datasets, and the dataset maintains a low class imbalance and a comprehensive data distribution for robust model training. It also provides the remote sensing community with an additional platform to validate performance on multiple benchmarks. Inspired by federated learning, an ensemble approach consisting of three feature extraction backbones, Vision Transformers, Swin Transformers, and ConvNeXt (termed the VSC_Ensemble model), is also introduced. This model can make accurate predictions across multiple datasets by fine-tuning weights using transfer learning. The proposed approach not only obtains a high test accuracy of 97.24% and an F1-score of 0.9587 for the 92 classes on a 90:10 split of the proposed benchmark dataset but also performs well on unseen test images from other datasets, with results comparable to the state of the art.
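
The abstract does not say how the outputs of the three backbones are combined. As an illustration only, the sketch below assumes soft voting: each backbone produces class logits, the logits are converted to probabilities, and the probabilities are averaged. The function names, the three-class setup, and the logit values are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average class probabilities across models.

    Returns (predicted_class_index, averaged_probabilities).
    Soft voting is an assumed fusion rule, not the paper's confirmed method.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    if any(len(p) != n_classes for p in probs):
        raise ValueError("all models must score the same set of classes")
    averaged = [
        sum(p[c] for p in probs) / len(probs) for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical logits for one image over three classes, standing in for
# the outputs of the ViT, Swin Transformer, and ConvNeXt backbones.
vit_logits = [2.0, 0.5, 0.1]
swin_logits = [1.5, 1.7, 0.2]
convnext_logits = [2.2, 0.3, 0.4]

pred, avg = soft_vote([vit_logits, swin_logits, convnext_logits])
print("predicted class:", pred)
print("averaged probabilities:", [round(p, 3) for p in avg])
```

In this toy input, one backbone slightly prefers class 1 while the other two clearly prefer class 0, so the averaged prediction is class 0. Averaging probabilities rather than taking a majority of hard labels lets a confident backbone outweigh an uncertain one.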

Cite this Research Publication : Sivasubramanian, Arrun, Prashanth VR, Sowmya V, and Vinayakumar Ravi. "Transformer based ensemble deep learning approach for remote sensing natural scene classification." International Journal of Remote Sensing 45, no. 10 (2024): 3289-3309
