As part of the Sign Language Accessibility project under MeitY, Govt. of India, we developed a vision-based continuous sign language recognition (CSLR) framework based on deep neural networks, which directly transcribes videos of Indian Sign Language (ISL) sentences into ordered sequences of gloss labels. Our proposed architecture adopts a 3D ResNet model, pre-trained on isolated ISL videos, as the feature extractor, and an 8-layer Transformer encoder for sequence learning. This is the first time such a pre-training strategy has been adopted for a sign recognition model, where the datasets of the pre-training task and the downstream fine-tuning task are highly correlated. The proposed model is trained iteratively with Connectionist Temporal Classification (CTC) loss: the end-to-end recognition model is first trained to produce alignment proposals, which are then used as strong supervisory information to directly tune the feature extraction module. This process can be repeated over several iterations to improve recognition performance.
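For concreteness, the sketch below illustrates this pipeline in PyTorch, assuming torchvision's r3d_18 as a stand-in for the 3D ResNet backbone. The 16-frame window, vocabulary size, clip shapes, and the cross-entropy fine-tuning head are illustrative assumptions rather than the project's actual implementation.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18  # stand-in 3D ResNet backbone


class CSLRModel(nn.Module):
    """3D ResNet feature extractor + 8-layer Transformer encoder + gloss head."""

    def __init__(self, num_glosses: int, d_model: int = 512,
                 n_layers: int = 8, n_heads: int = 8, window: int = 16):
        super().__init__()
        self.window = window
        # In the project, this backbone is pre-trained on isolated ISL videos;
        # here it is randomly initialised to keep the example self-contained.
        backbone = r3d_18(weights=None)
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, num_glosses + 1)  # +1: CTC blank

    def extract(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, C, T, H, W) -> one feature vector per 16-frame window
        b, c, t, h, w = frames.shape
        clips = frames.unfold(2, self.window, self.window)   # (B,C,N,H,W,win)
        clips = clips.permute(0, 2, 1, 5, 3, 4)              # (B,N,C,win,H,W)
        n = clips.shape[1]
        feats = self.feature_extractor(
            clips.reshape(b * n, c, self.window, h, w))
        return feats.flatten(1).view(b, n, -1)               # (B, N, 512)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.encoder(self.extract(frames)))


model = CSLRModel(num_glosses=100)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

frames = torch.randn(2, 3, 64, 112, 112)   # 2 videos, 64 frames each
targets = torch.randint(1, 101, (2, 3))    # 2 gloss sequences of length 3

# Stage 1: end-to-end CTC training of the full recognition model.
logits = model(frames)                               # (B, N, num_glosses + 1)
log_probs = logits.log_softmax(-1).permute(1, 0, 2)  # (N, B, C) for CTCLoss
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), logits.shape[1], dtype=torch.long),
           target_lengths=torch.full((2,), 3, dtype=torch.long))
loss.backward()

# Stage 2: use the CTC alignment proposal (per-window argmax) as pseudo-labels
# to directly tune the feature extraction module; in practice, blank-dominated
# windows would be filtered before fine-tuning.
with torch.no_grad():
    pseudo = model(frames).argmax(-1)                # (B, N) alignment proposal
win_logits = model.classifier(model.extract(frames))
align_loss = nn.functional.cross_entropy(
    win_logits.reshape(-1, win_logits.shape[-1]), pseudo.reshape(-1))
align_loss.backward()
```

In the system described above, these two stages alternate over several iterations, with the backbone initialised from the isolated-ISL pre-training rather than from scratch.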
Since no benchmark ISL CSLR dataset exists, we collected a Continuous ISL dataset covering the FAQ services of four UMANG apps (Saubhagya, eRaktKosh, IRCTC, and ESIC). The dataset provides gloss-level annotations for the ISL FAQ videos of these apps. The data was cleaned and pre-processed before model training and evaluation, and all test cases and results reported on the dataset are detailed in this work.