Publication Type : Conference Paper
Publisher : 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
Source : 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2017)
Keywords : 5-fold cross validation, Bag-of-Words, CDMC 2016 e-News categorization task, character level based LSTM models, character level input, Cryptography, deep encrypted text categorization, dense activation layer, dense word vectors, document modeling, encrypted texts, Feature extraction, learning (artificial intelligence), long short-term memory (LSTM), long-range temporal context, low-level textual representations, LSTM network structures, Machine learning, nonlinear activation function, optimal network architecture, Pattern classification, recurrent layers, recurrent neural nets, Recurrent neural network (RNN), Recurrent neural networks, Semantics, sentence, short-term memory, text analysis, Text categorization, word embeddings, word level LSTM models
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking, Electronics Communication and Instrumentation Forum (ECIF)
Department : Computer Science, Electronics and Communication, Sciences
Year : 2017
Abstract : Long short-term memory (LSTM) is a significant approach to capturing long-range temporal context in sequences of arbitrary length, and it has shown remarkable performance in sentence and document modeling. To leverage this, we apply LSTM networks to encrypted text categorization at the character and word levels. The texts are transformed into dense word vectors using a bag-of-words embedding. The dense word vectors are fed into recurrent layers to capture contextual information, followed by a dense layer with a nonlinear activation function such as softmax for classification. The optimal network architecture was found by conducting experiments with varying network parameters and structures. All experiments were run for up to 1000 epochs with learning rates in the range [0.01, 0.5]. Most of the LSTM network structures performed well under 5-fold cross-validation. Based on the 5-fold cross-validation results, we claim that character-level inputs are more efficient than word-level inputs in dealing with encrypted texts, because character-level input preserves more information from low-level textual representations. Character-level LSTM models achieved a highest accuracy of 0.99 and word-level models a highest accuracy of 0.94 in the 5-fold cross-validation classification setting. On the real-world test data of the CDMC 2016 e-News categorization task, word-level LSTM models attained a highest accuracy of 0.43.
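The pipeline the abstract describes (character-level input fed through recurrent LSTM layers, with a final dense layer and softmax over the classes) can be sketched as a forward pass. This is not the authors' implementation; it is a minimal NumPy illustration in which the vocabulary, hidden size, number of classes, and the random weights are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step: input, forget, and output gates plus candidate cell."""
    H = h.shape[0]
    z = W @ x + U @ h + b            # stacked pre-activations, shape (4H,)
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate cell state
    c_new = f * c + i * g            # long-range memory carried across steps
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def classify_chars(text, vocab, W, U, b, W_out, b_out):
    """Run a character sequence through the LSTM, then dense + softmax."""
    H = b.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    for ch in text:
        x = np.zeros(len(vocab))
        x[vocab[ch]] = 1.0           # one-hot character-level input
        h, c = lstm_step(x, h, c, W, U, b)
    return softmax(W_out @ h + b_out)  # class probabilities from final state

# Toy demo with random (untrained) weights and a hypothetical 2-class task.
rng = np.random.default_rng(0)
vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
V, H, K = len(vocab), 8, 2
W = rng.normal(scale=0.1, size=(4 * H, V))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
W_out = rng.normal(scale=0.1, size=(K, H))
b_out = np.zeros(K)

probs = classify_chars("encrypted text", vocab, W, U, b, W_out, b_out)
# probs is a length-2 probability vector over the two classes
```

Because the character-level variant consumes raw symbols rather than tokenized words, it needs no meaningful word boundaries, which is consistent with the abstract's claim that character-level input retains more low-level information from encrypted text.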
Cite this Research Publication : R. Vinayakumar, Dr. Soman K. P., and Poornachandran, P., “Deep Encrypted Text Categorization”, in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2017.