Publication Type : Conference Proceedings
Publisher : ACM
Source : SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 40–46 (2018)
Url : https://dl.acm.org/doi/10.1145/3167132.3167133
ISBN : 9781450351911
Campus : Coimbatore
School : School of Engineering
Department : Computer Science
Year : 2018
Abstract : Developments in high-throughput technologies have identified a large number of long non-coding RNAs (lncRNAs), whose functional characterization remains an open problem. Available research confirms that lncRNAs play a major role in genetic and epigenetic regulation and that their expression levels are significantly associated with complex diseases such as cancer. Identifying lncRNAs and characterizing their function are therefore important tasks in RNA bioinformatics. Despite their abundance in the cell, lncRNAs are poorly conserved at the sequence level, which makes their analysis challenging. Many machine-learning-based models have been proposed in the literature for the identification and analysis of lncRNAs. This paper proposes a topic-model-based method for identifying lncRNAs. To investigate the applicability of topic models to lncRNA analysis, this work develops a Latent Dirichlet Allocation (LDA) based topic model that groups lncRNAs from a collection of transcriptome sequences. Features derived from transformed k-mer patterns and from the secondary structure of lncRNA sequences serve as input to the topic model. The results are promising relative to classic algorithms and show that topic models are a reasonable approach to lncRNA analysis.
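The abstract describes the feature pipeline only at a high level. As a minimal sketch, assuming each transcript is treated as a "document" whose "words" are overlapping k-mers, and assuming the "transformed k-mer patterns" are tf-idf weights (as the title suggests), the transformation could look like the following. The function names `kmers` and `tfidf`, the default k = 3, and the U-to-T normalization are illustrative assumptions, not the authors' implementation; the secondary-structure features and the LDA step itself are not shown.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a nucleotide sequence into overlapping k-mers (the 'words')."""
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA and DNA alphabets alike
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(docs, k=3):
    """Return one {k-mer: tf-idf weight} dict per sequence.

    tf  = count of the k-mer in the sequence / total k-mers in that sequence
    idf = log(N / df), where N is the number of sequences and
          df is the number of sequences containing the k-mer
    """
    counts = [Counter(kmers(s, k)) for s in docs]
    n_docs = len(docs)

    # Document frequency: in how many sequences each k-mer appears at least once.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            kmer: (n / total) * math.log(n_docs / df[kmer])
            for kmer, n in c.items()
        })
    return vectors
```

For example, `tfidf(["AUGGCUAGCUAG", "GGCUAGGCUAGG", "AUGAUGAUGAUG"])` gives the third sequence a higher weight for `TGA` (which occurs in no other sequence) than for the more frequent `ATG` (which also occurs in the first sequence), while a k-mer present in every sequence receives weight 0. The resulting vectors, or the raw k-mer counts, could then be passed to an LDA implementation such as scikit-learn's `LatentDirichletAllocation`; exactly how the paper combines tf-idf weighting with LDA is not stated in this abstract.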
Cite this Research Publication : Manu Madhavan and Gopakumar G., “A tf-idf based topic model for identifying lncRNAs from genomic background”, SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 40–46, 2018.