Publication Type : Conference Paper
Publisher : IEEE
Source : International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 1370-1374, IEEE, 2018.
Url : https://ieeexplore.ieee.org/abstract/document/8597200
Campus : Coimbatore
School : School of Computing
Year : 2018
Abstract : Web robots are automated software agents used primarily for web searching and indexing. Web robots are now also frequently used for malicious activities on the internet, such as spamming and spying, because they can camouflage their behavior. HTTP requests generated by these automated traversers are difficult to identify in web server logs because the robots conceal their identity. Unsupervised clustering methods may be useful for categorizing HTTP user sessions into web robot and human sessions. In this paper, three clustering algorithms are used to cluster the session data: clustering large applications (CLARA), ordering points to identify the clustering structure (OPTICS), and balanced iterative reducing and clustering using hierarchies (BIRCH). These algorithms represent different categories: partition-based, density-based, and hierarchy-based, respectively. The algorithms are implemented in the ELKI and JBIRCH open-source libraries and applied to publicly available user session data. Clustering performance is compared using cluster validity measures including the Rand Index, Jaccard Index, and F-measure. The effective time taken to cluster web robot sessions and distinguish them from the other three classes is also measured.
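The validity measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the common pair-counting definitions of the Rand Index, Jaccard Index, and F-measure; the paper may use different variants. The function names and the toy session labels are hypothetical and are not taken from the paper or its data.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Classify every pair of sessions by class/cluster agreement."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        if same_class and same_cluster:
            tp += 1  # same class, grouped together
        elif same_cluster:
            fp += 1  # different classes, grouped together
        elif same_class:
            fn += 1  # same class, split apart
        else:
            tn += 1  # different classes, kept apart
    return tp, fp, fn, tn


def validity_measures(truth, pred):
    """Pair-counting Rand Index, Jaccard Index, and F-measure (assumed definitions)."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    rand = (tp + tn) / total if total else 1.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"rand": rand, "jaccard": jaccard, "f_measure": f_measure}


# Hypothetical toy data: five sessions with known classes and cluster ids.
truth = ["robot", "robot", "human", "human", "human"]
pred = [0, 0, 0, 1, 1]
scores = validity_measures(truth, pred)
print(scores)
```

All three measures compare pairs of sessions rather than individual labels, so they do not depend on how the cluster ids are numbered.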
Cite this Research Publication : Dilip Singh Sisodia, Radhika Khandelwal, Arti Anuragi. "Categorization performance of unsupervised learning techniques for web robots sessions." International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 1370-1374, IEEE, 2018.