Publication Type : Conference Proceedings
Publisher : IEEE
Source : International Conference on Intelligent Computing and Control Systems (ICCS)
Url : https://ieeexplore.ieee.org/abstract/document/9065563
Campus : Amritapuri
School : School of Computing
Year : 2019
Abstract : Many factors could contribute to the occurrence of breast cancer. Although it is very difficult to pin down the exact environmental and other factors involved, they remain significant in determining the occurrence of cancer. Using machine learning techniques and routine diagnostic information, we can assess the risk of occurrence of breast cancer. Cancer data sets contain many attributes of patient information, but not every feature is relevant in predicting cancer. Feature selection techniques are useful in such scenarios for retaining the relevant feature set. In this paper we present a comparative study of the effect of feature selection techniques on the accuracies given by existing machine learning algorithms. For this purpose we have considered the following machine learning algorithms: Logistic Regression, Naive Bayes and Random Forest. The following feature selection techniques have been considered: Sequential Forward Feature Selection, Recursive Feature Elimination, f-test and correlation. The publicly available Breast Cancer Wisconsin (Diagnostic) Data Sets from the UCI Repository have been used in this work. The results show that the random forest algorithm gives the highest accuracy with feature selection. Furthermore, the f-test gives better results for the smaller dataset and Sequential Forward Selection for the larger dataset.
Cite this Research Publication : R. Dhanya, Irene Rose Paul, Sai Sindhu Akula, Madhumathi Sivakumar, Jyothisha J. Nair, A Comparative Study for Breast Cancer Prediction using Machine Learning and Feature Selection, 2019 International Conference on Intelligent Computing and Control Systems (ICCS), 2019.
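Illustration : The abstract lists the f-test among the feature selection techniques compared. As a rough sketch of how such a filter ranks features, the snippet below computes a one-way ANOVA F-statistic per feature and sorts features by it. This is not the authors' code: the function names `f_score` and `rank_features`, the feature names, and the six-row toy data are all hypothetical, and the Wisconsin data set is not used here.

```python
from statistics import mean


def f_score(values, labels):
    """One-way ANOVA F-statistic of one feature against the class labels."""
    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    grand_mean = mean(values)
    n_classes, n_samples = len(groups), len(values)

    # Variation of the class means around the overall mean.
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    if ss_within == 0:
        return float("inf")  # classes are perfectly separated by this feature
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest F-statistic."""
    scores = {name: f_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two features, two classes, six samples.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "feature_a": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # class means differ
    "feature_b": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],  # class means are equal
}

ranking = rank_features(columns, labels)
print(ranking)
```

A filter of this kind would keep the top-ranked features and pass only those to a classifier such as the ones named in the abstract. Wrapper methods like Sequential Forward Selection differ in that they score candidate feature subsets by training the classifier itself.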