Comparison of Classification Algorithms using the Orange Data Mining Tool
Abstract
Data mining techniques help uncover hidden knowledge in disease datasets, which can then be used to analyze and predict future disease behavior. Various techniques and algorithms are available for data mining; classification is the most common technique for extracting rules from large datasets. This paper presents five classification algorithms (Decision Tree, Logistic Regression, Neural Network, Naive Bayes, and K-Nearest Neighbor) and compares them using the Orange data mining tool to classify medical data for back and neck pain prediction. In the training phase, the Area Under the Curve was 0.983 for Decision Tree, 0.844 for Neural Network, and 0.839 for K-Nearest Neighbor. The highest Precision and Recall were achieved by the Decision Tree algorithm, at 0.930 and 0.927, respectively. In the prediction phase, the Classification Accuracy, Area Under the Curve, Accuracy, and Recall of the Naive Bayes algorithm were 0.767, 0.686, 0.733, and 0.765, respectively, and the Classification Accuracy of Logistic Regression was 0.750. In the training phase, the best performance was obtained by Decision Tree, K-Nearest Neighbor, and Neural Network, whereas the lowest was observed for Naive Bayes. In the prediction phase, Naive Bayes performed best, while Logistic Regression performed less well.
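For readers unfamiliar with the reported metrics, the following minimal sketch shows how Classification Accuracy, Precision, Recall, and Area Under the Curve can be computed for a binary problem. It is not the Orange implementation and does not use the paper's data: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and all function names are hypothetical. Orange may also report Precision and Recall averaged over classes, which this sketch does not do.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of instances whose predicted label equals the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted positives that are truly positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that the classifier recovers."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
# Assumed decision threshold of 0.5 to turn scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("CA:", classification_accuracy(y_true, y_pred))
print("Precision:", precision(y_true, y_pred))
print("Recall:", recall(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Note that AUC is computed from the ranking of the scores rather than from thresholded labels, which is why a classifier can have a moderate Classification Accuracy and still a high or low AUC.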