Stroke Disease Prediction Using K-Nearest Neighbor And Decision Tree Algorithms With Machine Learning Pre-Processing Techniques

Malak Roman, Ifra Naz, Muhammad Ayass Luqman, Junaid Ali, Mian Sahib Jan, Habib Ullah Nawab

Abstract

Medical professionals need a trustworthy prediction methodology for diagnosing stroke from patient data, and a vast amount of data on patients and their health issues already exists. Data mining (sometimes termed data or knowledge discovery) is, in general, the process of examining data from several perspectives and synthesising it into significant information. Data mining packages are among the investigative tools available for data exploration: they let users categorise, analyse, and summarise the relationships found in the data across many dimensions or perspectives. One such tool is Weka (the Waikato Environment for Knowledge Analysis), which provides many machine-learning algorithms and can classify data using different algorithms. Classification is a crucial data mining approach with many applications. It applies to data of all kinds and appears in every aspect of our lives, and it assigns each item in a data set to one of a predetermined number of classes or groups. Several classification algorithms are the subject of this work. Using Weka, this work compares various classification methods to determine which are suitable for use with haematological data. This paper investigates the application of the Decision Tree (J48) and K-Nearest Neighbor (KNN) algorithms to improve medical diagnosis in healthcare. Both are machine-learning techniques used to analyse patient data and assist in medical decision-making. The results of the decision tree and k-nearest neighbor classifiers, combined with the Genetic Search and Chi-Square techniques, are summarised. The comparison is based on precision, accuracy, recall, and F-measure, and shows that, in terms of accuracy, the k-nearest neighbor classifier with Genetic Search performs best, at 97.5%.
In this study, we sought a better and more efficient classifier for stroke disease using data mining techniques. We conclude that KNN is a better model than the Decision Tree algorithm and is well suited to predicting stroke in a given patient.
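
To make the comparison criteria concrete, the following is a minimal sketch of a k-nearest neighbor vote and of the four reported metrics (accuracy, precision, recall, F-measure) for a binary stroke / no-stroke label. It is not the Weka workflow used in the study: it is a plain Python illustration, the function names `knn_predict` and `binary_metrics` are hypothetical, and the two-feature toy records are invented purely for demonstration and are not taken from the study's data set.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours, then count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented toy records: two features already scaled to [0, 1]
# (hypothetical stand-ins for attributes such as age or glucose level).
# Label 1 = stroke, 0 = no stroke.
train_X = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15), (0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.22, 0.12), (0.85, 0.8), (0.5, 0.45), (0.75, 0.85)]
test_y = [0, 1, 1, 1]

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = binary_metrics(test_y, y_pred)
print(y_pred)
print(metrics)
```

Because KNN relies on distances, features on very different scales would dominate the vote, which is one reason a pre-processing step is usually applied before classification. The sketch sidesteps this by assuming the toy features are already scaled.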