Conference paper (Open Access)
İsmail Buğra Bölükbaşı;
Betül Yağmahan
<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>İsmail Buğra Bölükbaşı</dc:creator>
  <dc:creator>Betül Yağmahan</dc:creator>
  <dc:date>2022-10-22</dc:date>
  <dc:description>In recent years, the number of people with diabetes has been increasing steadily. Diabetes is a serious disease that, if precautions are not taken, can cause lasting damage to the body and even death. Early and accurate detection of this increasingly common disease is therefore gaining importance in medicine, and the number of studies in the literature that use machine learning methods to diagnose diabetes has grown significantly. In this study, type-2 diabetes was classified using different data preprocessing and machine learning methods on real-world data from a public hospital in Turkey. Four classification models were used: Logistic Regression, Naive Bayes, C4.5, and Random Forest. The input variables were the patient's age, gender, complete blood count, biochemistry, and hormone test results. The output variable was the diagnosis made by specialist doctors. In total, 43 variables were studied. Examination of the dataset revealed an imbalance between the classes of the target variable. When classes are imbalanced, classification models tend to favor the majority class and can misclassify minority-class samples. To eliminate the class imbalance in the dataset, three resampling methods were used: random undersampling (RUS), random oversampling (ROS), and the synthetic minority oversampling technique (SMOTE).
The four machine learning methods were compared on each of four training datasets: the original, the randomly undersampled, the randomly oversampled, and the SMOTE-oversampled training dataset. This gave 16 scenarios in total. Analysis of all scenarios identified the four best-performing combinations. These were Naive Bayes on the original training dataset, Random Forest on the randomly undersampled and on the SMOTE-oversampled training datasets, and C4.5 on the randomly oversampled training dataset. Of these four, Random Forest trained on the randomly undersampled training dataset ranked first.</dc:description>
  <dc:identifier>https://aperta.ulakbim.gov.tr/record/286136</dc:identifier>
  <dc:identifier>oai:aperta.ulakbim.gov.tr:286136</dc:identifier>
  <dc:publisher>IKSAD Publishing</dc:publisher>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>http://www.opendefinition.org/licenses/cc-by-sa</dc:rights>
  <dc:subject>Diabetes Diagnosis</dc:subject>
  <dc:subject>Type-2 Diabetes</dc:subject>
  <dc:subject>Machine Learning</dc:subject>
  <dc:subject>Classification</dc:subject>
  <dc:subject>Imbalanced Dataset</dc:subject>
  <dc:subject>Resampling Methods</dc:subject>
  <dc:title>DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET</dc:title>
  <dc:type>info:eu-repo/semantics/conferencePaper</dc:type>
  <dc:type>publication-conferencepaper</dc:type>
</oai_dc:dc>
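The three resampling methods named in the abstract (RUS, ROS, SMOTE) can be illustrated with a minimal Python sketch that uses only the standard library. This is not the authors' code. The function names `random_undersample`, `random_oversample` and `smote` are hypothetical, and so are the toy data. The balancing target is also an assumption: RUS shrinks every class to the smallest class, while ROS and SMOTE grow every class to the largest. The abstract does not say which implementation the study used. For comparison, the imbalanced-learn library provides `RandomUnderSampler`, `RandomOverSampler` and `SMOTE`.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature vectors by class label (insertion order preserved)."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def random_undersample(X, y, seed=0):
    """RUS (hypothetical helper): keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    X_out, y_out = [], []
    for label, rows in _group_by_class(X, y).items():
        for row in rng.sample(rows, target):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """ROS (hypothetical helper): duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    target = max(Counter(y).values())
    X_out, y_out = list(X), list(y)
    for label, rows in _group_by_class(X, y).items():
        for _ in range(target - len(rows)):
            X_out.append(list(rng.choice(rows)))  # copy of an existing row
            y_out.append(label)
    return X_out, y_out


def smote(X, y, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper): each synthetic row lies on the
    segment between a random row of a smaller class and one of its k nearest
    neighbours from the same class."""
    rng = random.Random(seed)
    target = max(Counter(y).values())
    X_out, y_out = list(X), list(y)
    for label, rows in _group_by_class(X, y).items():
        needed = target - len(rows)
        if needed <= 0 or len(rows) < 2:
            continue  # nothing to add, or no neighbour to interpolate towards
        k_eff = min(k, len(rows) - 1)
        for _ in range(needed):
            i = rng.randrange(len(rows))
            base = rows[i]
            others = [rows[j] for j in range(len(rows)) if j != i]
            neighbours = sorted(others, key=lambda r: _sq_dist(base, r))[:k_eff]
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, made-up data (not the study's): 8 rows of class 0, 2 rows of class 1.
    X = [[float(i), float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2
    for name, resample in (("RUS", random_undersample), ("ROS", random_oversample), ("SMOTE", smote)):
        _, y_res = resample(X, y)
        print(name, sorted(Counter(y_res).items()))
    # RUS [(0, 2), (1, 2)]
    # ROS [(0, 8), (1, 8)]
    # SMOTE [(0, 8), (1, 8)]
```

RUS discards data, and ROS repeats existing rows. SMOTE instead creates new points by interpolation, so it assumes numeric features. A categorical input such as gender would need to be encoded before SMOTE is applied. In every case, the resampling is applied to the training data only.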
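The abstract's experimental design trains four classifiers on four versions of the training set, which gives 16 scenarios. It can be sketched as a small evaluation grid. This is a hypothetical harness, not the authors' pipeline. The names `run_scenarios`, `identity` and `majority_model` are placeholders, and so is the toy data. Balanced accuracy is an assumed metric, because the abstract does not name the one used to rank the scenarios. The design choice the sketch shows is that resampling touches only the training split, so every scenario is scored on the same untouched test data.

```python
from collections import Counter
from itertools import product


def run_scenarios(train, test, samplers, models, metric):
    """Score every (training-set variant, classifier) pair.

    samplers: name -> function(X, y) returning a resampled (X, y)
    models:   name -> function(X, y) returning a predict(x) function
    Resampling is applied to the training split only; the test split is never resampled.
    """
    X_train, y_train = train
    X_test, y_test = test
    results = {}
    for (s_name, resample), (m_name, fit) in product(samplers.items(), models.items()):
        X_res, y_res = resample(X_train, y_train)
        predict = fit(X_res, y_res)
        y_pred = [predict(x) for x in X_test]
        results[(s_name, m_name)] = metric(y_test, y_pred)
    return results


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (an assumed metric; the abstract does not name one)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical placeholders so the grid runs end to end; swap in real
# RUS/ROS/SMOTE samplers and Logistic Regression/Naive Bayes/C4.5/Random Forest models.
def identity(X, y):
    return X, y


def majority_model(X, y):
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label  # always predicts the most frequent training class


if __name__ == "__main__":
    # Toy, made-up data: 0 = no diabetes, 1 = type-2 diabetes.
    train = ([[0.1], [0.2], [0.3], [0.4], [0.9]], [0, 0, 0, 0, 1])
    test = ([[0.15], [0.35], [0.85]], [0, 0, 1])
    samplers = {name: identity for name in ("Original", "RUS", "ROS", "SMOTE")}
    models = {name: majority_model
              for name in ("Logistic regression", "Naive Bayes", "C4.5", "Random Forest")}
    results = run_scenarios(train, test, samplers, models, balanced_accuracy)
    print(len(results))                       # 16
    print(results[("RUS", "Random Forest")])  # 0.5
```

With the placeholder model, every scenario scores 0.5. A classifier that always predicts the majority class labels every healthy patient correctly and every diabetic patient incorrectly. This is the kind of misassignment under class imbalance that the abstract describes.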