Conference paper Open Access

DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET

İsmail Buğra Bölükbaşı; Betül Yağmahan


JSON-LD (schema.org)

{
  "@context": "https://schema.org/", 
  "@id": 286136, 
  "@type": "ScholarlyArticle", 
  "creator": [
    {
      "@id": "https://orcid.org/0000-0002-9405-0900", 
      "@type": "Person", 
      "affiliation": "Yalova \u00dcniversitesi", 
      "name": "\u0130smail Bu\u011fra B\u00f6l\u00fckba\u015f\u0131"
    }, 
    {
      "@id": "https://orcid.org/0000-0003-1744-3062", 
      "@type": "Person", 
      "affiliation": "Bursa Uluda\u011f \u00dcniversitesi", 
      "name": "Bet\u00fcl Ya\u011fmahan"
    }
  ], 
  "datePublished": "2022-10-22", 
  "description": "In recent years, the number of people with diabetes has been increasing steadily. Diabetes is a serious disease that can cause lasting damage to the body, and even death, if precautions are not taken. Early and accurate detection is therefore gaining importance in the medical world, and the number of studies in the literature that use machine learning methods to diagnose diabetes has increased significantly.\n\nIn this study, type 2 diabetes was classified using different data preprocessing and machine learning methods on real-world data taken from a public hospital in Turkey. Logistic regression, Naive Bayes, C4.5, and Random Forest classification models were used. The input variables were the patient's age, gender, complete blood count, biochemistry, and hormone test results, and the output variable was the diagnosis made by specialist doctors. In total, 43 variables were studied. Examination of the dataset revealed an imbalance between the classes of the target variable. Under class imbalance, classification models can assign observations to the wrong class. To eliminate the class imbalance in the dataset, three resampling methods were used: random undersampling (RUS), random oversampling (ROS), and the synthetic minority oversampling technique (SMOTE).\n\nThe performance of the four machine learning methods was compared on each of the original, randomly undersampled, randomly oversampled, and synthetic minority oversampled training datasets, for a total of 16 scenarios. Analysis of all scenarios identified the four best-performing combinations: Naive Bayes with the original training dataset, Random Forest with the randomly undersampled training dataset, Random Forest with the synthetic minority oversampled training dataset, and C4.5 with the randomly oversampled training dataset. Among these four, Random Forest with the randomly undersampled training dataset ranked first.", 
  "headline": "DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET", 
  "identifier": 286136, 
  "image": "https://aperta.ulakbim.gov.tr/static/img/logo/aperta_logo_with_icon.svg", 
  "keywords": [
    "Diabetes Diagnosis", 
    "Type-2 Diabetes", 
    "Machine Learning", 
    "Classification", 
    "Imbalanced Dataset", 
    "Resampling Methods"
  ], 
  "license": "http://www.opendefinition.org/licenses/cc-by-sa", 
  "name": "DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET", 
  "url": "https://aperta.ulakbim.gov.tr/record/286136"
}
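
The experimental design described in the abstract (four classifiers crossed with four versions of the training set) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the simplified SMOTE-style interpolation are all assumptions made for illustration, and the classifiers appear only as labels. The sketch resamples the training set only, in line with the abstract's reference to resampled training datasets.

```python
import math
import random
from collections import Counter
from itertools import product


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows of each class up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, k=3, seed=0):
    """Simplified SMOTE-style step (assumption, not a full implementation):
    new rows are interpolated between a row and one of its k nearest
    same-class neighbours until every class reaches the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = list(X), list(y)
    for label, rows in groups.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k] or [base]
            partner = rng.choice(neighbours)
            gap = rng.random()
            X_out.append([b + gap * (p - b) for b, p in zip(base, partner)])
            y_out.append(label)
    return X_out, y_out


# Classifier names from the abstract, used here only as labels.
CLASSIFIERS = ["Logistic Regression", "Naive Bayes", "C4.5", "Random Forest"]
RESAMPLERS = {
    "original": lambda X, y: (list(X), list(y)),
    "RUS": random_undersample,
    "ROS": random_oversample,
    "SMOTE": smote_like,
}
# Every classifier paired with every training-set variant.
SCENARIOS = list(product(CLASSIFIERS, RESAMPLERS))

# Hypothetical imbalanced toy training set: 8 negatives, 2 positives.
X_train = [[float(i), float(i % 3)] for i in range(10)]
y_train = [0] * 8 + [1] * 2

for name, resample in RESAMPLERS.items():
    _, y_res = resample(X_train, y_train)
    print(name, dict(Counter(y_res)))
print(len(SCENARIOS), "scenarios")
```

In practice a maintained library would normally replace these hand-written resamplers. The sketch only shows how each method changes the class counts before a classifier is trained.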