Conference paper Open Access

DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET

İsmail Buğra Bölükbaşı; Betül Yağmahan


DataCite XML

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.48623/aperta.286136</identifier>
  <creators>
    <creator>
      <creatorName>İsmail Buğra Bölükbaşı</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-9405-0900</nameIdentifier>
      <affiliation>Yalova Üniversitesi</affiliation>
    </creator>
    <creator>
      <creatorName>Betül Yağmahan</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-1744-3062</nameIdentifier>
      <affiliation>Bursa Uludağ Üniversitesi</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Diagnosis Of Diabetes Disease Using Machine Learning Methods In An Imbalanced Diabetes Dataset</title>
  </titles>
  <publisher>Aperta</publisher>
  <publicationYear>2022</publicationYear>
  <subjects>
    <subject>Diabetes Diagnosis</subject>
    <subject>Type-2 Diabetes</subject>
    <subject>Machine Learning</subject>
    <subject>Classification</subject>
    <subject>Imbalanced Dataset</subject>
    <subject>Resampling Methods</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2022-10-22</date>
  </dates>
  <resourceType resourceTypeGeneral="Text">Conference paper</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://aperta.ulakbim.gov.tr/record/286136</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.48623/aperta.286135</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="http://www.opendefinition.org/licenses/cc-by-sa">Creative Commons Attribution Share-Alike</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;In recent years, the number of people with diabetes has been increasing day by day. Diabetes is a serious disease that can cause lasting damage to the body, and even death, if precautions are not taken. Early and accurate detection of diabetes is therefore gaining importance in the medical world, and the number of studies in the literature that use machine learning methods to diagnose diabetes has increased significantly.&lt;/p&gt;
&lt;p&gt;In this study, type-2 diabetes was classified using different data preprocessing and machine learning methods on real-world data taken from a public hospital in Turkey. Logistic Regression, Naive Bayes, C4.5, and Random Forest classification models were used. The input variables were the patient's age, gender, complete blood count, biochemistry, and hormone test results; the output variable was the diagnosis made by specialist doctors. In total, 43 different variables were studied.&lt;/p&gt;
&lt;p&gt;Examination of the dataset revealed an imbalance between the classes of the target variable. When classes are imbalanced, classification models can assign samples to the wrong class. To eliminate the class imbalance in the dataset, three resampling methods were used: random undersampling (RUS), random oversampling (ROS), and the synthetic minority oversampling technique (SMOTE).&lt;/p&gt;
&lt;p&gt;The performance of the four machine learning methods was compared on each of four training datasets: the original, the randomly undersampled, the randomly oversampled, and the SMOTE-oversampled training dataset. A total of 16 scenarios were studied.&lt;/p&gt;
&lt;p&gt;Analysis of all scenarios identified the four combinations that give the best results: Naive Bayes with the original training dataset, Random Forest with the randomly undersampled training dataset, Random Forest with the SMOTE-oversampled training dataset, and C4.5 with the randomly oversampled training dataset. Among these four, the best-performing scenario is Random Forest with the randomly undersampled training dataset.&lt;/p&gt;</description>
  </descriptions>
</resource>
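The three resampling methods named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `rebalance` and `synthesize`, the neighbour count `k=3`, and the list-of-lists data layout are assumptions made for illustration, and the SMOTE branch is a simplified interpolation between same-class neighbours rather than a full library implementation. It assumes numeric features and at least two rows in each minority class. Following the abstract, resampling would be applied to the training dataset only.

```python
import random
from collections import defaultdict
from itertools import product


def split_by_class(rows, labels):
    """Group feature rows by their class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def rebalance(rows, labels, rng, method):
    """Return a class-balanced copy of (rows, labels).

    method is "RUS", "ROS", or "SMOTE" (hypothetical names for this sketch).
    """
    groups = split_by_class(rows, labels)
    sizes = [len(group) for group in groups.values()]
    # RUS shrinks every class to the smallest one; ROS and SMOTE grow to the largest.
    target = min(sizes) if method == "RUS" else max(sizes)
    out_rows, out_labels = [], []
    for label, group in groups.items():
        missing = target - len(group)
        if method == "RUS":
            chosen = rng.sample(group, target)  # keep a random subset
        elif method == "ROS":
            chosen = group + [rng.choice(group) for _ in range(missing)]  # duplicates
        elif method == "SMOTE":
            chosen = group + [synthesize(group, rng) for _ in range(missing)]
        else:
            raise ValueError(f"unknown method: {method}")
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


def synthesize(group, rng, k=3):
    """Create one synthetic row between a random row and one of its k nearest
    same-class neighbours (squared Euclidean distance)."""
    base = rng.choice(group)
    neighbours = sorted(
        (row for row in group if row is not base),
        key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
    )[:k]
    other = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (o - b) for b, o in zip(base, other)]


# The 16 scenarios from the abstract: four classifiers on four training sets.
MODELS = ["Logistic Regression", "Naive Bayes", "C4.5", "Random Forest"]
TRAINING_SETS = ["original", "RUS", "ROS", "SMOTE"]
SCENARIOS = list(product(MODELS, TRAINING_SETS))
```

For example, calling `rebalance(train_rows, train_labels, random.Random(0), "RUS")` on a hypothetical training split returns a subset in which every class has as many rows as the smallest class. Passing an explicit `random.Random` instance keeps the resampling reproducible across scenarios.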