DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET

İsmail Buğra Bölükbaşı; Betül Yağmahan

doi:10.48623/aperta.286136

Conference paper (Open Access)


In recent years, the number of people with diabetes has been rising steadily. Diabetes is a serious
disease that, if left unmanaged, can cause lasting damage to the body and even death. As its prevalence
grows, early and accurate detection of diabetes is becoming increasingly important in medicine, and the
number of studies in the literature that use machine learning methods to diagnose diabetes has
increased significantly.
In this study, type 2 diabetes was classified using different data preprocessing and machine learning
methods on real-world data obtained from a public hospital in Turkey. Four classification models were
used: Logistic Regression, Naive Bayes, C4.5, and Random Forest. The input variables were the patient's
age, gender, complete blood count, biochemistry, and hormone test results; the output variable was the
diagnosis made by specialist physicians.
In total, 43 variables were studied. Examination of the dataset revealed an imbalance between the
classes of the target variable. Under class imbalance, classification models tend to favor the majority
class and misassign minority-class cases. To correct the class imbalance in the dataset, three
resampling methods were used: random undersampling (RUS), random oversampling (ROS), and the synthetic
minority oversampling technique (SMOTE).
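
The three resampling methods can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the (features, label) row format, the toy data, and the choice of k = 5 nearest neighbours for SMOTE are all assumptions. In practice a library such as imbalanced-learn (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE`) would normally be used. As in the study, resampling is meant to be applied to the training data only.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical row format: (feature vector, class label).
Sample = Tuple[List[float], int]


def split_by_class(data: List[Sample]) -> Dict[int, List[Sample]]:
    """Group rows by their class label."""
    groups: Dict[int, List[Sample]] = {}
    for x, y in data:
        groups.setdefault(y, []).append((x, y))
    return groups


def random_undersample(data: List[Sample], rng: random.Random) -> List[Sample]:
    """RUS: keep a random subset of each class, sized to the smallest class."""
    groups = split_by_class(data)
    n_min = min(len(g) for g in groups.values())
    out: List[Sample] = []
    for g in groups.values():
        out.extend(rng.sample(g, n_min))
    return out


def random_oversample(data: List[Sample], rng: random.Random) -> List[Sample]:
    """ROS: duplicate randomly chosen rows of each smaller class up to the largest class size."""
    groups = split_by_class(data)
    n_max = max(len(g) for g in groups.values())
    out: List[Sample] = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choice(g) for _ in range(n_max - len(g)))
    return out


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote(data: List[Sample], rng: random.Random, k: int = 5) -> List[Sample]:
    """SMOTE: add synthetic rows to each smaller class by interpolating between
    a random row of that class and one of its k nearest same-class neighbours."""
    groups = split_by_class(data)
    n_max = max(len(g) for g in groups.values())
    out = list(data)
    for label, g in groups.items():
        points = [x for x, _ in g]
        if len(points) < 2:
            continue  # no neighbour to interpolate towards; class left as-is
        for _ in range(n_max - len(points)):
            base = rng.choice(points)
            neighbours = sorted(
                (p for p in points if p is not base),
                key=lambda p: sq_dist(base, p),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic = [b + gap * (o - b) for b, o in zip(base, other)]
            out.append((synthetic, label))
    return out


rng = random.Random(0)
# Hypothetical toy data: 12 majority rows (label 0), 4 minority rows (label 1).
data = [([float(i), float(i % 3)], 0) for i in range(12)]
data += [([10.0 + i, 5.0 + i], 1) for i in range(4)]
for name, method in [("RUS", random_undersample), ("ROS", random_oversample), ("SMOTE", smote)]:
    counts = Counter(y for _, y in method(data, rng))
    print(name, dict(sorted(counts.items())))
```

On the toy data, RUS leaves 4 rows per class, while ROS and SMOTE both bring the minority class up to 12 rows. The trade-offs differ: RUS discards majority-class information, ROS repeats existing minority rows exactly, and SMOTE creates new minority rows along line segments between existing ones.
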
The four machine learning methods were compared on each of four training datasets: the original
training dataset and the training datasets produced by RUS, ROS, and SMOTE. In total, 16 scenarios
(4 methods × 4 training datasets) were studied.
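
The 4 × 4 experimental design can be enumerated as below. This is only a sketch of the scenario grid: the string labels are illustrative, and the assumption that a held-out test split stays untouched while only the training data are resampled is inferred from the abstract's wording ("training dataset") rather than stated by the authors.

```python
from itertools import product

# Model and training-set names as listed in the abstract (labels are illustrative).
CLASSIFIERS = ["Logistic Regression", "Naive Bayes", "C4.5", "Random Forest"]
TRAINING_SETS = ["original", "RUS", "ROS", "SMOTE"]


def build_scenarios():
    """Pair every classifier with every training set: 4 x 4 = 16 scenarios."""
    return list(product(CLASSIFIERS, TRAINING_SETS))


scenarios = build_scenarios()
for number, (model, training_set) in enumerate(scenarios, start=1):
    # Assumption: each scenario trains `model` on `training_set` and is scored
    # on the same untouched test split, so all 16 results are comparable.
    print(f"Scenario {number:2d}: {model} on {training_set} training data")
```
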
Analysis of all scenarios identified the four best-performing combinations: Naive Bayes with the
original training dataset, Random Forest with the RUS and SMOTE training datasets, and C4.5 with the
ROS training dataset. Of these four, Random Forest with the randomly undersampled training dataset
performed best.
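
The abstract does not state which evaluation metric was used to rank the scenarios. As a hedged illustration of why the choice matters on imbalanced data, the sketch below uses invented counts (90 non-diabetic and 10 diabetic test patients, not the study's data) to compare plain accuracy with precision, recall, and F1 for the minority (diabetic) class. The function name and both example prediction vectors are hypothetical.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 90 non-diabetic (0) and 10 diabetic (1) patients.
y_true = [0] * 90 + [1] * 10

# Degenerate model that always predicts "non-diabetic".
always_negative = [0] * 100
# Hypothetical model: 10 false alarms among the 90 negatives, 8 of 10 diabetics found.
screening = [1] * 10 + [0] * 80 + [1] * 8 + [0] * 2

for name, y_pred in [("always negative", always_negative), ("screening", screening)]:
    scores = binary_metrics(y_true, y_pred)
    print(name, {k: round(v, 3) for k, v in scores.items()})
# always negative {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
# screening {'accuracy': 0.88, 'precision': 0.444, 'recall': 0.8, 'f1': 0.571}
```

The degenerate model scores higher accuracy while missing every diabetic patient, which is why class-sensitive metrics such as recall or F1 are usually reported alongside accuracy when classes are imbalanced.
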


Files (270.0 kB)

File name	Size
DIAGNOSIS OF DIABETES DISEASE USING MACHINE LEARNING METHODS IN AN IMBALANCED DIABETES DATASET.pdf md5:7263bbe549773ced2254b23b669f0165	270.0 kB


Record Information

Publication date: 22/10/2022
ISBN: 978-625-8246-29-2
Subject areas: Health Sciences > Medicine > Internal Medical Sciences > Internal Medicine > Endocrinology and Metabolic Diseases
Keywords: Diabetes Diagnosis; Type-2 Diabetes; Machine Learning; Classification; Imbalanced Dataset; Resampling Methods
Published in: ABSTRACT BOOK, IKSAD Publishing, Adana, pp. 330-331 (ISBN 978-625-8246-29-2).
Conference: CUKUROVA 9th INTERNATIONAL SCIENTIFIC RESEARCHES CONFERENCE, Adana, October 09-11, 2022
License: Creative Commons Attribution Share-Alike

