Published January 1, 2023 | Version v1
Journal article Open

Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data

  • 1. Firat Univ, Fac Sci, Dept Phys, TR-23119 Elazig, Turkiye

Description

Data used in particle physics analyses are inherently imbalanced: the events of interest are rare relative to the broad background. Identifying these events in the bulk data requires computationally intensive studies that apply sophisticated analysis techniques. Classification algorithms from supervised machine learning (ML) can be used to interpret skewed particle datasets as an alternative to classical techniques, even in multi-particle state analyses. In this study, the ground state of bottomonium, Upsilon(1S), and its excited states, Upsilon(2S) and Upsilon(3S), were studied with a multiclass classification approach based on the random forest classifier (RFC), a novel example of an ML approach in particle analysis, combined with resampling techniques for preprocessing the dataset and a modified weighting strategy. For this purpose, five widely used oversampling strategies and two hybrid strategies, which combine over- and undersampling, were adapted to the RFC. In addition, an RFC with class weights applied, the weighted random forest (WRF), was used in the analysis. Because of this data structure, the performance of the models was evaluated with metrics derived from the confusion matrix. The results show that hybrid techniques implemented with the RFC are suitable for handling highly imbalanced classes. The G-mean and balanced accuracy (BAcc) scores of the upsilon states show that the model achieved its highest classification performance, around 90%, with the SMOTETomek strategy, together with high sensitivity, indicating that the approach is successful for multiclass classification.
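
The confusion-matrix metrics named above can be illustrated with a minimal sketch. It assumes that G-mean is taken as the geometric mean of the per-class recalls and BAcc as their arithmetic mean, which are common multiclass definitions; the abstract does not state which variants the study used. The class-weight helper uses the inverse-frequency heuristic n_samples / (n_classes * class_count), which is only one possible weighting and not necessarily the study's. The function names and the counts in `cm` are hypothetical and do not come from the study.

```python
import math


def per_class_recall(cm):
    """Recall (sensitivity) of each class; rows are true classes, columns are predictions."""
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def balanced_accuracy(cm):
    """Arithmetic mean of the per-class recalls (assumed BAcc definition)."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def g_mean(cm):
    """Geometric mean of the per-class recalls (assumed G-mean definition)."""
    recalls = per_class_recall(cm)
    return math.prod(recalls) ** (1.0 / len(recalls))


def inverse_frequency_weights(cm):
    """Class weights n_samples / (n_classes * class_count), from the true-class totals."""
    counts = [sum(row) for row in cm]
    n_samples, n_classes = sum(counts), len(counts)
    return [n_samples / (n_classes * c) for c in counts]


# Hypothetical counts for three classes with a strong imbalance (illustration only).
cm = [
    [900, 60, 40],  # majority class
    [10, 80, 10],   # minority class
    [5, 5, 40],     # rarest class
]

print("recall per class:", per_class_recall(cm))
print("balanced accuracy:", balanced_accuracy(cm))
print("G-mean:", g_mean(cm))
print("class weights:", inverse_frequency_weights(cm))
```

Both scores weight every class equally regardless of its size, so a model that ignores a rare class is penalized, which plain accuracy would hide on data dominated by one class.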
