Classification of Hepatitis Patients Using Logistic Regression and Support Vector Machines Methods
Abstract
Hepatitis is an inflammatory disease of the liver. It is often caused by a virus and is a major global health problem. From 2019 to 2020, there were 1.5 million new cases of hepatitis B and C infection per year. The WHO (World Health Organization) aims to eliminate hepatitis by 2030. Given this problem, classification is needed to identify which health indicators are associated with the survival of hepatitis patients. This research aims to find the best method for classifying hepatitis patients by comparing logistic regression and SVM (Support Vector Machines). Both methods are suitable for this case because the response variable is binary. This is a quantitative study that uses the hepatitis data set obtained from the UCI Machine Learning Repository. The data set contains 19 predictor variables (6 continuous and 13 discrete). The patients are divided into two classes: those who lived and those who died. The results show that the best accuracy obtained with logistic regression is 79.3%, and the best accuracy obtained with SVM is 81.94%. Thus, the best classification result for the hepatitis data set comes from SVM with a radial kernel, evaluated on a stratified holdout split, with an accuracy of 81.94%. This result indicates that SVM with a radial kernel and a stratified holdout split can be used to classify hepatitis patient data.
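The stratified holdout evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 30% test fraction, the random seed, and the toy labels are all assumptions made for illustration, and model fitting is omitted. The sketch only shows how a split can preserve the proportion of each class and how accuracy is computed.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split indices into train and test parts, class by class.

    Each class contributes roughly `test_fraction` of its members to the
    test part, so both parts keep the original class proportions.
    The 0.3 default is an assumption; the abstract does not state the ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy, made-up labels (not the hepatitis data): an imbalanced binary outcome.
toy_labels = ["die"] * 20 + ["live"] * 80
train_idx, test_idx = stratified_holdout(toy_labels, test_fraction=0.3, seed=1)
```

In practice a classifier would be fitted on the rows in `train_idx` and its predictions on the rows in `test_idx` passed to `accuracy`. Stratifying matters when one class is much rarer than the other, because a purely random split could leave very few cases of the rare class in the test part.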
DOI: http://dx.doi.org/10.21043/jpmk.v5i2.17052
This work is licensed under a Creative Commons Attribution 4.0 International License.
Editorial and Administration Office:
Jurnal Pendidikan Matematika (Kudus)
Tadris Matematika, Tarbiyah Faculty, Institut Agama Islam Negeri Kudus
Jl. Conge Ngembalrejo Po Box 51, Kudus, Jawa Tengah, Indonesia, Kode Pos: 59322
Email: [email protected]