Potential Pathological, Clinical, and Symptomatic Findings of COVID-19 to Predict Mortality in Positive PCR Individuals Using Data Mining

Document Type: Original Article


1 Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.

2 Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen.

3 Assistant Professor in Biostatistics, Expert Management and Information Technology, Mashhad University of Medical Sciences, Mashhad, Iran.

4 Center of Statistics and Information Technology Management, Imam Reza Hospital, Mashhad University of Medical Sciences, Mashhad, Iran.

5 Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.


Background:
COVID-19 has placed immense burdens on healthcare systems and medical staff. In efforts to contain its spread, statistical work is highly relevant, in particular predictive models that distinguish survivors from non-survivors. This study aimed to predict COVID-19 mortality with a model that avoids overfitting and bias in the selection of predictors.
Materials and Methods:
The Conditional Inference Tree (CIT) model was used. Data from 59,564 hospitalized individuals with positive polymerase chain reaction (PCR) test results were collected from February 20, 2020, to September 12, 2021, in the Khorasan Razavi province, Iran.
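The variable-selection step that lets a conditional inference tree avoid selection bias can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes binary predictors, substitutes a Pearson chi-square test for the permutation-test framework of conditional inference, and uses invented toy data. The names `chi2_pvalue_2x2` and `select_split` are hypothetical.

```python
import math

def chi2_pvalue_2x2(x, y):
    """P-value of a Pearson chi-square independence test for two 0/1 lists."""
    n = len(x)
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # a constant column carries no information
    stat = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))

def select_split(features, outcome, alpha=0.05):
    """Pick the predictor most strongly associated with the outcome.

    P-values are Bonferroni-adjusted for the number of predictors tested.
    Returns (None, p) when no predictor is significant, which is the
    stopping rule: the node is not split any further.
    """
    m = len(features)
    best_name, best_p = None, 1.0
    for name, values in features.items():
        p = min(1.0, m * chi2_pvalue_2x2(values, outcome))
        if p < best_p:
            best_name, best_p = name, p
    return (best_name, best_p) if best_p < alpha else (None, best_p)

# Invented toy data: 20 patients, 8 deaths (not taken from the study).
died = [1] * 8 + [0] * 12
features = {
    "intubated": [1] * 7 + [0] + [1] + [0] * 11,  # strongly tied to death
    "anorexia": [1, 0] * 10,                      # unrelated to death here
}
chosen, adjusted_p = select_split(features, died)
print(chosen, adjusted_p)
```

Because the choice of predictor rests on a significance test rather than on the best achievable impurity reduction, predictors with many possible cut points gain no built-in advantage, and the significance threshold doubles as a stopping rule that limits overfitting.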
Results:
On the test data, the model achieved a sensitivity of 88.7%, a specificity of 88.1%, an accuracy of 88.2%, and an area under the curve (AUC) of 73.0%. The model therefore predicted mortality with considerable accuracy. The predictors that distinguished survivors from non-survivors were intubation, age, PO2 level, decreased level of consciousness, presence of distress, anorexia, drug use, and kidney disease.
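For readers who want to see how sensitivity, specificity, and accuracy relate to a confusion matrix, the following is a small sketch. The counts are hypothetical and chosen only for easy arithmetic; they are not the study's data, and death is assumed to be the positive class.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # non-survivors predicted as non-survivors
    fn: int  # non-survivors predicted as survivors
    tn: int  # survivors predicted as survivors
    fp: int  # survivors predicted as non-survivors

def sensitivity(c: Confusion) -> float:
    """Share of actual non-survivors that the model flags."""
    return c.tp / (c.tp + c.fn)

def specificity(c: Confusion) -> float:
    """Share of actual survivors that the model clears."""
    return c.tn / (c.tn + c.fp)

def accuracy(c: Confusion) -> float:
    """Share of all patients classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)

# Hypothetical test set of 1,000 patients with 100 deaths.
cm = Confusion(tp=80, fn=20, tn=850, fp=50)
print(f"sensitivity={sensitivity(cm):.3f}")
print(f"specificity={specificity(cm):.3f}")
print(f"accuracy={accuracy(cm):.3f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so with an imbalanced outcome it tracks specificity closely; reporting all three, as the abstract does, gives a fuller picture.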
Conclusion:
According to the findings, the CIT model achieved high accuracy while avoiding overfitting and bias in the selection of predictors. These results may therefore support the efforts of healthcare systems to stop the spread of the pandemic.

