Pakistan Science Abstracts
A comparative analysis of parametric and tree-based imputation techniques for missing data in epidemiological research
Author(s):
1. Sara Javadi: Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
2. Mohammad Mehdi Saber: Department of Statistics, Higher Education Center of Eghlid, Eghlid, Iran
3. Mehrdad Taghipour: Department of, Faculty of Sciences, University of Qom, Qom, Iran
4. Abdussalam Aljadani: Department of Management, College of Business Administration in Yanbu, Taibah University, Al-Madinah, Al-Munawarah 41411, Kingdom of Saudi Arabia
5. Mahmoud M. Mansour: Department of Management Information Systems, Yanbu, Taibah University, Yanbu 46421, Saudi Arabia; Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Benha University, Egypt
6. Mohamed S. Hamed: Department of Business Administration, Gulf Colleges, KSA; Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Benha University, Egypt
7. Haitham M. Yousof: Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Benha University, Egypt
Abstract:
Missing data are a common challenge for researchers and data scientists, and multiple imputation by chained equations (MICE) is widely used to address them in epidemiologic research. The method is favored for its practicality and its ability to produce unbiased effect estimates and valid inferences. When using MICE, researchers can choose among various imputation techniques, both parametric and nonparametric. Recent studies suggest that nonparametric tree-based methods may outperform parametric approaches, especially when the predictor variables involve interactions or nonlinear effects. However, these comparisons can be misleading if the parametric imputation model omits effects that are present in the final analysis model, including interactions. Simulation results show that including interactions in the parametric imputation model improves its performance when imputing missing binary outcomes. Parametric imputation generally yields lower bias and slightly higher coverage probability for interaction effects, but wider confidence intervals, than tree-based methods such as classification and regression trees (CART). Parametric imputation also requires careful specification of the imputation model, so epidemiologists must define their imputation models within MICE with care. This study contributes to the field by offering a balanced comparison of parametric and tree-based imputation methods for data sets with binary outcomes.
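The central point of the abstract, that a parametric imputation model should contain the interactions present in the analysis model, can be illustrated with a minimal sketch (Python standard library only). It imputes a missing binary outcome from a logistic model that includes an x1*x2 interaction. All names (`design_row`, `fit_logistic`, `impute_binary_outcome`), the gradient-ascent fitter, the synthetic data, and the missingness pattern are hypothetical illustrations, not the authors' simulation design. The sketch is also simplified: with only one incomplete variable, the chained-equations cycle reduces to a single conditional model, and the coefficients are not redrawn from their approximate posterior between imputations, so it is not a fully proper multiple imputation. Pooling with Rubin's rules and the CART alternative are not shown.

```python
import math
import random


def design_row(x1, x2, with_interaction):
    # Predictors for the imputation model. The x1 * x2 column mirrors an
    # interaction assumed (hypothetically) to be in the analysis model.
    row = [1.0, float(x1), float(x2)]
    if with_interaction:
        row.append(float(x1 * x2))
    return row


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(rows, ys, lr=1.0, iters=2000):
    # Maximum-likelihood logistic regression by plain batch gradient ascent.
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for row, y in zip(rows, ys):
            p = logistic(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += (y - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def impute_binary_outcome(data, beta, with_interaction, seed):
    """Replace missing (None) outcomes with Bernoulli draws from the fitted model."""
    rng = random.Random(seed)
    completed = []
    for x1, x2, y in data:
        if y is None:
            eta = sum(b * v for b, v in zip(beta, design_row(x1, x2, with_interaction)))
            y = 1 if rng.random() < logistic(eta) else 0
        completed.append((x1, x2, y))
    return completed


# Hypothetical data: the outcome is common only when x1 and x2 are both 1.
data = []
for x1 in (0, 1):
    for x2 in (0, 1):
        n_ones = 32 if (x1 and x2) else 8
        for i in range(40):
            y = 1 if i < n_ones else 0
            # Hypothetical missingness pattern: every 4th outcome is unobserved.
            data.append((x1, x2, None if i % 4 == 3 else y))

# Fit the imputation model (with the interaction) on the complete cases.
observed = [(x1, x2, y) for x1, x2, y in data if y is not None]
rows = [design_row(x1, x2, True) for x1, x2, _ in observed]
beta = fit_logistic(rows, [y for _, _, y in observed])

# m = 5 imputed data sets; each would be analyzed and the results pooled.
imputations = [impute_binary_outcome(data, beta, True, seed=s) for s in range(5)]
print("coefficients (intercept, x1, x2, x1*x2):", [round(b, 2) for b in beta])
```

Building the rows with `with_interaction=False` fits only main effects, which is the kind of under-specified parametric imputation model that, according to the abstract, can make tree-based methods look better than they are in a fair comparison.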
Page(s): 99-112
DOI: not available
Published in: Journal of Applied Probability and Statistics, Volume 19, Issue 3, 2024
Keywords:
missing data, epidemiological research, tree-based imputation techniques