Imputation of Data Missing Not at Random: artificial generation and benchmark analysis

dc.contributor.author: Pereira, Ricardo Cardoso
dc.contributor.author: Abreu, Pedro Henriques
dc.contributor.author: Rodrigues, Pedro Pereira
dc.contributor.author: Figueiredo, Mário A. T.
dc.date.accessioned: 2025-05-22T08:21:38Z
dc.date.available: 2025-05-22T08:21:38Z
dc.date.issued: 2024-09-01
dc.description.abstract: Experimental assessments of missing data imputation methods often compute error rates between the original values and the estimated ones. This experimental setup relies on complete datasets that are injected with missing values. The injection process is straightforward for the Missing Completely At Random and Missing At Random mechanisms; however, the Missing Not At Random mechanism poses a major challenge, since the available artificial generation strategies are limited. Furthermore, studies focused on this latter mechanism tend to disregard a comprehensive baseline of state-of-the-art imputation methods. In this work, both challenges are addressed: four new Missing Not At Random generation strategies are introduced, and a benchmark study is conducted to compare six imputation methods in an experimental setup that covers 10 datasets and five missingness levels (10% to 80%). The overall findings are that, for most missing rates and datasets, the best imputation method for Missing Not At Random values is Multiple Imputation by Chained Equations, whereas for higher missingness rates autoencoders show promising results.
dc.identifier.citation: Ricardo Cardoso Pereira, Pedro Henriques Abreu, Pedro Pereira Rodrigues, Mário A.T. Figueiredo, Imputation of Data Missing Not at Random: artificial generation and benchmark analysis, Expert Systems with Applications, Volume 249, Part B, 2024, 123654, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2024.123654
dc.identifier.issn: 1873-6793
dc.identifier.uri: https://repositorio.ismt.pt/handle/123456789/1756
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartofseries: 249; 55
dc.title: Imputation of Data Missing Not at Random: artificial generation and benchmark analysis
dc.type: Article
Files
Main
Name: A1_RCP.pdf
Size: 2.13 MB
Format: Adobe Portable Document Format
License
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission