A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis

dc.contributor.author: Cabrera-Sánchez, Juan Francisco
dc.contributor.author: Pereira, Ricardo Cardoso
dc.contributor.author: Abreu, Pedro Henriques
dc.contributor.author: Silva-Ramírez, Esther Lydia
dc.date.accessioned: 2025-05-21T16:47:20Z
dc.date.available: 2025-05-21T16:47:20Z
dc.date.issued: 2024-11-12
dc.description.abstract: Progressively more advanced and complex models are being proposed to address problems in computer vision, forecasting, the Internet of Things, Big Data, and other fields. However, these disciplines require preprocessing steps to obtain meaningful results. One of the most common problems addressed at this stage is the presence of missing values. Understanding why missingness occurs helps in selecting the data imputation methods best suited to completing these missing values. Synthetic generation of Missing at Random (MAR) data presents challenges such as achieving extreme missingness rates and preserving the consistency of the mechanism. To address these shortcomings, this work proposes three new methods that generate synthetic missingness under the MAR mechanism and compares them with a baseline model. The comparison uses a benchmark covering 33 data sets and five missingness rates (10%, 20%, 40%, 60%, and 80%). Seven data imputation methods, ranging from traditional to deep learning approaches, are used to evaluate the proposals. The results show that the proposals are consistent with the baseline method in both the performance and the ranking of the data imputation methods. Thus, three new feasible and consistent alternatives for synthetic missingness generation under MAR are presented.
dc.description.sponsorship: This work was supported in part by the Ministerio de Ciencia, Innovación y Universidades (MCIN)/Agencia Estatal de Investigación (AEI)/10.13039/501100011033 under Grant PID2022-137646OB-C33, and in part by the European Regional Development Fund (ERDF).
dc.identifier.citation: J. F. Cabrera-Sánchez, R. Cardoso Pereira, P. Henriques Abreu and E. L. Silva-Ramírez, "A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis," in IEEE Access, vol. 12, pp. 162399-162411, 2024, doi: 10.1109/ACCESS.2024.3490396
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://repositorio.ismt.pt/handle/123456789/1755
dc.language.iso: en
dc.publisher: IEEE - Institute of Electrical and Electronics Engineers
dc.relation.ispartofseries: 12; 1
dc.title: A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis
dc.type: Article
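To make the idea of synthetic MAR generation from the abstract concrete, the sketch below shows one common textbook scheme: values of a target column are removed in the rows where a fully observed "driver" column takes its lowest values, so missingness depends only on observed data. This is an illustrative assumption, not the authors' baseline or any of their three proposed methods; the function `mar_univariate` and its parameters `target`, `driver`, and `rate` are hypothetical names chosen for this example.

```python
import numpy as np


def mar_univariate(X, rate, target, driver, rng=None):
    """Hypothetical helper (not from the paper): ampute one column under MAR.

    Values of column `target` are removed in the rows where the fully
    observed column `driver` takes its lowest values, so missingness in
    `target` depends only on observed data. `rate` is the fraction of
    `target` entries removed.
    """
    if target == driver:
        raise ValueError("target and driver must be different columns")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = np.random.default_rng(rng)
    X = np.array(X, dtype=float)  # float copy, so NaN can be stored safely
    n = X.shape[0]
    n_missing = int(round(rate * n))
    # Sort rows by the driver value (primary key); random keys only break ties.
    order = np.lexsort((rng.random(n), X[:, driver]))
    X[order[:n_missing], target] = np.nan
    return X


# Usage: remove 40% of column 1, driven by the observed column 0.
X = np.random.default_rng(0).normal(size=(100, 3))
X_mar = mar_univariate(X, rate=0.4, target=1, driver=0, rng=0)
print(np.isnan(X_mar[:, 1]).mean())  # 0.4
print(np.isnan(X_mar[:, 0]).any())   # False: the driver column stays complete
```

The deterministic "lowest values of the driver" rule is just the simplest MAR pattern; a probabilistic rule (for example, a missingness probability that increases with the driver value) would also satisfy MAR. This single-column sketch does not address the harder issues the abstract highlights, namely reaching extreme dataset-wide missingness rates while keeping the mechanism consistent.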
Files
Main
Name: A2_RCP.pdf
Size: 4.57 MB
Format: Adobe Portable Document Format
License
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission