Siamese Autoencoder Architecture for the Imputation of Data Missing Not at Random

Pereira, Ricardo Cardoso; Abreu, Pedro Henriques; Rodrigues, Pedro Pereira

dc.contributor.author	Pereira, Ricardo Cardoso
dc.contributor.author	Abreu, Pedro Henriques
dc.contributor.author	Rodrigues, Pedro Pereira
dc.date.accessioned	2025-04-22T08:12:52Z
dc.date.available	2025-04-22T08:12:52Z
dc.date.issued	2024-06
dc.description.abstract	Missing data can negatively impact any task performed with the available data, and it is common in real-world domains such as healthcare. One of the most common strategies to address this issue is imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network for missing data imputation, which we call Siamese Autoencoder-based Approach for Imputation (SAEI). Besides its deep autoencoder architecture, SAEI has a custom loss function and a triplet mining strategy tailored to the missing data problem. SAEI is compared to seven state-of-the-art imputation methods in an experimental setup comprising 14 heterogeneous healthcare datasets injected with Missing Not At Random values at rates between 10% and 60%. The results show that SAEI significantly outperforms all the other imputation methods in all tested settings, achieving an average improvement of 35%. This work extends the article "Siamese Autoencoder-Based Approach for Missing Data Imputation", presented at the International Conference on Computational Science 2023. It includes new experiments focused on runtime, generalization capabilities, and the impact of imputation on classification tasks, where the results show that SAEI is the imputation method that induces the best classification results, improving the F1 scores for 50% of the datasets used.
dc.identifier.citation	Pereira, Ricardo Cardoso; Abreu, Pedro Henriques; Rodrigues, Pedro Pereira (2024). Siamese Autoencoder Architecture for the Imputation of Data Missing Not at Random. Journal of Computational Science, 78, 102269. doi:10.1016/j.jocs.2024.102269
dc.identifier.issn	1877-7503
dc.identifier.uri	https://repositorio.ismt.pt/handle/123456789/1733
dc.language.iso	en
dc.publisher	Journal of Computational Science
dc.relation.ispartofseries	78; 12
dc.title	Siamese Autoencoder Architecture for the Imputation of Data Missing Not at Random
dc.type	Article

Files

Main

Name: S1877-main.pdf
Size: 776.85 KB
Format: Adobe Portable Document Format

Collections

Publicações Científicas C e T