ACM Journal of Data and Information Quality Special Issue on Deep Learning for Data Quality
Guest Editors
Paolo Papotti, EURECOM (France)
Donatello Santoro, Università degli Studi della Basilicata (Italy)
Saravanan Thirumuruganathan, QCRI (Qatar)
Deep learning (DL) has recently been used successfully for monitoring and improving data quality (DQ). Examples include data integration tasks such as entity resolution and schema matching, data cleaning tasks such as error detection and repair, and data curation in general. The data curation community has successfully leveraged deep learning techniques, from word embeddings to transformers, to achieve state-of-the-art performance on well-established data quality benchmarks. Nevertheless, there is still an open debate on which technical solutions perform best for relational data and under which settings.
Despite a promising start, deep learning for data quality has a long way to go before it reaches the human-level performance that DL has achieved in domains such as computer vision, natural language processing, and speech recognition. While there have been substantial improvements on specific tasks such as entity resolution and data repair/imputation, many other data quality tasks (such as data discovery, data profiling, data integration, and record fusion) have yet to fully benefit from the DL revolution. It is also unclear how to push DL techniques to the same level of adaptability achieved by more traditional logic-based methods. For example, interpretability of the models is a key stumbling block. How can one develop DQ explanations that can be consumed by non-experts? Should an explanation be generated individually for each error, or can explanations be summarized so that the user gets a high-level overview? Finally, DL-based data quality tools need novel explanation algorithms, which are not a priority for DL researchers because the architectures involved are quite specific.
This special issue focuses on deep learning for assessing and improving the quality of data. The issue is therefore addressed to members of the data science community who propose novel methods, architectures, and algorithms capable of integrating, cleaning, and profiling relational data sources with supervised and unsupervised approaches.
Click here for the full Call for Papers and submission instructions.
Important Dates
Submissions accepted starting: January 1, 2021
Submission deadline: March 1, 2021
First-round review decisions: May 15, 2021
Deadline for revision submissions: July 15, 2021
Notification of final decisions: September 15, 2021
Tentative publication: January 2022
For questions and further information, please contact: papotti@eurecom.fr.
Sign up for JDIQ TOC alerts.
https://jdiq.acm.org