Regex-Driven Approach For A Continuous Data Quality Assessment

Data quality is a critical concern for large, data-driven organizations that depend on accurate information for myriad processes and strategic decision making. In this work, we introduce a systematic, continuous framework for assessing data quality as heterogeneous sources are integrated into a single repository, whether a data warehouse or a data lake. Central to our approach is the use of regular expressions: by formalizing quality-check rules as executable regex patterns, we open the possibility of automating the overall quality-check process. These patterns are embedded directly within SQL queries, which sweep through database tables to flag any records that contravene predefined quality criteria. By integrating regex-based checks into the query layer, our proposed approach ensures that data with quality issues is detected and can be addressed promptly, maintaining the integrity and reliability of the database.
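
As an illustration only, the idea of embedding a regex rule in a SQL query could be sketched as follows. This is not the authors' implementation: the `customers` table, the `EMAIL_RULE` pattern, and the `find_violations` helper are hypothetical, and SQLite with a user-defined `REGEXP` function stands in for whatever database engine the actual framework targets.

```python
import re
import sqlite3

# Hypothetical quality rule: a deliberately simple email shape check.
EMAIL_RULE = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"


def regexp(pattern, value):
    """Backs SQLite's `value REGEXP pattern` operator; NULL never matches."""
    if value is None:
        return 0
    return int(re.search(pattern, str(value)) is not None)


def find_violations(conn, table, column, pattern):
    """Return (id, value) rows whose column value does not match the pattern."""
    # Identifiers are interpolated, so they must come from trusted configuration.
    query = (
        f'SELECT id, "{column}" FROM "{table}" '
        f'WHERE NOT ("{column}" REGEXP ?) ORDER BY id'
    )
    return conn.execute(query, (pattern,)).fetchall()


conn = sqlite3.connect(":memory:")
# SQLite has no built-in REGEXP implementation; register one from Python.
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (id, email) VALUES (?, ?)",
    [(1, "ana@example.org"), (2, "not-an-email"), (3, None), (4, "bob@example")],
)

violations = find_violations(conn, "customers", "email", EMAIL_RULE)
print(violations)  # [(2, 'not-an-email'), (3, None), (4, 'bob@example')]
```

Running such a query on a schedule, or after each integration step, would be one way to make the check continuous; engines with native regex operators would not need the registered function.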

Manuele Kirsch Pinheiro
Centre de Recherche en Informatique / Université Paris 1 Panthéon Sorbonne
France

Lydia Khelifa Chibout
Scientific and Technical Center for Building (CSTB)
France