Regex-Driven Approach For A Continuous Data Quality Assessment
Data quality is a critical concern for large, data-driven organizations that depend on accurate information for myriad processes and for strategic decision-making. In this work, we introduce a systematic, continuous framework for assessing data quality as heterogeneous sources are integrated into a single repository, whether a data warehouse or a data lake. Central to our approach is the use of regular expressions: by formalizing quality-check rules as executable regex patterns, we make it possible to automate the overall quality-check process. These patterns are embedded directly within SQL queries, which sweep through database tables to flag any records that violate predefined quality criteria. By integrating regex-based checks into the query layer, our proposed approach ensures that data with quality issues is detected and can be addressed promptly, maintaining the integrity and reliability of the database.
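To make the mechanism concrete, the following is a minimal sketch of regex rules embedded in SQL queries. It assumes SQLite as the database engine (the abstract does not name one) and two hypothetical rules for an `email` and a `postal_code` column. SQLite has no built-in regex engine, so the sketch registers Python's `re.fullmatch` as the `regexp()` function that backs SQLite's `REGEXP` operator. The names `RULES`, `make_connection`, and `find_violations` are illustrative and are not taken from the paper.

```python
import re
import sqlite3

# Hypothetical quality rules: column name -> regex the whole value must match.
RULES = {
    "email": r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}",
    "postal_code": r"\d{5}",
}


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), i.e. (pattern, value).
    # Assumption: a NULL value counts as a violation, so return False for it.
    if value is None:
        return False
    return re.fullmatch(pattern, str(value)) is not None


def make_connection(path=":memory:"):
    """Open a SQLite connection with the REGEXP operator enabled."""
    conn = sqlite3.connect(path)
    conn.create_function("regexp", 2, regexp)
    return conn


def find_violations(conn, table, rules):
    """Sweep `table` and return (column, rowid, value) for each rule violation."""
    violations = []
    for column, pattern in rules.items():
        # Table/column names come from trusted rule definitions; the pattern
        # itself is passed as a bound parameter.
        query = (
            f"SELECT rowid, {column} FROM {table} "
            f"WHERE NOT ({column} REGEXP ?) ORDER BY rowid"
        )
        for rowid, value in conn.execute(query, (pattern,)):
            violations.append((column, rowid, value))
    return violations


if __name__ == "__main__":
    conn = make_connection()
    conn.execute("CREATE TABLE customers (email TEXT, postal_code TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [
            ("alice@example.com", "12345"),
            ("bob-at-example.com", "1234"),
            (None, "54321"),
        ],
    )
    for violation in find_violations(conn, "customers", RULES):
        print(violation)
```

Run against the sample table, the sweep flags the malformed and missing e-mail addresses (rows 2 and 3) and the four-digit postal code (row 2), while the valid record passes every rule. Keeping each rule as a pattern string, rather than hard-coding checks in application code, is what allows new quality criteria to be added without changing the query logic.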
