Data quality checks¶
Overview¶
Ensuring high data quality is crucial for any organization that relies on data for decision-making, analysis, and strategic planning. High-quality data can significantly enhance accuracy in reporting, consistency in analytics, and reliability in automated decisions. Conversely, poor data quality can lead to misguided decisions based on inaccurate, incomplete, or outdated information.
AP provides data quality checks to ensure the integrity and accuracy of your data. These checks allow users to define specific criteria that data must meet before it is considered valid for analysis and reporting. This functionality includes:
- Outlier detection: Identify data points that deviate significantly from the norm. Outliers may indicate data entry errors or unusual events that could skew analysis results.
- Relationship: Ensure that relationships between data items make sense. For example, the number of positive tests should not exceed the total number of tests.
- Data completeness: Verify that all required data fields are populated and that data spans the required time frames or categories.
- Consistency: Compare data across time, categories and sources to ensure that it has a consistent format, is free from duplicates and uses the same coding system.
Data quality checks in AP are based on SQL queries that define and enforce these rules. By writing a SQL query, users can precisely specify the conditions under which data is considered valid. Running the query then reveals the records that violate the check. When a SQL query identifies data that violates a quality check, AP can trigger alerts or even prevent the integration of flawed data into your reports and analyses.
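To make the idea of a SQL-based check concrete, the following is a minimal sketch of the relationship rule from the list above (positive tests must not exceed total tests). It runs against an in-memory SQLite database through Python's standard `sqlite3` module, not against AP itself. The table `test_results` and its columns are hypothetical names chosen for illustration. The convention that an empty result means the check passes is also an assumption, so verify how AP interprets query results before relying on it.

```python
import sqlite3

# Hypothetical table and column names; AP's actual data model may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_results ("
    "region TEXT, period TEXT, tests INTEGER, positive_tests INTEGER)"
)
conn.executemany(
    "INSERT INTO test_results VALUES (?, ?, ?, ?)",
    [
        ("North", "2024-01", 120, 15),
        ("South", "2024-01", 80, 95),  # breaks the rule: more positives than tests
        ("East", "2024-01", 60, 60),   # boundary case, still valid
    ],
)

# Relationship check: select the rows that break the rule.
# Assumption: an empty result means the check passes.
RELATIONSHIP_CHECK = """
SELECT region, period, tests, positive_tests
FROM test_results
WHERE positive_tests > tests
ORDER BY region
"""

violations = conn.execute(RELATIONSHIP_CHECK).fetchall()
check_passed = len(violations) == 0

for row in violations:
    print("Violation:", row)
```

Writing the query so that it returns only the offending rows keeps the result small and points directly at the records that need correction.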
Manage data quality checks¶
The following sections cover how to view, create, update and remove data quality checks.
Create data quality check¶
- Click Create new from the top-right corner.
- Enter the following information.

  | Field | Description |
  | --- | --- |
  | Name | The name of the data quality check (required) |
  | Description | A description of the data quality check |
  | Code | A code for the data quality check |
  | Labels | One or more labels in the format key:value |
  | SQL query | A SQL query which specifies the conditions under which data is considered valid |

- Click Create.
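As an illustration of what could go into the SQL query field, the sketch below defines three candidate queries for outlier detection, completeness and duplicates, and tries them against sample data in an in-memory SQLite database. The table `daily_cases`, its columns and the two-standard-deviation threshold are assumptions made for this example, and the SQL dialect supported by AP may differ from SQLite. Each query is written to return the violating rows, which is an assumed convention rather than a documented AP requirement.

```python
import sqlite3

# Hypothetical sample data; names and values are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_cases (facility TEXT, day TEXT, cases INTEGER)")
conn.executemany(
    "INSERT INTO daily_cases VALUES (?, ?, ?)",
    [
        ("A", "2024-03-01", 10),
        ("A", "2024-03-02", 12),
        ("A", "2024-03-03", 11),
        ("A", "2024-03-04", 9),
        ("A", "2024-03-05", 13),
        ("A", "2024-03-06", 10),
        ("A", "2024-03-07", 500),   # likely data entry error
        ("A", "2024-03-08", None),  # missing value
        ("A", "2024-03-01", 10),    # duplicate of the first row
    ],
)

CHECKS = {
    # Outliers: values more than 2 standard deviations from the mean.
    # Squared distances are compared so that no SQRT function is needed.
    "outliers": """
        WITH stats AS (
            SELECT AVG(cases) AS mean,
                   AVG(cases * cases) - AVG(cases) * AVG(cases) AS variance
            FROM daily_cases
        )
        SELECT d.facility, d.day, d.cases
        FROM daily_cases AS d, stats AS s
        WHERE (d.cases - s.mean) * (d.cases - s.mean) > 4 * s.variance
    """,
    # Completeness: required field left empty.
    "completeness": """
        SELECT facility, day FROM daily_cases WHERE cases IS NULL
    """,
    # Duplicates: more than one row for the same facility and day.
    "duplicates": """
        SELECT facility, day, COUNT(*) AS occurrences
        FROM daily_cases
        GROUP BY facility, day
        HAVING COUNT(*) > 1
    """,
}

# Run every check and collect the violating rows per check.
results = {name: conn.execute(sql).fetchall() for name, sql in CHECKS.items()}

for name, rows in results.items():
    status = "passed" if not rows else f"failed ({len(rows)} row(s))"
    print(f"{name}: {status}")
```

The outlier threshold is a tuning choice. With only a handful of rows a stricter cutoff such as three standard deviations may never trigger, so it is worth trying the query on representative data before saving the check.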
Edit data quality check¶
- Find and click the data quality check to update in the list.
- Click the context menu in the top-right corner.
- Click Edit.
- Update the relevant fields.
- Click Save.
Edit SQL query¶
- Find and click the data quality check to edit in the list.
- Click the context menu in the top-right corner.
- Click Edit the SQL query.
- In the SQL editor, edit the SQL query.
- Click Save.
Remove data quality check¶
- Find and click the data quality check to remove in the list.
- Click the context menu in the top-right corner.
- Click Remove.