Snowflake Data Metric Functions: Everything You Need to Know

Snowflake Data Metric Functions (DMFs) Overview

Data quality is about understanding the condition and reliability of enterprise-grade data, including its freshness and accuracy, for example by comparing actual values against null or blank fields in a column. Reliable data is what enables informed, data-driven decisions. This is where Data Metric Functions (DMFs) come in handy: they assess data quality, and Snowflake offers built-in system DMFs within the SNOWFLAKE.CORE schema to measure key metrics without requiring custom definitions.

Whether we utilize system DMFs, custom DMFs, or a combination of both, once a DMF is assigned to a table or view, Snowflake logs the scheduled results in a dedicated event table for data metric functions. We can define the execution frequency based on our requirements; for instance, a DMF can be scheduled to run on a specific table three times a day, as in the sketch below. As part of Snowflake consulting practices, integrating DMFs becomes critical to ensuring data remains trustworthy for analytics, reporting, and governance.
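
As a quick illustration of such a schedule, the "three times a day" case can be expressed with Snowflake's CRON-style schedule syntax (a minimal sketch; the ORDERS table name is hypothetical):

-- Run the DMFs attached to ORDERS at 06:00, 12:00, and 18:00 UTC every day.
ALTER TABLE ORDERS SET DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';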

Objects supported by DMFs:

Currently, DMFs can be applied to the following objects:

  • Standard tables (including temporary and transient).
  • Event tables.
  • External tables.
  • Apache Iceberg tables.
  • Standard views.
  • Materialized views.

Two key points have to be noted here: using DMFs requires at least the Enterprise edition, and DMFs cannot be set on hybrid tables or streams.
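
For example, since standard views are on the list, a system DMF can be scheduled and attached to a view just as with a table. A minimal sketch, assuming a hypothetical view ACTIVE_CUSTOMERS with an email column:

-- Schedule the view's DMFs to run hourly, then attach a NULL check.
ALTER VIEW ACTIVE_CUSTOMERS SET DATA_METRIC_SCHEDULE = '60 MINUTE';
ALTER VIEW ACTIVE_CUSTOMERS ADD DATA METRIC FUNCTION
  SNOWFLAKE.CORE.NULL_COUNT ON (email);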

DMF category:

DMFs can be broadly classified into two categories:

1 System DMFs.

2 Custom DMFs.

1 System DMFs:

System DMFs are managed by Snowflake; neither their names nor their functionality can be modified. Each system DMF evaluates a specific data quality attribute, and multiple system DMFs can be assigned to the same table or view for a more comprehensive assessment, helping meet governance and compliance requirements. They are frequently recommended in Snowflake development services to ensure data reliability from the ground up. Below is the list of the system DMFs and their categories.

Serial No.  System-Defined DMF           Category
1           BLANK_COUNT                  Accuracy
2           BLANK_PERCENT                Accuracy
3           NULL_COUNT                   Accuracy
4           NULL_PERCENT                 Accuracy
5           FRESHNESS                    Freshness
6           DATA_METRIC_SCHEDULED_TIME   Freshness
7           AVG                          Statistics
8           MIN                          Statistics
9           MAX                          Statistics
10          STDDEV                       Statistics
11          DUPLICATE_COUNT              Uniqueness
12          UNIQUE_COUNT                 Uniqueness
13          ROW_COUNT                    Volume
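
Any of these can also be invoked ad hoc to spot-check a column before any schedule is set up. A minimal sketch, assuming a hypothetical EMPLOYEES table with a numeric salary column and an updated_at timestamp column:

-- Average salary across the table (Statistics category).
SELECT SNOWFLAKE.CORE.AVG(SELECT salary FROM EMPLOYEES);

-- Seconds elapsed since the most recent updated_at value (Freshness category).
SELECT SNOWFLAKE.CORE.FRESHNESS(SELECT updated_at FROM EMPLOYEES);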

2 Custom DMFs:

If no existing system DMF meets your data quality requirements, you can create a custom DMF using the CREATE DATA METRIC FUNCTION command.

Below is how it can be done:
Step 1: Create the custom DMF using the command below (where invalid_email_count is the DMF name):

CREATE DATA METRIC FUNCTION IF NOT EXISTS
  invalid_email_count (ARG_T TABLE (ARG_C1 STRING))
  RETURNS NUMBER
  AS
  $$
    -- Count the rows whose value does not match a basic email pattern.
    SELECT COUNT_IF(NOT ARG_C1 REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}$')
    FROM ARG_T
  $$;

Step 2: Associate the DMF with the table.

ALTER TABLE customers ADD DATA METRIC FUNCTION
  invalid_email_count ON (email);
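
Before relying on the schedule, the custom DMF can also be called directly to verify its logic, in the same way a system DMF is called:

-- Ad hoc call: returns how many rows of customers have an email
-- that fails the regular expression in the DMF body.
SELECT invalid_email_count(SELECT email FROM customers);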

DMF access controls:

Below are the grants required for an end user's role to be able to use DMFs:

USE ROLE SECURITYADMIN;
CREATE ROLE RL_DEV_ENGINEER;
GRANT ROLE RL_DEV_ENGINEER TO USER <user_name>;

-- This is for creating standard objects.
GRANT CREATE DATABASE ON ACCOUNT TO ROLE RL_DEV_ENGINEER;

-- This allows the role to use serverless compute to run data metric functions.
GRANT EXECUTE DATA METRIC FUNCTION ON ACCOUNT TO ROLE RL_DEV_ENGINEER;

-- This is used to monitor the data metric functions.
GRANT APPLICATION ROLE SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER TO ROLE RL_DEV_ENGINEER;
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE RL_DEV_ENGINEER;

-- This is used to access system defined data metric functions.
GRANT DATABASE ROLE SNOWFLAKE.DATA_METRIC_USER TO ROLE RL_DEV_ENGINEER;

Three things need to be noted here:

  • SNOWFLAKE.DATA_METRIC_USER is a database role granted to the custom role; it provides access to the system-defined DMFs.
  • EXECUTE DATA METRIC FUNCTION ensures the privileges are there to execute the DMFs.
  • SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER ensures the privileges are there to monitor the output of the DMFs.
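
As a quick sanity check, the grants can then be reviewed for the role created above:

-- Lists all privileges, database roles, and application roles
-- granted to the engineer role in the script above.
SHOW GRANTS TO ROLE RL_DEV_ENGINEER;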

Use Cases for Data Metric Functions in Snowflake:

Applying DMFs to a table and scheduling them.
In this section, we will apply system-defined DMFs to a table and then set up their schedule.

CREATE OR REPLACE TABLE DQ_DATA.TRANSACTIONS (
  transaction_id INTEGER, 
  transaction_date DATE, 
  client_id VARCHAR, 
  payment_type VARCHAR
);

INSERT INTO TRANSACTIONS VALUES
(82610, '2024-04-08', '4276', 1),
(260992, '2024-04-26', '644A', 1),
(819485, '2024-04-29', '2344', 2),
(538988, '2044-04-09', '2610', 1),
(538988, '2024-04-26', '6317', 1),
(781004, '2024-04-30', '3306', 0),
(303412, '2024-04-27', '2664', 2),
(805451, '2024-05-02', '2884', 1),
(729482, '2024-04-25', NULL, 1),
(254959, '2024-04-19', '6343', 1);

Now, let us say we want to check the NULL count for the column client_id across the above set of records. Below is how we do that:
SELECT SNOWFLAKE.CORE.NULL_COUNT(SELECT client_id FROM TRANSACTIONS);

The output is 1, since there is exactly one record with a NULL client_id.
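
The percentage variant works the same way; since 1 of the 10 rows has a NULL client_id, NULL_PERCENT should return 10:

SELECT SNOWFLAKE.CORE.NULL_PERCENT(SELECT client_id FROM TRANSACTIONS);

Now, let us schedule some DMF checks on the above table.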

In the very first step, the schedule has to be defined, and only then do we associate the DMFs with the table. Below is how we can set the schedule:

ALTER TABLE TRANSACTIONS SET DATA_METRIC_SCHEDULE = '5 MINUTE';
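
Alternatively, if the requirement is to re-evaluate quality whenever the data changes rather than on a timer, Snowflake also accepts a change-triggered schedule:

-- Run the associated DMFs whenever DML modifies the table's contents.
ALTER TABLE TRANSACTIONS SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';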

After this, we can verify that the schedule has been applied with the command below:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE TRANSACTIONS;

Now, we attach the DMFs as shown below:

ALTER TABLE TRANSACTIONS
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.DUPLICATE_COUNT
  ON (transaction_id);

ALTER TABLE TRANSACTIONS
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
  ON (client_id);

There are now two checks defined on the table, duplicate count and null count, each on a separate column.

After 5 minutes, we can check the value that we get. Below is the query through which we can continuously check the output after the DMFs have run:

SELECT measurement_time, table_name, metric_name, argument_names, value
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
ORDER BY measurement_time DESC;

As we can see from the results, the VALUE column is now properly populated for each METRIC_NAME.

Governance & monitoring of the Data Metric Functions in Snowflake:

Managing data metric functions, including adding, removing, or modifying schedules, can become complex over time. To simplify tracking, Snowflake offers tools to view and monitor the defined data quality functions for each table. Below is how these can be monitored:

1. To see the metric schedule on a table, we execute the below command:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE TRANSACTIONS;

2. To view the data metric functions linked to a table, use the DATA_METRIC_FUNCTION_REFERENCES table function, passing the object name and object domain as parameters, as shown below:

SELECT * FROM TABLE(
  INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
    REF_ENTITY_NAME => 'DEMO_DB.DEMO_SCHEMA.TRANSACTIONS',
    REF_ENTITY_DOMAIN => 'TABLE'
  )
);

3. To see the measurement time, which represents the exact time when a data metric function ran and produced its output, use the supporting query below:

SELECT measurement_time, table_name, metric_name, argument_names, value
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
ORDER BY measurement_time DESC;
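
Finally, as part of managing DMFs over time, an association or a schedule can be removed when no longer needed; a minimal sketch using the TRANSACTIONS table from above:

-- Detach a single DMF from the column it was set on.
ALTER TABLE TRANSACTIONS DROP DATA METRIC FUNCTION
  SNOWFLAKE.CORE.NULL_COUNT ON (client_id);

-- Remove the table's metric schedule entirely.
ALTER TABLE TRANSACTIONS UNSET DATA_METRIC_SCHEDULE;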

Key benefits of Data Metric Functions in Snowflake:

Below are the key benefits that DMFs bring when embedded into a Snowflake data platform:

1 Built-in feature:

DMFs are completely native to Snowflake, hence we can do most data quality activities while staying inside Snowflake, without having to rely on external tools for data quality support.

2 Custom metric support:

We can create our own DMFs using SQL expressions to measure custom data quality rules specific to the business needs.

3 Continuous and scheduled validation:

DMFs can be scheduled to run periodically, allowing ongoing monitoring of data quality over time without manual intervention, as we saw in the example above.

4 Centralized governance:

All DMFs and their results are centrally stored and accessible, enabling governance teams to track, audit, and report on data health.

5 Data-driven decision making:

By surfacing data quality metrics directly in Snowflake, teams can trust the data they’re using in analytics, AI/ML models, and business decisions.

Summary:

To summarize, DMFs play a crucial role in maintaining data governance and reliability by allowing teams to track changes in data quality over time. Organizations can also define custom DMFs to handle more specific or complex quality checks, ensuring flexibility in meeting business or compliance needs. The results of DMFs are easy to interpret, making it straightforward to audit, report, or trigger downstream actions based on data quality status. With DMFs, Snowflake empowers users to embed automated, scalable data quality monitoring into their data pipelines effortlessly.


Felipe Hicks

Experienced IT professional with a strong background in software development and a demonstrated history of working in the outsourcing and offshoring industry. Skilled in developing scalable applications, implementing modern architectural patterns, and leveraging cloud technologies. Proficient in cross-platform development and passionate about driving innovation and delivering tailored solutions. Committed to staying updated with emerging technologies and collaborating effectively to achieve project goals and exceed client expectations.
