Snowflake: Modern Data Warehouse

banner
snowflake data warehouse

You may have heard that Data Engineering is moving towards cloud data warehouses which are cheaper from evaluation and storage perspectives. In this article, we will explore famous modern data warehouse called Snowflake that not only provides cheap resources but also comes with unbelievably advanced features to make the analytics world more powerful and efficient.

Snowflake is a SAAS solution which means Software as a Service that provides storage, computing, and analytics under one umbrella. SAAS also means that you don’t have to install any special software, maintain or upgrade any server or take backups for disaster recovery. All these come as an integrated part of Snowflake.

Where does Snowflake run?

Underneath, Snowflake is provided on general cloud networks such as AWS, Google Cloud, and Microsoft Azure. That being said, it means it can’t be managed on-premises or privately managed clouds. While creating the snowflake account, you can choose among the cloud-supplier for snowflake hosting as shown in the lower graphic.

Snowflake Modern Data

Snowflake Architecture

Snowflake Modern Data

Database Storage Layer

Snowflake uses an underlying cloud provider for a depot. For example, if you choose AWS, it will use AWS S3 storing data while storing the data, and it will also optimize the data in a more developed & compressed setup. To create it faster for analytics inquiries, it will also use a circular format for keeping the data. All of the central data and optimization process is abstracted from the end user. As a customer, you will not be able to direct access the data, you will always have to manage the data through SQL inquiry in the Snowflake Editor.

Query Processing Layer

Processing in snowflake is done through SQL. You can build virtual warehouses according to your use case. For example, if your queries are easy, you can go for a little virtual warehouse, if your queries are complex and long-running, then you can go for a medium or large warehouse. You can choose your warehouse size according to your analytics needs and this approach saves a lot of money. Each Virtual warehouse has its computing power and they do not allocate resources.

Cloud Services

Snowflake does not just provide storage and computing, but it also provides tons of extra services such as optimization, metadata management, management control, and much more. In simple words, it could be a one-stop solution for all your analytics use cases.

Snowflake Unique Features

Time Travel

In Snowflake, you can recover deleted or changed data within a defined period using time travel. Snowflake standard and enterprise edition provide 1 day of data recovery by default. However, you can also set it to up to 90 days in the enterprise edition.

Create table customer (Id number, joining_date date) data_retention_time_in_days=90;

Let’s say you modified something in above table but later you released that changes were not appropriate and you want to go back in time. For example, you want to see what data was 5 minutes back, below query could be used

Select * from customer at (offset => -60*5);
Snowflake Modern Data

Fail-Safe

Snowflake Modern Data

Even after the time-travel period is finished, data does not get deleted immediately. It moves to the Fail-Safe zone. However, no user operations are allowed in the Fail-Safe zone and only snowflake can recover the data from that zone on request. In the Fail-Safe zone, data is saved for 7 more days by Snowflake. Moreover, data recovery from Fail-Safe is not immediate, it may take days to recover the data.

Zero-Copy Cloning

Zero-copy cloning is another unique feature of snowflake using which you can create clones of databases, schema, and tables without downtime and in no extra time. Cloning in Snowflake means cloning the metadata. Internally, all data is saved in micro-partitions. Unless and until you will not change the content of cloned database/table, they keep on referring to the same underlying micro-partitions. Even when data is updated, only the relevant micro-partitions are updated. You can also create clones from time-travel

Below query clone customer table as it existed exactly at the date and time of the specified timestamp:

create table customers_clone clone customer at (timestamp => to_timestamp_tz('05/11/2022 01:02:03', 'mm/dd/yyyy hh24:mi:ss'));
Snowflake Modern Data

Never Forget!!

Snowflake is a modern data warehouse with so many advanced features that are required for today’s data engineers but there are a few things that you should never forget to use this data warehouse more efficiently

Time-Travel & Fail-Safe

Data that is saved in time-travel and fail-safe zone both will take storage and will add to your monthly expenses, so if some data is not critical and is recoverable, set its retention time to 0 and turn off the fail-safe for it.

Enable Auto-Suspend

Whenever you create a virtual warehouse in snowflake, never forget to turn on the auto-suspend feature since it can save a lot of money when you are not running any query on Snowflake development.

Enable Auto-Resume

Enabling auto-resume will make sure that whenever use is querying against any warehouse, they get started if they are auto-suspended.

Set Resource Monitors

Using Resource monitors, you can send notifications and alerts in case your usage exceeds a certain threshold. Always make sure to set appropriate resource monitors to guard your expenses. The below figure depicts how we can set resource monitor on a specific warehouse.

Snowflake Modern Data

Set Workloads Timeout

Snowflake maintains warm caches for frequent access by other users. But these caches take storage and are not useful for all kinds of workloads. For example, if there is some data loading task, you can set a timeout for the warm cache as immediately as those caches are not useful. Similarly, if there are some select queries, you can maintain the warm cache for a longer period since those could be reused by other users.

Related article

Nowadays, every client wants to know the best practices for creating a data warehouse or a data mart on Snowflake; in addition to know how Snowflake is differentiated from conventional relational databases. When using a relational database model, all users will consistently be presented with the same data.

Are you having trouble with safe data sharing, exorbitant expenses, and never-ending technological updates? Snowflake Consulting can assist you in implementing a Snowflake data warehouse in your organization. Share data across the company with ease, improve performance, and scale for AI/ML difficulties, among many other things.

The global market for data warehousing will increase at a rate of 8 percent between 2024 and 2025, reaching a complete market value of over $30 billion by 2025. Because of this, data rather has evolved into a conventional method of saving data, and at present, it is considered the industry standard. To effectively handle its ever-increasing data capacities, a growing number of companies are turning to data warehouse as the answer to their problem.

DMCA Logo do not copy