Snowflake: Key Concepts & Architecture of a Self-Managed Data Platform

Key Concepts & Architecture

Snowflake Data Cloud Overview

Snowflake's Data Cloud is powered by an advanced data platform provided as a self-managed service. Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.

The Snowflake data platform is not built on any existing database technology or "big data" software platforms such as Hadoop. Instead, Snowflake combines a completely new SQL query engine with an innovative architecture natively designed for the cloud. To the user, Snowflake provides all of the functionality of an enterprise analytic database, along with many additional special features and unique capabilities.

Data Platform as a Self-managed Service

Snowflake is a true self-managed service, meaning:

  • There is no hardware (virtual or physical) to select, install, configure, or manage.
  • There is virtually no software to install, configure, or manage.
  • Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.

Snowflake runs completely on cloud infrastructure. All components of Snowflake's service, other than optional command line clients, drivers, and connectors, run in public cloud infrastructures.

Snowflake uses virtual compute instances for its compute needs and a storage service for persistent storage of data. Snowflake cannot be run on private cloud infrastructures (on-premises or hosted).

Snowflake is not a packaged software offering that can be installed by a user. Snowflake manages all aspects of software installation and updates.

Snowflake Architecture

Snowflake's architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.

Snowflake's unique architecture consists of three key layers:

  • Database Storage
  • Query Processing
  • Cloud Services
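The hybrid design described above can be pictured with a toy Python model. This is not Snowflake code: a plain list stands in for the central repository that every node can read (the shared-disk idea), while each hypothetical `Node` scans only the partitions assigned to it and keeps a local copy of what it reads (the shared-nothing, MPP idea). The round-robin assignment and the thread pool are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy central repository: every node can read every partition (shared-disk idea).
CENTRAL_STORAGE = [
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
    [3, 5, 8],
]


class Node:
    """Hypothetical compute node that caches the partitions it reads."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.local_cache = {}  # partition id -> rows held locally on this node

    def scan(self, partition_ids):
        total = 0
        for pid in partition_ids:
            if pid not in self.local_cache:
                # First access: fetch from central storage and keep a local copy.
                self.local_cache[pid] = CENTRAL_STORAGE[pid]
            total += sum(self.local_cache[pid])
        return total


def parallel_sum(nodes):
    """Split partitions round-robin across nodes and scan them in parallel (MPP idea)."""
    n = len(nodes)
    assignments = [list(range(i, len(CENTRAL_STORAGE), n)) for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = pool.map(Node.scan, nodes, assignments)
        return sum(partials)


nodes = [Node(0), Node(1)]
print(parallel_sum(nodes))  # 52
```

A second query on the same nodes would be served from `local_cache` rather than central storage, which is the intuition behind pairing a shared repository with node-local data.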

Database Storage

When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.

Snowflake manages all aspects of how this data is stored: the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are neither directly visible nor accessible to customers; they are accessible only through SQL query operations run using Snowflake.
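The reorganization into a compressed, columnar layout can be illustrated with a small sketch. Snowflake's internal format is not public, so JSON serialization plus `zlib` compression per column is purely a stand-in. The hypothetical `ToyColumnarStore` also keeps its columns private and exposes only `select()`, echoing the point that stored data is reachable only through queries. It assumes every loaded row has the same keys.

```python
import json
import zlib


class ToyColumnarStore:
    """Loads rows, stores them as independently compressed columns, exposes only select()."""

    def __init__(self, rows):
        columns = {}
        for row in rows:
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
        # Each column is serialized and compressed on its own.
        self._columns = {
            name: zlib.compress(json.dumps(values).encode("utf-8"))
            for name, values in columns.items()
        }

    def select(self, *names):
        """Decompress only the requested columns and return them as row tuples."""
        decoded = [
            json.loads(zlib.decompress(self._columns[name]).decode("utf-8"))
            for name in names
        ]
        return list(zip(*decoded))


store = ToyColumnarStore([{"region": "EU", "amount": 10}, {"region": "US", "amount": 25}])
print(store.select("amount"))  # [(10,), (25,)]
```

Because each column is compressed separately, a query that needs only `amount` never decompresses `region`.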

Query Processing

Query execution is performed in the processing layer. Snowflake processes queries using "virtual warehouses". Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.

Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
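The isolation between virtual warehouses can be sketched as separate, fixed pools of compute. In this toy model the warehouse names `ETL_WH` and `BI_WH` and their node counts are hypothetical, and `try_start` returning `False` simply stands in for a query having to wait inside that one warehouse.

```python
from dataclasses import dataclass


@dataclass
class Warehouse:
    """Toy virtual warehouse with its own fixed pool of compute nodes."""

    name: str
    nodes: int
    busy: int = 0

    def try_start(self, nodes_needed):
        """Reserve nodes for a query; False means it must wait in *this* warehouse."""
        if self.busy + nodes_needed > self.nodes:
            return False
        self.busy += nodes_needed
        return True

    def finish(self, nodes_used):
        self.busy -= nodes_used


etl = Warehouse("ETL_WH", nodes=4)
bi = Warehouse("BI_WH", nodes=2)

print(etl.try_start(4))  # True: ETL_WH is now fully busy
print(etl.try_start(1))  # False: more ETL work must wait
print(bi.try_start(2))   # True: BI_WH is unaffected by ETL_WH's load
```

Since no pool is shared, saturating one warehouse changes nothing about another's capacity.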

For more information, see Virtual warehouses.

Cloud Services

The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.

Services managed in this layer include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control
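The path from login to query dispatch can be sketched as a short pipeline. Everything here is hypothetical: the in-memory `SESSIONS`, `METADATA`, and `GRANTS` tables, the regex "parser", and the `Plan` result. Infrastructure management is not modeled. The function only produces a plan for a warehouse, since execution itself belongs to the query processing layer.

```python
import re
from dataclasses import dataclass

# Hypothetical state kept by the cloud services layer.
SESSIONS = {"token-alice": "alice"}         # authentication
METADATA = {"SALES": ["REGION", "AMOUNT"]}  # metadata management
GRANTS = {("alice", "SALES")}               # access control

SELECT_RE = re.compile(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*", re.IGNORECASE)


@dataclass
class Plan:
    table: str
    columns: list
    warehouse: str


def handle_request(token, sql, warehouse):
    """Authenticate, parse, check access, and produce a plan to dispatch."""
    user = SESSIONS.get(token)
    if user is None:
        raise PermissionError("authentication failed")

    match = SELECT_RE.fullmatch(sql)
    if match is None:
        raise ValueError("only 'SELECT <cols> FROM <table>' is supported in this toy")
    table = match.group(2).upper()
    if table not in METADATA:
        raise LookupError(f"unknown table {table}")
    if (user, table) not in GRANTS:
        raise PermissionError(f"{user} may not read {table}")

    requested = [c.strip().upper() for c in match.group(1).split(",")]
    columns = METADATA[table] if requested == ["*"] else requested
    unknown = set(columns) - set(METADATA[table])
    if unknown:
        raise LookupError(f"unknown columns {sorted(unknown)}")
    # Toy "optimization": the plan lists only the columns the query needs.
    return Plan(table, columns, warehouse)


print(handle_request("token-alice", "SELECT amount FROM sales", "BI_WH"))
# Plan(table='SALES', columns=['AMOUNT'], warehouse='BI_WH')
```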

Connecting to Snowflake

Snowflake supports multiple ways of connecting to the service:

  • A web-based user interface from which all aspects of managing and using Snowflake can be accessed.
  • Command line clients (e.g. SnowSQL) which can also access all aspects of managing and using Snowflake.
  • ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect to Snowflake.
  • Native connectors (e.g. Python, Spark) that can be used to develop applications for connecting to Snowflake.
  • Third-party connectors that can be used to connect applications such as ETL tools (e.g. Informatica) and BI tools (e.g. ThoughtSpot) to Snowflake.
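For the native Python connector, a minimal sketch might look like the following. It assumes the `snowflake-connector-python` package is installed and that credentials are stored in environment variables whose names (`SNOWFLAKE_ACCOUNT` and so on) are our own convention. Password authentication is shown only for brevity; your account may require a different method. The connector import is deferred so the parameter helper can be tried without the package.

```python
import os
from typing import Mapping

# Our own naming convention for the environment variables (hypothetical).
REQUIRED_VARS = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
}
OPTIONAL_VARS = {"warehouse": "SNOWFLAKE_WAREHOUSE"}


def connection_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for snowflake.connector.connect() from env vars."""
    missing = [var for var in REQUIRED_VARS.values() if var not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    kwargs = {param: env[var] for param, var in REQUIRED_VARS.items()}
    kwargs.update({param: env[var] for param, var in OPTIONAL_VARS.items() if var in env})
    return kwargs


def current_version(env: Mapping[str, str] = os.environ) -> str:
    """Connect with the Python connector and return the Snowflake version string."""
    import snowflake.connector  # deferred: requires snowflake-connector-python

    conn = snowflake.connector.connect(**connection_kwargs(env))
    try:
        cur = conn.cursor()
        cur.execute("SELECT CURRENT_VERSION()")
        return cur.fetchone()[0]
    finally:
        conn.close()
```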

Supported Cloud Platforms

Snowflake is provided as a self-managed service that runs completely on cloud infrastructure. This means that all three layers of Snowflake's architecture (storage, compute, and cloud services) are deployed and managed entirely on a selected cloud platform.

A Snowflake account can be hosted on any of the following cloud platforms:

  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • Microsoft Azure (Azure)

On each platform, Snowflake provides one or more regions where the account is provisioned.

If your organization's other cloud services are already hosted on one of these platforms, you can choose to host all your Snowflake accounts on the same platform. However, you can also choose to host your accounts on a different platform.

Note: The cloud platform you choose for each Snowflake account is completely independent of your other Snowflake accounts. In fact, you can host each Snowflake account on a different platform, although this may affect data transfer billing when loading data.

Pricing

Unit costs for credits and data storage differ by region on each cloud platform. For more information about pricing as it pertains to a specific region and platform, see the pricing page (on the Snowflake website).

Data Loading

Snowflake supports loading data from files staged in any of the following locations, regardless of the cloud platform for your Snowflake account:

  • Internal (i.e. Snowflake) stages
  • Amazon S3
  • Google Cloud Storage
  • Microsoft Azure blob storage

Snowflake supports both bulk data loading and continuous data loading (Snowpipe). Likewise, Snowflake supports unloading data from tables into any of the above staging locations.

For more information, see Load data into Snowflake.

Note: Some data transfer billing charges may apply when loading data from files staged across different platforms. For more information, see Understanding data transfer cost.
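The staging options and the cross-platform billing note can be combined in a small helper. It assumes internal stages are referenced with an `@` prefix and external locations with `s3://`, `gcs://`, or `azure://` URLs; the function names are hypothetical. Only a platform mismatch is flagged; any other factors that may affect billing, such as regions, are not modeled.

```python
from urllib.parse import urlparse

# Assumed mapping from external stage URL schemes to the platform that hosts them.
SCHEME_TO_PLATFORM = {"s3": "AWS", "gcs": "GCP", "azure": "Azure"}


def stage_platform(location):
    """Return 'internal' for @-referenced stages, else the platform behind the URL."""
    if location.startswith("@"):
        return "internal"
    scheme = urlparse(location).scheme.lower()
    if scheme not in SCHEME_TO_PLATFORM:
        raise ValueError(f"unsupported stage location: {location}")
    return SCHEME_TO_PLATFORM[scheme]


def may_incur_transfer_charges(account_platform, location):
    """Flag loads whose staged files sit on a different platform than the account."""
    platform = stage_platform(location)
    # Internal stages live with the account, so only external mismatches are flagged.
    return platform != "internal" and platform != account_platform


print(stage_platform("gcs://my-bucket/data/"))  # GCP
```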

HITRUST CSF Certification

The HITRUST CSF certification strengthens Snowflake's security posture for regulatory compliance and risk management, and applies to accounts on Business Critical Edition (or higher).

For more information, see Snowflake Security and Trust Center.

Partner Applications

Many partner applications work with Snowflake accounts. For more information, refer to Snowflake ecosystem.

Current Limitations for Accounts on GCP

We strive to provide the same Snowflake experience regardless of the cloud platform you choose for your account; however, some services and features are currently unavailable (or have limited availability) for Snowflake accounts hosted on Google Cloud Platform (GCP).

Google Cloud Private Service Connect

See the limitations section for using Google Cloud Private Service Connect and Snowflake.

Note that the following Snowflake system functions for self-service management are not supported with Google Cloud Private Service Connect for Snowflake accounts on GCP:

  • SYSTEM$AUTHORIZE_PRIVATELINK
  • SYSTEM$GET_PRIVATELINK
  • SYSTEM$GET_PRIVATELINK_AUTHORIZED_ENDPOINTS
  • SYSTEM$GET_PRIVATELINK_ENDPOINT_REGISTRATIONS
  • SYSTEM$REGISTER_PRIVATELINK_ENDPOINT
  • SYSTEM$REVOKE_PRIVATELINK
  • SYSTEM$UNREGISTER_PRIVATELINK_ENDPOINT
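Automation that runs against accounts on several platforms could check the list above before issuing a call. This is a hypothetical pre-flight guard: `ensure_supported` and the `cloud` string are our own names, and the set simply copies the functions listed in this section.

```python
# Functions listed above as unsupported with Google Cloud Private Service Connect.
UNSUPPORTED_ON_GCP_PSC = frozenset({
    "SYSTEM$AUTHORIZE_PRIVATELINK",
    "SYSTEM$GET_PRIVATELINK",
    "SYSTEM$GET_PRIVATELINK_AUTHORIZED_ENDPOINTS",
    "SYSTEM$GET_PRIVATELINK_ENDPOINT_REGISTRATIONS",
    "SYSTEM$REGISTER_PRIVATELINK_ENDPOINT",
    "SYSTEM$REVOKE_PRIVATELINK",
    "SYSTEM$UNREGISTER_PRIVATELINK_ENDPOINT",
})


def ensure_supported(function_name, cloud):
    """Raise before issuing a self-service PrivateLink call that GCP accounts cannot use."""
    name = function_name.strip().upper()
    if cloud.strip().upper() == "GCP" and name in UNSUPPORTED_ON_GCP_PSC:
        raise NotImplementedError(
            f"{name} is not supported for Google Cloud Private Service Connect on GCP"
        )
```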

Network Rules

Network rules are not supported on GCP.

Private Connectivity to Internal Stages

Private connectivity to Snowflake internal stages is currently not supported on GCP. As a result, you cannot block public IP addresses from accessing an internal stage, because doing so would leave no way to access the stage at all.

Snowpark Container Services

Snowpark Container Services is currently not supported on GCP.

Snowflake Open Catalog

Snowflake Open Catalog is currently not available in government regions.

Current Limitations for Accounts on AWS

We strive to provide the same Snowflake experience regardless of the cloud platform you choose for your account; however, some services and features are currently unavailable (or have limited availability) for Snowflake accounts hosted on AWS.

Access to External Network Locations

Access to external network locations from UDF and procedure handler code is currently not supported in the Gov region.

Snowflake Open Catalog

Snowflake Open Catalog is currently not available in government regions.

Current Limitations for Accounts on Azure

We strive to provide the same Snowflake experience regardless of the cloud platform you choose for your account; however, some services and features are currently unavailable (or have limited availability) for Snowflake accounts hosted on Microsoft Azure.

Azure Private Link

See Azure Private Link Requirements and Limitations.

Snowflake Clients

Currently, using the account name URL format for private connectivity to the Snowflake service with SnowSQL, connectors and drivers is not supported. As a workaround, use the account locator format with SnowSQL, connectors, and drivers.

For details, see:

  • Account identifiers
  • Connecting to your accounts
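The workaround above can be captured in a small helper that picks a hostname for SnowSQL, a connector, or a driver. The host patterns in the docstring are our assumptions and should be checked against the Account identifiers page; `client_host` is a hypothetical name, and `region_segment` means whatever region portion your account's private connectivity URL uses.

```python
DOMAIN = "snowflakecomputing.com"


def client_host(org, account_name, locator, region_segment, private):
    """Choose the hostname to give SnowSQL, a connector, or a driver.

    Assumed host patterns (verify against the Account identifiers documentation):
      account name format:    <org>-<account_name>.snowflakecomputing.com
      account locator format: <locator>.<region_segment>.privatelink.snowflakecomputing.com
    """
    if private:
        # The account name URL format is not supported for private connectivity
        # with these clients, so fall back to the account locator format.
        return f"{locator}.{region_segment}.privatelink.{DOMAIN}"
    return f"{org}-{account_name}.{DOMAIN}"
```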

Access to External Network Locations

Access to external network locations from UDF and procedure handler code is currently not supported in the Gov region.

Snowpark Container Services

Snowpark Container Services is currently not supported on Azure.

Snowflake Open Catalog

Snowflake Open Catalog is currently not available in government regions.
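The per-platform limitations above can be consolidated into a lookup table for planning purposes. This is a snapshot of what this section lists, not a live source; the feature keys are our own labels, and the Private Service Connect functions and the Azure client URL limitation are left out for brevity. Limitations change over time, so check current documentation before relying on it.

```python
# Snapshot of the limitations listed above; feature keys are our own labels.
UNSUPPORTED_ON_PLATFORM = {
    "GCP": {"network_rules", "internal_stage_private_connectivity", "snowpark_container_services"},
    "AWS": set(),
    "Azure": {"snowpark_container_services"},
}
UNSUPPORTED_IN_GOV_REGIONS = {
    "GCP": {"open_catalog"},
    "AWS": {"open_catalog", "external_network_access"},
    "Azure": {"open_catalog", "external_network_access"},
}


def is_available(feature, platform, gov_region=False):
    """Return False if the limitations above list the feature as unavailable."""
    if feature in UNSUPPORTED_ON_PLATFORM[platform]:
        return False
    if gov_region and feature in UNSUPPORTED_IN_GOV_REGIONS[platform]:
        return False
    return True


print(is_available("snowpark_container_services", "GCP"))  # False
```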

Supported Cloud Regions

Regions let your organization choose where your data is geographically stored across your regional, national, and international operations. Regions also determine where your compute resources are provisioned.

Snowflake supports regions across all of the Snowflake-supported cloud platforms, grouped into three global geographic segments (North/South America, Europe/Middle East, and Asia Pacific/China).
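To relate a region to its cloud platform and geographic segment, a lookup keyed by region identifier is enough. We assume identifiers of the form `AWS_US_WEST_2`, the style returned by functions such as `CURRENT_REGION()`; the table below is an illustrative subset, not the full list of supported regions, and `describe_region` is a hypothetical name.

```python
from typing import Tuple

PLATFORMS = {"aws": "AWS", "gcp": "GCP", "azure": "Azure"}

# Illustrative subset only; Snowflake supports many more regions.
REGION_SEGMENTS = {
    "aws_us_west_2": "North/South America",
    "azure_eastus2": "North/South America",
    "gcp_us_central1": "North/South America",
    "aws_eu_central_1": "Europe/Middle East",
    "azure_westeurope": "Europe/Middle East",
    "aws_ap_southeast_2": "Asia Pacific/China",
}


def describe_region(region_id: str) -> Tuple[str, str]:
    """Return (cloud platform, geographic segment) for an ID such as 'AWS_US_WEST_2'."""
    rid = region_id.strip().lower()
    platform = PLATFORMS[rid.split("_", 1)[0]]
    return platform, REGION_SEGMENTS[rid]


print(describe_region("AWS_US_WEST_2"))  # ('AWS', 'North/South America')
```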
