Getting Started with Snowflake Using Python

A step by step guide

Archit Pandita
Hashmap, an NTT DATA Company

--

Like many new and emerging tools and technologies in the big data world, Python keeps expanding its reach thanks to its ease of use, so Snowflake support in Python is no surprise. The desire for advances in automation with data platforms and increasingly diverse workloads makes Python + Snowflake a great combination; for example, building automated data pipelines that pre-process data and then store it in Snowflake.

In this blog post, I will go through how to use Python to run basic Snowflake commands (CRUD). This is a good starting point for anyone who wants to use Python with Snowflake.

This post will have a second part where I will package the whole application in a Docker image and deploy that Docker image on an Azure Kubernetes cluster (trial version). So, please be on the lookout for that!

Prerequisite: Basic skills in Python and Snowflake
Level: Beginner
Code (git repo): hashmapinc / oss / python_snowflake_poc

Snowflake Setup

1. Go to https://www.snowflake.com/ and sign up.

Enter your details, apply for the 30-day trial, and verify the account via the confirmation email.
In the trial, Snowflake provides enough credits to get started.

2. Download the Snowflake CLI (SnowSQL):

https://docs.snowflake.net/manuals/user-guide/snowsql-install-config.html

Note: Snowflake has very good documentation for all its services. Check out this user guide for a more detailed understanding (https://docs.snowflake.net/manuals/user-guide.html).

Interacting with Snowflake

Snowflake offers two ways to interact with the service:

Web interface: Snowflake provides a web interface with facilities to create, modify, and manage accounts and resources within Snowflake.

Limitations of the web interface:
a. The PUT command cannot be run via the web interface.
b. Files larger than 50 MB cannot be loaded, as the interface is intended for small files.

CLI: a command-line interface that feels natural to Linux users and fills in the gaps left by the web interface.
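For a concrete feel (a hypothetical session; this assumes SnowSQL is installed, and <account_identifier> and <username> must be replaced with your own values), the CLI lets you run PUT directly:

snowsql -a <account_identifier> -u <username>
-- inside the SnowSQL session, stage local files to your user stage:
PUT file://./data/crick* @~;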

Setting up the stage for the act (skip if you are familiar with Snowflake’s design flow)

Snowflake’s Architecture

Snowflake’s unique architecture consists of three key layers:
1. Centralized Storage
2. Multi-Cluster Compute and Processing
3. Cloud Services

Snowflake Architecture — image courtesy of the Snowflake Partner Technical Deck

Centralized Storage

When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format and stores it in cloud storage. Snowflake manages all aspects of how this data is stored.

Query Processing

Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.

Cloud Services

The services within this layer include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control

Access to databases and other objects is governed by privileges granted to roles, so the user's active role determines which tasks they can perform.
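As a quick check of what a session can do (a minimal sketch, assuming an open connection "conn" like the one created in step 2 below), you can ask Snowflake for the active role and warehouse:

# Sketch: inspect the session's active role and warehouse
# (assumes an open connection `conn`; see step 2 below)
row = conn.cursor().execute("SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE()").fetchone()
print("role:", row[0], "warehouse:", row[1])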

Setup Python and Code Dependencies

Reference: https://docs.snowflake.net/manuals/user-guide/python-connector-install.html

Make sure you have Python 3.x installed with the required modules.

Install the Python package for Snowflake:

pip install --upgrade snowflake-connector-python
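To confirm the connector installed correctly, you can print its version with standard-library tooling (an optional sanity check; importlib.metadata assumes Python 3.8+):

# Optional check: print the installed connector version (stdlib only)
from importlib.metadata import version
print(version("snowflake-connector-python"))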

Let’s code

The Python script is written in a very straightforward way, with no fancy classes or methods.

Code (git repo): hashmapinc / oss / python_snowflake_poc · GitLab
(this includes Jupyter notebook)

Note: The repo also contains a Dockerfile and Docker image. Don't worry about those for now, as they are not our focus for this exercise.

1. Importing packages

"cred.json" is a JSON file containing the user credentials, which keeps sensitive info out of the code. I load the JSON into the "cred" variable, which is of type dict, as shown below:

# import required modules and load credentials
import snowflake.connector
import json

with open("cred.json", "r") as f:
    cred = json.load(f)
Create the "cred.json" file by hand, or use json.dump to generate it (see the sketch after this block):

{
  "userid": "userid",
  "password": "xxxxx",
  "account": "px00000.ap-southeast-2"
}
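For reference, here is a minimal sketch that generates the file with json.dump (the values are the same placeholders as above):

# One-off helper: write cred.json with json.dump (placeholder values)
import json

cred = {
    "userid": "userid",
    "password": "xxxxx",
    "account": "px00000.ap-southeast-2"
}
with open("cred.json", "w") as f:
    json.dump(cred, f, indent=2)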

2. Setup the connection with Snowflake from Python

conn = snowflake.connector.connect(
    user=cred["userid"],
    password=cred["password"],
    account=cred["account"],
    session_parameters={
        "QUERY_TAG": "EndOfMonthFinance",
    }
)
print("success in connecting", conn)

You can also set session parameters by executing the SQL statement ALTER SESSION SET … after connecting:

conn.cursor().execute("ALTER SESSION SET QUERY_TAG = 'EndOfMonthFinancials'")

3. Change the current role to "SYSADMIN", as the default role might not be SYSADMIN. This matters because the actions that follow require SYSADMIN privileges.

conn.cursor().execute("use role sysadmin")

4. Setting up Snowflake for storing and managing data:

Use "CREATE OR REPLACE" to create an object, overwriting it if it already exists.
Use "CREATE" to create a new object; it raises an error if the object already exists.
Or use "CREATE ... IF NOT EXISTS" to create the object only when it is missing.

# Creating a Database, Schema, and Warehouse
conn.cursor().execute("CREATE WAREHOUSE IF NOT EXISTS tiny_warehouse_mg")
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS testdb_mg")
conn.cursor().execute("USE DATABASE testdb_mg")
# the USE DATABASE line above must run first, since the schema is created inside the current database
conn.cursor().execute("CREATE SCHEMA IF NOT EXISTS testschema_mg")

5. Set the current warehouse, database, and schema.

# Using the Database, Schema, and Warehouse
conn.cursor().execute("USE WAREHOUSE tiny_warehouse_mg")
conn.cursor().execute("USE DATABASE testdb_mg")
conn.cursor().execute("USE SCHEMA testdb_mg.testschema_mg")

6. Creating a new table with 2 columns: one of int type and the other of string type

# Creating Tables and Inserting Data
conn.cursor().execute(
    "CREATE OR REPLACE TABLE "
    "test_table(col1 integer, col2 string)")

7. Let's insert 2 records into the above table.

The first version uses string concatenation. As with any other DB connector in Python, we can instead bind parameters as a tuple to prevent SQL injection:

# Bad query: string concatenation
conn.cursor().execute(
    "INSERT INTO test_table(col1, col2) VALUES " +
    " (123, 'Indian Cricket'), " +
    " (100, 'Kapil Dev')")

# SAFE EXAMPLE. DO THIS!
conn.cursor().execute(
    "INSERT INTO test_table(col1, col2) VALUES (%s, %s)",
    (123, 'Indian Cricket'))

8. Putting Python to use with Snowflake

Data must be loaded into Snowflake storage before you can run queries on it. To manage this properly, Snowflake provides staging areas, and the table schema enforces the required format on load. The code below uploads the CSV files in the directory named "data" to the table's stage; after staging them, we copy the file contents into the table (test_table) we created.

If you are using Windows cmd, write the path with escaped backslashes ("\\"):
("PUT file://.\\data\\crick* @testdb_mg.testschema_mg.%test_table")

For S3 or any other cloud storage, similar code is required.

Note: PUT runs outside the web interface (via the connector or the CLI), so make sure you have access to the path mentioned.

# Putting Data
# @DB.SCHEMA.%TABLE follows the hierarchy of contents: DB <- SCHEMA <- TABLE
# If using Windows cmd: ("PUT file://.\\data\\crick* @testdb_mg.testschema_mg.%test_table")
conn.cursor().execute("PUT file://./data/crick* @testdb_mg.testschema_mg.%test_table")
conn.cursor().execute("""COPY INTO test_table FROM @testdb_mg.testschema_mg.%test_table
    file_format = (type = csv field_delimiter = ',')
    pattern = '.*crick.*.csv.gz'
    on_error = 'skip_file'""")

# For S3: Copying Data from a bucket
# conn.cursor().execute("""
#     COPY INTO test_table FROM s3://<your_s3_bucket>/data/
#     CREDENTIALS = (
#         aws_key_id='{aws_access_key_id}',
#         aws_secret_key='{aws_secret_access_key}')
#     FILE_FORMAT = (field_delimiter=',')
#     """.format(
#         aws_access_key_id=AWS_ACCESS_KEY_ID,
#         aws_secret_access_key=AWS_SECRET_ACCESS_KEY))
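Before or after the COPY, you can verify that the files actually landed in the table stage by listing its contents (an optional check using Snowflake's LIST command):

# Optional check: list files currently in the table stage
for row in conn.cursor().execute("LIST @testdb_mg.testschema_mg.%test_table"):
    print(row)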

9. We now have data in the table, so let's query it

# Querying Data
cur = conn.cursor()
try:
    cur.execute("SELECT col1, col2 FROM test_table ORDER BY col1")
    for (col1, col2) in cur:
        print('{0}, {1}'.format(col1, col2))
finally:
    cur.close()  # close the cursor; avoid unnecessary open connections for system stability
    print("connection closed, script end.. Thanks")
# Use fetchone or fetchmany if the result set is too large to fit into memory.
# results = conn.cursor().execute("SELECT col1, col2 FROM test_table").fetchall()
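As the comment above suggests, fetchmany lets you stream a large result set in batches instead of holding it all in memory. A minimal sketch:

# Sketch: stream results in batches of 100 rows
cur = conn.cursor()
cur.execute("SELECT col1, col2 FROM test_table ORDER BY col1")
while True:
    batch = cur.fetchmany(100)
    if not batch:
        break
    for col1, col2 in batch:
        print(col1, col2)
cur.close()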

Summing Things Up

We went through how simple it is to use Python and Snowflake together. This was a basic tutorial, and I would encourage you to explore other features and capabilities you can utilize with this combination of tools; for instance, collecting data from a different source and cleansing it before storing it in Snowflake. You could also write alert scripts that check for failures and send notification emails (although Snowflake has its own alerting).

Ready to Accelerate Your Digital Transformation?

At Hashmap, we work with our clients to build better, together.

If you are considering moving data and analytics products and applications to the cloud or if you would like help and guidance and a few best practices in delivering higher value outcomes in your existing cloud program, then please contact us.

Hashmap, an NTT DATA Company, offers a range of enablement workshops and assessment services, cloud modernization and migration services, and consulting service packages as part of our data and cloud service offerings. We would be glad to work through your specific requirements.

Other Tools and Content You Might Like

Feel free to share on other channels, and be sure to keep up with all new content from Hashmap here. To listen in on a casual conversation about all things data engineering and the cloud, check out Hashmap's podcast Hashmap on Tap on Spotify, Apple, Google, and other popular streaming apps.

Archit Pandita is a Python developer and Data Engineer with Hashmap, an NTT DATA Company, providing Data, Cloud, IoT and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.
Have a question? Don’t hesitate to reach out to connect or exchange more information. I’m happy to help:
Archit’s LinkedIn

All the best! Happy Learning!! (`,`)
