Module 2: SQL for Analysis

Module Overview

SQL is simple, but it can still be surprisingly powerful. As we learned in the first unit, a lot of analysis can be done with just descriptive statistics - and with the right queries, SQL can compute all of those and more.

Learning Objectives

Objective 01 - Deploy and use a simple PostgreSQL database

Video Walkthrough

Overview

SQLite is a great starting point, and still useful for all sorts of situations - but if you want to deploy and scale, PostgreSQL is the next tool to check out. It is a performant, standards-compliant SQL database with a great ecosystem and support for sharding (partitioning data across servers for scalability - typically the domain of DevOps, but good to be aware of).

PostgreSQL is a highly influential relational database management system (RDBMS), created in the 1980s at UC Berkeley as an evolution of the Ingres database. It is released under the PostgreSQL License, a permissive MIT-style open-source license. Its principal creator, Michael Stonebraker, received the 2014 Turing Award (the computer science equivalent of the Nobel Prize) in recognition of his work and its impact.

Despite its long and rich history, PostgreSQL is a well-supported modern database, with significant new functionality having been added in recent years. It is also the second most widely used open-source SQL database, after MySQL (now owned by Oracle).

Follow Along

Sign up for Aiven.io, log in, and create a new free PostgreSQL service by following the steps outlined in the walkthrough video above. No address or credit card should be required.

Connect to your PostgreSQL service using the DBeaver client and open a SQL console. We will use it to run some SQL queries:

CREATE TABLE test_table (
  id        SERIAL PRIMARY KEY,
  name      varchar(40) NOT NULL,
  data      JSONB
);

Then you can insert some data:

INSERT INTO test_table (name, data) VALUES
(
  'A row name',
  null
),
(
  'Another row, with JSON',
  '{ "a": 1, "b": ["dog", "cat", 42], "c": true }'
);

And finally use SELECT to see the result!

SELECT * FROM test_table;
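If you want to rehearse this same create/insert/select flow locally before (or alongside) the cloud service, Python's built-in sqlite3 module accepts nearly identical SQL. This is a stand-in sketch, not the PostgreSQL setup itself: SQLite has no SERIAL or JSONB types, so the table below uses INTEGER PRIMARY KEY AUTOINCREMENT and a plain TEXT column holding a JSON string instead.

```python
import json
import sqlite3

# In-memory database: nothing to install or deploy
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SQLite equivalents of the PostgreSQL types above:
# SERIAL -> INTEGER PRIMARY KEY AUTOINCREMENT, JSONB -> TEXT
cur.execute("""
    CREATE TABLE test_table (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        name  VARCHAR(40) NOT NULL,
        data  TEXT
    )
""")

# Same two rows as the PostgreSQL example, passed as parameters
cur.execute(
    "INSERT INTO test_table (name, data) VALUES (?, ?), (?, ?)",
    ("A row name", None,
     "Another row, with JSON",
     json.dumps({"a": 1, "b": ["dog", "cat", 42], "c": True})),
)
conn.commit()

rows = cur.execute("SELECT * FROM test_table").fetchall()
print(rows)
conn.close()
```

The id column fills itself in, just as SERIAL does in PostgreSQL, so the first row comes back as (1, 'A row name', None).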

Challenge

Create more tables and insert some rows into them.

Additional Resources

Objective 02 - Create a data pipeline with SQL

Required Resources

Overview

What good is a database without data in it? And do we really want to do all of our analysis directly with ad hoc queries?

There's a better way - getting data into and out of things programmatically, by building data pipelines.

SQL is great, but Python is our home - it's where our core code will live, and it's supported by a great ecosystem of Data Science libraries and frameworks. The good news is we can connect to SQL from Python, using Psycopg!

Follow Along

Use pip (in an environment of your choice - even Colab) to install psycopg2-binary. Then execute the following (this requires having completed the setup from the previous objective in this module):

import psycopg2

# Connect to Aiven.io-hosted PostgreSQL
# Fill in the connection details from your Aiven.io console
conn = psycopg2.connect(
    dbname='TODO',
    user='TODO',
    password='TODO',
    host='TODO.aivencloud.com',
    port=TODO
)

# A "cursor", a structure to iterate over db records to perform queries
cur = conn.cursor()

# An example query
cur.execute('SELECT * from test_table;')

# Fetch all rows from the executed query
rows = cur.fetchall()

# Iterate over the rows and print them
for row in rows:
    print(row)

# Close the cursor and connection
cur.close()
conn.close()

If you see something like (1, 'A row name', None), congratulations! You've interacted with your database from Python.

Challenge

Write a Python function to generate random data for insertion, and execute it in a loop to generate and insert a large quantity of data.
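One possible shape for this challenge (a sketch, not the only answer - it uses only the standard library, with sqlite3 standing in for the psycopg2 connection; against PostgreSQL the only changes are the connection call and %s placeholders):

```python
import random
import sqlite3
import string

def random_rows(n):
    """Generate n (name, value) tuples of random data."""
    return [
        ("".join(random.choices(string.ascii_lowercase, k=8)),
         random.randint(0, 1000))
        for _ in range(n)
    ]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE random_data (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)"
)

# executemany() batches each chunk of rows into one call;
# looping lets you scale the total volume up as far as you like
for _ in range(10):
    cur.executemany(
        "INSERT INTO random_data (name, value) VALUES (?, ?)",
        random_rows(100),
    )
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM random_data").fetchone()[0]
print(count)  # 1000
```

Batching with executemany() (or psycopg2's execute_values helper) is markedly faster than issuing one INSERT per row when the row count gets large.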

Additional Resources

Guided Project

Important Note

The following Guided Project demonstrates using a cloud PostgreSQL service called ElephantSQL, which has since stopped taking new sign-ups, to accomplish the project's objectives.

In order to follow along, some adjustments must be made:

  1. Use Aiven.io instead of ElephantSQL, as demonstrated in Learning Objective 1 in this module.
  2. Connect to and interact with your database from your Python code as demonstrated in Learning Objective 2.

In this guided project, we'll learn how to deploy PostgreSQL databases and use SQL for data analysis. Open guided-project.md in the GitHub repository below to follow along with the guided project.

Module Assignment

Important Note

The following Module Assignment demonstrates using a cloud PostgreSQL service called ElephantSQL, which has since stopped taking new sign-ups, to accomplish the project's objectives.

In order to follow along, some adjustments must be made:

  1. Use Aiven.io instead of ElephantSQL, as demonstrated in Learning Objective 1 in this module.
  2. Connect to and interact with your database from your Python code as demonstrated in Learning Objective 2.

For this assignment, you'll practice deploying a PostgreSQL database, creating tables, and building data pipelines using SQL for analysis.

Solution Video

Additional Resources

PostgreSQL Tutorials

Cloud Database Services