DS Unit 3 Sprint 10: Databases

Welcome to Sprint 10!

What does "data" look like? If you try to picture it, you probably see rows and columns - a spreadsheet or CSV that can be easily loaded with pandas, cleaned, and analyzed. As a data scientist, this will often be the form you want your data to be in - but it's probably not how your data started.

Most modern data is generated automatically as people interact with web-backed applications - every action a user takes, every click they make, travels over a network and is saved by the server. In its rawest form this may be a log file, but in most cases it ends up in a database.

So, what is a database? A place for data! If it's relational, it's actually still pretty close to that rows-and-columns picture, though with some important additional functionality. These databases are commonly accessed using SQL - Structured Query Language - a standard based on relational algebra, and a useful tool known not just by data scientists but by software engineers, MBAs, and more.
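To make the rows-and-columns picture concrete, here is a minimal sketch using Python's built-in sqlite3 module. The clicks table, its columns, and the sample rows are made up for illustration; they are not part of the sprint material.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT, seconds REAL)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [(1, "home", 3.5), (1, "pricing", 12.0), (2, "home", 1.5)],
)

# A query picks columns and filters rows, much like slicing a spreadsheet.
rows = conn.execute(
    "SELECT user_id, page FROM clicks WHERE seconds > 2 ORDER BY seconds"
).fetchall()
print(rows)
conn.close()
```

Each row comes back as a plain Python tuple, so the result can be handed to other tools without much ceremony.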

If it's so-called "NoSQL", then it's most likely a document-oriented database (or document store) - which, despite the glamor, is essentially a bunch of key-value pairs. What key-value pair object are you already familiar with? Python dicts!
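As a rough illustration of the dict analogy, the sketch below treats a nested Python dict as a "document" and round-trips it through JSON. The field names are hypothetical, and no real document database is involved; this only shows the shape of the data.

```python
import json

# A hypothetical "document": nested key-value pairs with no fixed schema.
user_doc = {
    "name": "Ada",
    "tags": ["admin", "beta"],
    "address": {"city": "London"},
}

# Documents are typically stored and exchanged as JSON-like text.
serialized = json.dumps(user_doc)
restored = json.loads(serialized)

# Nested values are reached by key rather than by joining tables.
city = restored["address"]["city"]
print(city)
```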

In this sprint, we will learn about both of the above paradigms, and how the separation between them is not as bright a line as you may think.

Modules

This sprint covers both SQL and NoSQL databases across four modules:

Module 1

Introduction to SQL

Structured Query Language - the lingua franca of data. Known (to varying degrees) by software engineers, data scientists, DevOps, and MBAs, SQL is the beginning (and sometimes entirety) of many data pipelines.

Module 2

SQL for Analysis

SQL is simple, but can still be surprisingly powerful - as we learned in the first unit, a lot of analysis can be done with just descriptive statistics, and with the right query SQL can do all that and more.
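As a small sketch of descriptive statistics done directly in SQL, the example below computes count, mean, minimum, and maximum per group with a single query. It uses Python's built-in sqlite3 module, and the orders table and its values are invented for illustration.

```python
import sqlite3

# Hypothetical table of order amounts by region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)

# One query yields count, mean, min, and max for each region.
summary = conn.execute(
    "SELECT region, COUNT(*), AVG(amount), MIN(amount), MAX(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()
for row in summary:
    print(row)
conn.close()
```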

Module 3

NoSQL and Document-Oriented Databases

Need to deal with Big Data? You may need tools beyond standard SQL approaches. Enter NoSQL and document-oriented databases that use JSON-like document models.

Module 4

ACID and Database Scalability Trade-offs

SQL or NoSQL? Why not both! Picking the right database for a situation can be a tricky problem, with many trade-offs. Learn about ACID guarantees and database scalability.
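To give a feel for one ACID guarantee, atomicity, here is a minimal sketch with Python's built-in sqlite3 module. The accounts table, the names, and the CHECK constraint are assumptions chosen for illustration: a transfer that would overdraw an account fails, and the whole transaction is rolled back rather than half-applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: balances may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 50), ("bob", 0)])
conn.commit()

try:
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bob'")
        # This update violates the CHECK constraint (50 - 100 < 0).
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    print("transfer failed; transaction rolled back")

# Both balances are unchanged: the credit to bob was undone too.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
conn.close()
```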

Sprint Resources

Code-Alongs

Code-Along 1

Basic SQL Queries

Practice creating basic SQL queries to access and analyze data in a database.

Code-Along 2

Multi-table SQL Queries

Learn advanced SQL techniques for working with multiple tables and complex queries.
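As a rough preview of a multi-table query, the sketch below joins two hypothetical tables and counts purchases per user. The table names, columns, and rows are assumptions for illustration and are not the code-along's actual dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables linked by users.id = purchases.user_id.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE purchases (user_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "book"), (1, "pen"), (2, "lamp")],
)

# JOIN matches each purchase to its user; GROUP BY then counts per user.
joined = conn.execute(
    "SELECT users.name, COUNT(purchases.item) "
    "FROM users JOIN purchases ON purchases.user_id = users.id "
    "GROUP BY users.name ORDER BY users.name"
).fetchall()
print(joined)
conn.close()
```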
