Module 1: Python Modules, Packages, and Environments

Module Overview

Python notebooks are essentially a glorified REPL (read-eval-print loop). But what if you want code that lives on and can be reused in many different contexts? Enter modules, packages, and environments!

In this module, you'll learn how to create reusable Python code through modules and packages, and how to manage dependencies with virtual environments.

Learning Objectives

Objective 01 - Understand and follow Python namespaces and imports

Overview

What happens when you run from bar import foo? And what actually runs when you type my_function() in your REPL? More than you may think!
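Part of the answer is that each form of import binds different names in your namespace. This minimal sketch uses the standard-library math module in place of the placeholder names bar and foo, and runs each statement in a fresh, empty namespace via exec() purely so the difference is easy to see (you would not normally import this way):

```python
# Run each import statement in its own empty namespace and see which names it adds.
ns_import = {}
exec("import math", ns_import)

ns_from = {}
exec("from math import sqrt", ns_from)


def user_names(namespace):
    """Return the non-dunder names; exec() also inserts __builtins__."""
    return sorted(name for name in namespace if not name.startswith("__"))


print(user_names(ns_import))  # ['math']  -> reach functions as math.sqrt
print(user_names(ns_from))    # ['sqrt']  -> use sqrt directly, no prefix
print(ns_from["sqrt"] is ns_import["math"].sqrt)  # True: the same function object
```

Python also caches every imported module in sys.modules, so importing the same module a second time does not re-run its code.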

A core part of Python philosophy is the principle of least astonishment - that the behavior of the code should be what the programmer expects. The motivation is to free the programmer from low-level details, and allow them to focus on the larger flow and design.

You've already been doing this! Your Python notebooks may contain relatively short, high-level pieces of code, but under the hood they're doing all sorts of cool, crazy things. And, hopefully, most of the time the things they do are the things you want.

But they aren't always - and what's more, if you're always working at the top of the stack, you're limited to the cases considered by the people who wrote the tools you depend on. If you want more freedom and power, you must go deeper.

Follow Along

In a text editor, make a file named fibo.py, and enter the following code:

# Fibonacci numbers module
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result

Save this file to a directory and navigate to that directory in a command-line terminal. Then start the Python interpreter by running python, and execute the following:

>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']

This shows you what is in scope in a basic empty Python session. You shouldn't see much - only things with double underscores, like __annotations__ and __builtins__.

Names surrounded by double underscores (“dunder” names) are a Python convention for special names that the interpreter itself defines and relies on. You can poke at them, but they are core to Python's functionality, and changing them may break things in strange ways. See also: monkey patching.
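Names you create show up in dir() too: called with no argument, it lists the names bound in the current scope, so the list grows every time you assign, define, or import something. A small sketch you can save as a script or paste into the session (x and greet are arbitrary example names):

```python
def visible_names(names):
    """Drop double-underscore entries so only user-created names remain."""
    return [name for name in names if not name.startswith("__")]


before = visible_names(dir())  # dir() runs before "before" itself is bound

x = 42            # an assignment binds a new name...
def greet():      # ...and so does a def statement
    return "hi"

after = visible_names(dir())
print(sorted(set(after) - set(before)))  # ['before', 'greet', 'x']
```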

Next, run the following:

>>> from os import getcwd
>>> getcwd()
'/Users/you/some/project/directory'

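This should show you the directory where you started Python, known as the current working directory. This is important because, in an interactive session, the current directory comes first on sys.path, the list of directories import searches in order, so it is the first place import looks for your own modules.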
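To see the search order for yourself, you can print sys.path next to the current working directory. This is only an inspection sketch; the exact entries depend on your installation and on how Python was started:

```python
import os
import sys

print("cwd:", os.getcwd())

# Once a module is not already cached in sys.modules and is not built in,
# import tries these locations in order and uses the first match.
for position, entry in enumerate(sys.path):
    # By default, entry 0 is '' (meaning "the current directory") in an
    # interactive session, or the script's own directory when running a script.
    print(position, repr(entry))
```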

Now run:

>>> import fibo

>>> fibo.fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.__name__
'fibo'

If you see output like the above, you set things up correctly, and you now have reusable functions for calculating Fibonacci numbers in a module that you can import from any Python session that can find fibo.py!
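One way to make fibo.py importable from a session started somewhere else is to add its folder to sys.path. The sketch below simulates that end to end: it writes a shortened copy of fibo.py (only fib2) into a temporary directory, which stands in for wherever you actually saved the file, puts that directory first on sys.path, and imports the module by name:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A temporary directory stands in for "the folder where you saved fibo.py".
module_dir = Path(tempfile.mkdtemp())
(module_dir / "fibo.py").write_text(
    "def fib2(n):   # return Fibonacci series up to n\n"
    "    result = []\n"
    "    a, b = 0, 1\n"
    "    while a < n:\n"
    "        result.append(a)\n"
    "        a, b = b, a+b\n"
    "    return result\n"
)

# Put that folder at the front of the module search path, then import by name.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()  # ask the import system to rescan for new files
fibo = importlib.import_module("fibo")

print(fibo.fib2(100))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(fibo.__name__)   # fibo
print(fibo.__file__)   # a path inside the temporary directory
```

If a module named fibo was already imported earlier in the same session, import_module returns that cached copy from sys.modules instead of the new file.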

Challenge

Make your own module with functions you think may be useful to have in a reusable context. Use your imagination! Later, you can add them to the bloomdata package.

Make a directory in your Google Drive and upload fibo.py and/or the module you made in the first challenge. Mount your Google Drive in a Colab notebook, add that directory to sys.path if Python can't find your modules yet, then import your modules and use your functions. Reusable code, in the cloud!

Experiment more with the built-in dir() function to inspect objects and check out where things “live” in Python.

Objective 02 - Create a Python package and install dependencies in a dedicated environment

Overview

Modules are great, but when your code outgrows a single file, you need something bigger - a package, which groups related modules under one importable name!
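At its simplest, a regular package is a directory containing an __init__.py file plus one or more modules, which you import with dotted names. This sketch builds a throwaway package in a temporary directory; the names demo_pkg, stats, and mean are hypothetical, chosen only for illustration, and are not part of bloomdata:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Layout being created:
#   <tmp>/demo_pkg/__init__.py   marks the directory as a regular package
#   <tmp>/demo_pkg/stats.py      a module inside the package
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text('"""A tiny example package."""\n')
(pkg / "stats.py").write_text(
    "def mean(values):\n"
    "    values = list(values)\n"
    "    return sum(values) / len(values)\n"
)

# The directory that CONTAINS the package must be on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_pkg import stats  # imports the package first, then its submodule

print(stats.mean([1, 2, 3, 4]))  # 2.5
print(stats.__name__)            # demo_pkg.stats
```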

Packages usually benefit from other packages, so to specify those dependencies in a reproducible and portable fashion, we'll also learn about virtual environments.
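The guided project uses pipenv, but the core idea of any virtual environment is the same: a separate directory with its own interpreter entry point and its own place for installed packages. As an illustration only, this sketch swaps in the standard-library venv module (not pipenv) to create an environment in a temporary directory, then asks that environment's Python whether it is running from a virtual environment. It skips installing pip (with_pip=False) to stay fast and offline:

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path

env_dir = Path(tempfile.mkdtemp()) / "demo-env"
venv.create(env_dir, with_pip=False)  # standard-library venv, used here instead of pipenv

# The environment's interpreter lives in bin/ (macOS/Linux) or Scripts\ (Windows).
if os.name == "nt":
    env_python = env_dir / "Scripts" / "python.exe"
else:
    env_python = env_dir / "bin" / "python"

# When Python runs from a virtual environment, sys.prefix points at the
# environment while sys.base_prefix still points at the base installation.
result = subprocess.run(
    [str(env_python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # True
```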

You're already a consumer of Python packages and dependencies, and you actually already use containers - they're what make cloud-hosted notebooks such as Google Colab work. Now it's time to learn a bit about how these really work, so you can customize them and make your own.

As a Data Scientist, you may or may not specialize in this sort of development work. But even if you are more of a “type A analysis” person, knowing how these things work at a high level can make you a much more informed consumer and debugger.

Follow Along

In the Guided Project, we start working on our own Python package the right way - by making an environment with pipenv, installing our dependencies, and making some classes. Before watching the recording, make sure to read through any recommended documentation for set up.

Individual computers/operating systems vary, so specific steps won't be listed here - follow the instructions for the program you are installing. If you encounter issues, don't be shy - ask for help from staff or your fellow students.

Challenge

Get a head start - read up on pipenv, and see if you can figure out making your own virtual environment. Try to install numpy in it, and verify that you can activate the environment and import the specific version of the dependency you installed.
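To check your work, you could run a small script like the one below from inside the environment (for example with pipenv run python check_env.py, where check_env.py is a hypothetical file name). It is a sketch that assumes Python 3.8+ for importlib.metadata, and it relies on the fact that environments created by venv, and by recent virtualenv releases (which pipenv builds on), should set sys.prefix differently from sys.base_prefix:

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtual_environment() -> bool:
    """Return True if this interpreter appears to be running from a virtual environment."""
    # Inside venv/virtualenv environments, sys.prefix is the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("virtual environment:", in_virtual_environment())
print("interpreter:", sys.executable)
print("numpy:", installed_version("numpy"))
```

If the environment is set up correctly, the interpreter path should point inside the environment, and the reported numpy version should match the one you installed.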

Check out Cookiecutter Data Science for another approach for building a starter data science project in a package. Try to make your own project with it!

Additional Resources

Guided Project

Open guided-project.md in the GitHub repository below to follow along with the guided project. Make sure to complete the Unit 3 setup as it's essential for this module.

Module Assignment

For this assignment, you'll create your own Python package and implement helper utility functions to demonstrate your understanding of modules, packages, and environments.

Solution Video

Additional Resources

Documentation & Tutorials