Module 1: Python Modules, Packages, and Environments
Module Overview
Python Notebooks are a glorified REPL - read-eval-print loop. What if you want code that should live on and be reused in various circumstances? Enter modules, packages, and environments!
In this module, you'll learn how to create reusable Python code through modules and packages, and how to manage dependencies with virtual environments.
Learning Objectives
- Differentiate between Python Scripts, Modules, Packages, and Libraries
- Create a Python package and install dependencies in a dedicated environment
Objective 01 - Understand and follow Python namespaces and imports
Overview
What happens when you run from bar import foo? And what is run when you type my_function() in your REPL? More than you may think!
A core part of Python philosophy is the principle of least astonishment - that the behavior of the code should be what the programmer expects. The motivation is to free the programmer from low-level details, and allow them to focus on the larger flow and design.
You've already been doing this! Your Python notebooks may have relatively short high-level pieces of code, but under the hood they're doing all sorts of cool crazy things. And, hopefully, most of the time the things they do are the things you want.
But, they aren't always - and what's more, if you're always working at the top of the stack, you're limited to the cases considered by the people who wrote the tools you depend on. If you want to have more freedom and power, you must go deeper.
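To make the question at the top of this overview a little more concrete, here is a minimal sketch of what the two import forms bind in your namespace. It uses the standard library's math module as a stand-in for the hypothetical bar and foo mentioned above.

```python
import sys

# Form 1: bind the module object itself to the name "math".
import math

# Form 2: bind one attribute of the module directly to a local name.
from math import sqrt

# Either way, the module is loaded once and cached in sys.modules.
cached = sys.modules["math"] is math

# The name imported with "from" is the same function object
# that lives inside the module's namespace.
same_object = sqrt is math.sqrt

print(cached)         # True
print(same_object)    # True
print(math.sqrt(16))  # 4.0
print(sqrt(16))       # 4.0
```

The practical difference is only which name ends up in scope: math.sqrt keeps the origin visible at the call site, while sqrt is shorter but can collide with other names.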
Follow Along
In a text editor, make a file named fibo.py, and enter the following code:
# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result
Save this file to a directory, and navigate to that directory with a command line terminal. Then run python, and execute the following:
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
This shows you what is in scope in a basic empty Python session. You shouldn't see much - only things with double underscores, like __annotations__ and __builtins__.
Double underscores are a Python convention indicating that, while you can poke at these things, they are core to Python functionality and you may break things in strange ways if you change them. See also monkey patch.
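One way to watch a namespace change is to compare dir() before and after an import. The sketch below is only an illustration: it uses a fresh module object (given the hypothetical name scratch) as a stand-in for an empty session, and a small helper, public_names, that is not part of Python itself.

```python
import types

def public_names(obj):
    """Return the names in dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

# A fresh, empty module object stands in for a brand-new Python session.
scratch = types.ModuleType("scratch")  # "scratch" is a hypothetical name
before = public_names(scratch)

# Run an import statement inside that module's namespace.
exec("from os import getcwd", scratch.__dict__)
after = public_names(scratch)

print(before)  # []
print(after)   # ['getcwd']
```

The import added exactly one public name to the namespace, which is the same effect you will see in your own session after the next step.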
Next, run the following:
>>> from os import getcwd
>>> getcwd()
'/Users/you/some/project/directory'
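This should show you the directory where you started Python, referred to as the current working directory. This is important because, in an interactive session, it is by default the first entry on the module search path (sys.path), so it is the first directory import searches for a module that isn't built in.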
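If you are curious where import would find a given module, the standard library can tell you without actually importing it. The following is a minimal sketch; where_is is a hypothetical helper name, not something Python provides.

```python
import importlib.util

def where_is(module_name):
    """Return the location import would load module_name from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # import would fail with ModuleNotFoundError
    return spec.origin

print(where_is("json"))  # a location inside the standard library
print(where_is("fibo"))  # path to fibo.py if it is importable, else None
print(where_is("surely_not_a_real_module_name"))  # None
```

Running this from the directory that holds fibo.py and then from some other directory is a quick way to see the role the current working directory plays.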
Now run:
>>> import fibo
>>> fibo.fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.__name__
'fibo'
If you see output as above then you set things up correctly, and now have reusable functions for calculating Fibonacci numbers in a module that you can access wherever you can put fibo.py!
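The __name__ value you just printed is also what separates a module from a script. As a sketch (not part of the fibo.py you wrote above, though you could add the last block to it), a file can check __name__ so that it behaves as a module when imported and as a script when run directly:

```python
# Sketch of a file that works both as a module and as a script.

def fib2(n):
    """Return the Fibonacci series up to (but not including) n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result

if __name__ == "__main__":
    # __name__ is "__main__" only when the file is run directly,
    # e.g. "python fibo.py"; under "import fibo" it is "fibo".
    print(fib2(100))
```

With this guard in place, importing the file defines the functions quietly, while running it from the command line also prints a result.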
Challenge
1. Make your own module with functions you think may be useful to have in a reusable context. Use your imagination! Later, you can add them to the bloomdata package.
2. Make a directory in your Google Drive and upload fibo.py and/or the module you made in part 1. Attach your Google Drive to a Colab notebook, and import your modules and use your functions. Reusable code, in the cloud!
3. Experiment more with the built-in dir() function to inspect objects and check out where things “live” in Python.
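For the Google Drive step, the key idea is that import can only find modules in directories listed on sys.path. The sketch below illustrates that idea under stated assumptions: it uses a temporary directory as a stand-in for your mounted Drive folder (whose actual path depends on your setup), and drive_demo is a hypothetical module name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a folder that is not the current working directory,
# such as a mounted Drive folder. The real path is up to you.
module_dir = Path(tempfile.mkdtemp())

# Write a tiny module there. "drive_demo" is a hypothetical name.
(module_dir / "drive_demo.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Tell import where to look, then import as usual.
if str(module_dir) not in sys.path:
    sys.path.append(str(module_dir))
importlib.invalidate_caches()  # make sure the new file is noticed

import drive_demo

print(drive_demo.double(21))  # 42
```

In a notebook, you would replace module_dir with the folder that holds fibo.py and then run import fibo in the same way.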
Objective 02 - Create a Python package and install dependencies in a dedicated environment
Overview
Modules are great, but if you need more code, you need something bigger - a package!
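As a rough picture of what "a package" means on disk, the sketch below builds a throwaway one in a temporary directory and imports it. Every name in it (tinypkg, helpers, shout) is hypothetical and chosen only for illustration; a real project would keep these files in its own repository rather than generating them.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. All names here are hypothetical.
root = Path(tempfile.mkdtemp())
package_dir = root / "tinypkg"
package_dir.mkdir()

# __init__.py marks the directory as a regular package and runs on import.
(package_dir / "__init__.py").write_text("from .helpers import shout\n")

# A module inside the package.
(package_dir / "helpers.py").write_text(
    "def shout(text):\n"
    "    return text.upper() + '!'\n"
)

# Make the parent directory importable, then import the package.
sys.path.append(str(root))
importlib.invalidate_caches()

import tinypkg
from tinypkg import helpers

print(tinypkg.shout("hello"))          # HELLO!
print(helpers.shout is tinypkg.shout)  # True
```

A package, in other words, is a directory of modules that you import by the directory's name, with __init__.py deciding what is exposed at the top level.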
Packages usually benefit from other packages, so to specify those dependencies in a reproducible and portable fashion, we'll also learn about virtual environments.
You're already a consumer of Python packages and dependencies, and you actually already use containers - they're what makes cloud-hosted notebooks such as Google Colab work. Now it's time to learn a bit about how these really work, so you can customize them and make your own.
As a Data Scientist, you may or may not specialize in this sort of development work. But even if you are more of a “type A analysis” person, knowing how these things work at a high level can make you a much more informed consumer and debugger.
Follow Along
In the Guided Project, we start working on our own Python package the right way - by making an environment with pipenv, installing our dependencies, and making some classes. Before watching the recording, make sure to read through any recommended documentation for setup.
Individual computers/operating systems vary, so specific steps won't be listed here - follow the instructions for the program you are installing. If you encounter issues, don't be shy - ask for help from staff or your fellow students.
Challenge
Get a head start - read up on pipenv, and see if you can figure out making your own virtual environment. Try to install numpy in it, and verify that you can activate the environment and import the specific version of the dependency you installed.
Check out Cookiecutter Data Science for another approach for building a starter data science project in a package. Try to make your own project with it!
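For the verification part of the pipenv challenge, one possible check is a short script run with the environment's Python. This is a sketch under assumptions: the helper names are hypothetical, and the prefix comparison relies on how venv and recent virtualenv versions (which pipenv builds on) set up the interpreter.

```python
import sys
from importlib import metadata

def in_virtual_environment():
    """Return True when running inside a virtual environment."""
    # In a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the Python it was created from.
    return sys.prefix != sys.base_prefix

def installed_version(package_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

print("interpreter:", sys.executable)
print("in a virtual environment:", in_virtual_environment())
print("numpy version:", installed_version("numpy"))
```

If the interpreter path points inside your environment and the numpy version matches the one you installed, the environment is active and working.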
Additional Resources
Guided Project
Open guided-project.md in the GitHub repository below to follow along with the guided project. Make sure to complete the Unit 3 setup as it's essential for this module.
Module Assignment
For this assignment, you'll create your own Python package and implement helper utility functions to demonstrate your understanding of modules, packages, and environments.