Module 4: Logistic Regression

Overview

In this module, you will transition from regression to classification with logistic regression. You'll implement train-validate-test splits, understand classification baselines, and learn about scikit-learn pipelines. These skills will enable you to build and evaluate models for binary classification problems.

Learning Objectives

Objective 1: Determine a baseline for classification

Learn how to establish reference points for evaluating your classification models.

Understanding the concept of classification baselines
Implementing majority class, stratified, and prior probability baselines
Calculating baseline accuracy, precision, and recall
Using baselines to contextualize model performance
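
As a rough illustration of the baseline strategies listed above, the sketch below uses scikit-learn's DummyClassifier on a small made-up target. The 70/30 class split, the placeholder feature matrix, and the variable names are assumptions chosen for the example, not part of the module materials.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical imbalanced target: 70 negatives (0) and 30 positives (1).
y = np.array([0] * 70 + [1] * 30)
# Placeholder features; dummy baselines ignore the feature values.
X = np.zeros((len(y), 1))

baselines = {}
for strategy in ("most_frequent", "stratified", "prior"):
    clf = DummyClassifier(strategy=strategy, random_state=42)
    clf.fit(X, y)
    y_pred = clf.predict(X)
    baselines[strategy] = {
        "accuracy": accuracy_score(y, y_pred),
        # zero_division=0 avoids a warning when no positives are predicted.
        "precision": precision_score(y, y_pred, zero_division=0),
        "recall": recall_score(y, y_pred, zero_division=0),
    }

for strategy, scores in baselines.items():
    print(strategy, scores)
```

On this target the majority class baseline reaches 0.70 accuracy with zero recall for the positive class, which is the figure a real model has to beat. In DummyClassifier, the "prior" strategy makes the same hard predictions as "most_frequent" and differs only in the probabilities returned by predict_proba.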

Objective 2: Implement a train-validate-test split

Learn how to divide your data into three sets for more robust model evaluation.

Understanding the purpose of train-validate-test splits
Implementing multi-stage splits with scikit-learn
Maintaining class distributions across splits
Avoiding data leakage between splits
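
One common way to get three sets is to call train_test_split twice. The sketch below assumes a 60/20/20 split and synthetic data with 20% positives; the proportions, random seeds, and variable names are illustrative choices rather than requirements of the module.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this example): 1000 rows, 20% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([0] * 800 + [1] * 200)

# Stage 1: hold out 20% of the data as the test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Stage 2: take 25% of the remaining 80% (20% of the total) for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

# stratify keeps the share of positives roughly equal in every split.
for name, target in (("train", y_train), ("val", y_val), ("test", y_test)):
    print(name, len(target), round(target.mean(), 3))
```

Because the test set is carved off first and never touched again, any preprocessing fitted later (for example a scaler) can be fitted on the training set only, which is one way to avoid leakage between splits.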

Objective 3: Fit a Logistic Regression classification model

Learn how to implement and fine-tune logistic regression for binary classification.

Understanding the logistic function and decision boundary
Implementing logistic regression in scikit-learn
Tuning regularization strength and solver parameters
Handling class imbalance in logistic regression
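
The following sketch ties these points together on synthetic data: it fits LogisticRegression with an explicit regularization strength (C), solver, and class_weight setting, then checks that the predicted probabilities are the logistic function applied to the linear score. The dataset and the specific parameter values are assumptions for illustration, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem (roughly 80% class 0).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Smaller C means stronger regularization; class_weight="balanced"
# reweights classes inversely to their frequency in the training data.
model = LogisticRegression(
    C=1.0, solver="lbfgs", class_weight="balanced", max_iter=1000
)
model.fit(X_train, y_train)

# Linear score z = w.x + b, then the logistic function 1 / (1 + exp(-z)).
z = model.decision_function(X_val)
proba_manual = 1.0 / (1.0 + np.exp(-z))
proba_sklearn = model.predict_proba(X_val)[:, 1]

# The decision boundary is z = 0, which corresponds to a probability of 0.5.
pred_manual = (z > 0).astype(int)
print("validation accuracy:", model.score(X_val, y_val))
```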

Objective 4: Create a scikit-learn pipeline

Learn to build streamlined workflows for data preprocessing and model training.

Understanding the benefits of scikit-learn pipelines
Building pipelines with preprocessing steps and models
Combining feature transformers with FeatureUnion
Using pipelines for cross-validation and grid search
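
A minimal sketch of these ideas, assuming synthetic data: a Pipeline that scales the features, combines two feature transformers with FeatureUnion, and feeds the result to LogisticRegression, all tuned with GridSearchCV. The choice of PCA and SelectKBest as the combined transformers, the step names, and the C grid are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# FeatureUnion runs both transformers on the same input and
# concatenates their outputs column-wise (2 + 3 = 5 features).
features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("kbest", SelectKBest(f_classif, k=3)),
])

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a step are addressed as <step name>__<parameter>.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler and feature transformers live inside the pipeline, each cross-validation fold refits them on that fold's training portion only, so no information from the held-out portion leaks into preprocessing.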

Guided Project

Logistic Regression

GitHub: Logistic Regression Slides

The notebook for this guided project is JDS_SHR_214_guided_project_notes.ipynb in the GitHub repository.

Module Assignment

Logistic Regression Assignment

The module assignment is in the file LS_DS_214_assignment.ipynb in the GitHub repository. In it, you'll apply logistic regression to solve a classification problem:

Tasks:

Load adult.csv using the wrangle function
Split data into feature matrix X and target vector y
Split data into train, validation, and test sets
Establish a baseline accuracy and a baseline confusion matrix for your dataset
Build a transformation pipeline for preprocessing
Build and train a LogisticRegression model
Evaluate the model with validation accuracy and F1 score
Use your model to make predictions on the test set and calculate test accuracy
Interpret the coefficients from your logistic regression model
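
The evaluation and interpretation tasks above might look roughly like the sketch below. It does not use adult.csv or the assignment's wrangle function; the synthetic data and the feature names (feat_a to feat_d) are hypothetical stand-ins, so treat it as a pattern rather than a solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and feature names, standing in for the real dataset.
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=0, random_state=1
)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Validation metrics.
y_pred = model.predict(X_val)
val_accuracy = accuracy_score(y_val, y_pred)
val_f1 = f1_score(y_val, y_pred)
cm = confusion_matrix(y_val, y_pred)  # rows: true class, columns: predicted

# Each coefficient is a change in log-odds per one standard deviation of the
# feature (because of the scaler); exp(coefficient) is the odds ratio.
coefs = model.named_steps["logisticregression"].coef_[0]
odds_ratios = dict(zip(feature_names, np.exp(coefs)))

print("accuracy:", val_accuracy, "F1:", val_f1)
print(cm)
for name, ratio in sorted(odds_ratios.items(), key=lambda kv: -kv[1]):
    print(name, round(ratio, 3))
```

An odds ratio above 1 means the feature pushes predictions toward the positive class, and a ratio below 1 pushes them toward the negative class, holding the other features fixed.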