Module 4: Logistic Regression
Overview
In this module, you will move from regression to classification using logistic regression. You'll establish classification baselines, implement train-validate-test splits, and build scikit-learn pipelines. These skills will let you build and evaluate models for binary classification problems.
Learning Objectives
Objective 1: Determine a baseline for classification
Learn how to establish reference points for evaluating your classification models.
- Understanding the concept of classification baselines
- Implementing majority class, stratified, and prior probability baselines
- Calculating baseline accuracy, precision, and recall
- Using baselines to contextualize model performance
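As a minimal sketch of the majority class baseline, the example below uses scikit-learn's DummyClassifier on a made-up target (70 negatives, 30 positives). The data and variable names are assumptions for illustration only; they are not the assignment dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical imbalanced target: 70 negatives (0) and 30 positives (1).
y = np.array([0] * 70 + [1] * 30)
# DummyClassifier ignores the feature values, so a placeholder matrix is enough.
X = np.zeros((len(y), 1))

# "most_frequent" always predicts the majority class.
# Other built-in strategies include "stratified" and "prior".
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
y_pred = baseline.predict(X)

baseline_accuracy = accuracy_score(y, y_pred)
# The baseline never predicts the positive class, so precision is undefined;
# zero_division=0 reports it as 0 instead of raising a warning.
baseline_precision = precision_score(y, y_pred, zero_division=0)
baseline_recall = recall_score(y, y_pred, zero_division=0)

print(f"accuracy={baseline_accuracy:.2f}, "
      f"precision={baseline_precision:.2f}, recall={baseline_recall:.2f}")
```

A model is only useful if it beats these numbers. Here the baseline accuracy equals the majority class share, while recall for the positive class is zero, which is why accuracy alone can mislead on imbalanced data.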
Objective 2: Implement a train-validate-test split
Learn how to divide your data into three sets for more robust model evaluation.
- Understanding the purpose of train-validate-test splits
- Implementing multi-stage splits with scikit-learn
- Maintaining class distributions across splits
- Avoiding data leakage between splits
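One common way to get three sets is to call train_test_split twice. The sketch below assumes a 60/20/20 split on synthetic data; the proportions, random seed, and variable names are illustrative choices, not requirements of the module.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 1000 rows, 20% positive class.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.array([0] * 800 + [1] * 200)

# Stage 1: hold out 20% of the data as the test set.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Stage 2: split the remaining 80% into train and validation.
# 0.25 of the remaining 80% is 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25,
    stratify=y_train_val, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
print(y_train.mean(), y_val.mean(), y_test.mean())
```

Passing stratify keeps the class proportions roughly equal in every split. To avoid leakage, fit any preprocessing (scalers, encoders, imputers) on the training set only, and score the test set once at the very end.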
Objective 3: Fit a Logistic Regression classification model
Learn how to implement and fine-tune logistic regression for binary classification.
- Understanding the logistic function and decision boundary
- Implementing logistic regression in scikit-learn
- Tuning regularization strength and solver parameters
- Handling class imbalance in logistic regression
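The sketch below fits LogisticRegression on synthetic, imbalanced data and checks that the predicted probability is the logistic (sigmoid) function applied to the linear score. The dataset, the value of C, and the helper name sigmoid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem with roughly an 80/20 class imbalance.
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.8, 0.2], random_state=0
)

# C is the inverse of regularization strength (smaller C = stronger penalty).
# class_weight="balanced" reweights classes inversely to their frequency.
model = LogisticRegression(
    C=1.0, solver="lbfgs", class_weight="balanced", max_iter=1000
)
model.fit(X, y)


def sigmoid(z):
    """Logistic function: maps a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# decision_function returns the linear score w.x + b for each row.
scores = model.decision_function(X[:5])
manual_proba = sigmoid(scores)
sklearn_proba = model.predict_proba(X[:5])[:, 1]

print(np.allclose(manual_proba, sklearn_proba))
```

The decision boundary is where the linear score is 0, which corresponds to a predicted probability of 0.5. Rows with a positive score are assigned to class 1 by predict.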
Objective 4: Create a scikit-learn pipeline
Learn to build streamlined workflows for data preprocessing and model training.
- Understanding the benefits of scikit-learn pipelines
- Building pipelines with preprocessing steps and models
- Combining feature transformers with FeatureUnion
- Using pipelines for cross-validation and grid search
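As a rough sketch of how these pieces fit together, the example below chains a scaler, a FeatureUnion, and a logistic regression model, then tunes C with GridSearchCV. The choice of PCA and SelectKBest as the combined transformers, the step names, and the parameter grid are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# FeatureUnion runs its transformers side by side and concatenates the outputs:
# 2 principal components + 3 selected original features = 5 columns.
features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("kbest", SelectKBest(k=3)),
])

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
```

Because the scaler and feature transformers live inside the pipeline, each cross-validation fold refits them on that fold's training portion only, which keeps validation data from leaking into preprocessing.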
Guided Project
Logistic Regression
The notebook for this guided project is JDS_SHR_214_guided_project_notes.ipynb in the GitHub repository.
Module Assignment
Logistic Regression Assignment
In this module assignment, found in the file LS_DS_214_assignment.ipynb in the GitHub repository, you'll apply logistic regression to solve a classification problem.
Tasks:
- Load adult.csv using the wrangle function
- Split data into feature matrix X and target vector y
- Split data into train, validation, and test sets
- Establish a baseline accuracy and a baseline confusion matrix for your dataset
- Build a transformer pipeline for preprocessing
- Build and train a LogisticRegression model
- Evaluate the model with validation accuracy and F1 score
- Use your model to make predictions on the test set and calculate test accuracy
- Interpret the coefficients from your logistic regression model
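The evaluation and interpretation tasks might look roughly like the sketch below. It uses synthetic data with hypothetical feature names (feature_a through feature_d) rather than adult.csv or the assignment's wrangle function, and it uses a single validation split to stay short.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names and synthetic data, for illustration only.
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
X = pd.DataFrame(X, columns=feature_names)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# make_pipeline names each step after its lowercased class name.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_val_pred = model.predict(X_val)
val_accuracy = accuracy_score(y_val, y_val_pred)
val_f1 = f1_score(y_val, y_val_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_val, y_val_pred)

# Each coefficient is a change in log-odds per one standard deviation of the
# feature (because of the scaler); exponentiating gives an odds ratio.
coefs = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=feature_names
)
odds_ratios = np.exp(coefs).sort_values(ascending=False)

print(f"validation accuracy={val_accuracy:.3f}, F1={val_f1:.3f}")
print(cm)
print(odds_ratios)
```

An odds ratio above 1 means the feature raises the odds of the positive class as it increases, and a value below 1 means it lowers them. The test set would be scored the same way as the validation set, once, after model choices are final.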