Module 3: Ridge Regression

Overview

In this module, you will build on your regression knowledge with ridge regression. You'll learn about one-hot encoding, feature selection, and how regularization can improve model performance. These techniques will help you handle categorical variables and build more effective models with many features.

Learning Objectives

Objective 1: Encode categorical variables via one-hot encoding

Learn how to transform categorical variables into a format suitable for machine learning algorithms.

  • Understanding the need for encoding categorical variables
  • Implementing one-hot encoding with scikit-learn
  • Handling categorical variables with many categories
  • Integrating encoding into machine learning pipelines

Objective 2: Perform feature selection to select relevant variables for the model

Learn techniques to identify and select the most important features for your model.

  • Understanding the importance of feature selection
  • Implementing filter methods for feature selection
  • Using wrapper methods to evaluate feature subsets
  • Interpreting feature importance from models

Objective 3: Fit a Ridge Regression model using scikit-learn

Learn how to implement ridge regression to handle multicollinearity and prevent overfitting.

  • Understanding regularization in linear models
  • Implementing ridge regression with scikit-learn
  • Tuning the alpha parameter
  • Comparing ridge regression to ordinary least squares
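A minimal sketch of these ideas: fit ordinary least squares and ridge on the same synthetic data, letting `RidgeCV` pick the alpha from a candidate grid (the grid values and dataset shape here are arbitrary illustrations). For any positive alpha, ridge's penalty shrinks the coefficient vector relative to OLS:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, RidgeCV

# Synthetic data with many features relative to the sample size,
# a setting where OLS coefficients tend to be unstable.
X, y = make_regression(n_samples=100, n_features=80, noise=15.0,
                       random_state=0)

ols = LinearRegression().fit(X, y)

# RidgeCV tunes alpha by cross-validation over the supplied grid.
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X, y)
print("chosen alpha:", ridge.alpha_)

# The L2 penalty shrinks coefficients toward zero.
print("OLS coef norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coef norm:", np.linalg.norm(ridge.coef_))
```

Larger alpha values shrink more aggressively (more bias, less variance); alpha = 0 recovers ordinary least squares.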

Guided Project

Ridge Regression

The notebook for this guided project is JDS_SHR_213_guided_project_notes.ipynb in the GitHub repository.

Module Assignment

Ridge Regression Assignment

In this module assignment, found in the file LS_DS_213_assignment.ipynb in the GitHub repository, you'll apply ridge regression to real-world data problems:

Tasks:

  1. Import the CSV file using the wrangle function
  2. Conduct exploratory data analysis (EDA), and modify the wrangle function to subset your dataset to one-family dwellings whose price is between $100,000 and $2,000,000
  3. Split data into feature matrix X and target vector y
  4. Split feature matrix X and target vector y into training and test sets
  5. Establish the baseline mean absolute error for your dataset
  6. Instantiate and fit a OneHotEncoder, and transform X_train and X_test
  7. Build and train a LinearRegression model
  8. Build and train a Ridge model
  9. Calculate the training and test mean absolute error for your LinearRegression model
  10. Calculate the training and test mean absolute error for your Ridge model
  11. Create a horizontal bar chart showing the 10 most influential features for your Ridge model