Module 3: Ridge Regression
Overview
In this module, you will build on your regression knowledge with ridge regression. You'll learn about one-hot encoding, feature selection, and how regularization can improve model performance. These techniques will help you handle categorical variables and build more effective models with many features.
Learning Objectives
Objective 1: Encode categorical variables via one-hot encoding
Learn how to transform categorical variables into a format suitable for machine learning algorithms.
- Understanding the need for encoding categorical variables
- Implementing one-hot encoding with scikit-learn
- Handling categorical variables with many categories
- Integrating encoding into machine learning pipelines
Objective 2: Perform feature selection to select relevant variables for the model
Learn techniques to identify and select the most important features for your model.
- Understanding the importance of feature selection
- Implementing filter methods for feature selection
- Using wrapper methods to evaluate feature subsets
- Interpreting feature importance from models
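As a minimal sketch of a filter method, `SelectKBest` scores each feature against the target and keeps the top k (the synthetic dataset below is illustrative, not from the assignment):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# 10 features, only 3 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, random_state=42)

# Filter method: rank features by univariate F-statistic, keep the top 3
selector = SelectKBest(score_func=f_regression, k=3)
X_selected = selector.fit_transform(X, y)

# Boolean mask marking which columns survived
mask = selector.get_support()
```

Wrapper methods (e.g. recursive feature elimination) instead refit a model on candidate subsets, which is more expensive but accounts for feature interactions.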
Objective 3: Fit a Ridge Regression model using scikit-learn
Learn how to implement ridge regression to handle multicollinearity and prevent overfitting.
- Understanding regularization in linear models
- Implementing ridge regression with scikit-learn
- Tuning the alpha parameter
- Comparing ridge regression to ordinary least squares
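A short sketch of the comparison above: ridge adds an L2 penalty controlled by `alpha`, so larger values of `alpha` shrink the coefficients further toward zero relative to ordinary least squares (synthetic data for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=20,
                       noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ordinary least squares: no penalty on coefficient size
ols = LinearRegression().fit(X_train, y_train)

# Ridge: minimizes squared error + alpha * sum of squared coefficients
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
ridge_strong = Ridge(alpha=100.0).fit(X_train, y_train)

# Coefficient norms shrink monotonically as alpha grows
norms = [np.linalg.norm(m.coef_) for m in (ols, ridge, ridge_strong)]
```

In practice `alpha` is tuned with cross-validation (e.g. `RidgeCV`) rather than set by hand.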
Guided Project
Ridge Regression
The notebook for this guided project is JDS_SHR_213_guided_project_notes.ipynb in the GitHub repository.
Module Assignment
Ridge Regression Assignment
In this module assignment, found in the file LS_DS_213_assignment.ipynb in the GitHub repository, you'll apply ridge regression to real-world data problems:
Tasks:
- Import the CSV file using the wrangle function
- Conduct exploratory data analysis (EDA), and modify the wrangle function to subset your dataset to one-family dwellings whose price is between $100,000 and $2,000,000

- Split data into feature matrix X and target vector y
- Split feature matrix X and target vector y into training and test sets
- Establish the baseline mean absolute error for your dataset
- Build and train a OneHotEncoder, and transform X_train and X_test
- Build and train a LinearRegression model
- Build and train a Ridge model
- Calculate the training and test mean absolute error for your LinearRegression model
- Calculate the training and test mean absolute error for your Ridge model
- Create a horizontal bar chart showing the 10 most influential features for your Ridge model