Module 2: Linear Regression 2
Overview
In this module, you will dive deeper into linear regression. You'll learn about train-test splits, multiple regression, ordinary least squares, and the bias-variance tradeoff. These concepts will help you build more robust regression models and better understand model performance.
Learning Objectives
Objective 1: Understand overfitting, underfitting, and the bias-variance tradeoff
Learn about the fundamental concepts of model complexity and generalization.
- Recognizing signs of overfitting and underfitting
- Understanding bias and variance components of error
- Balancing model complexity against generalization
- Implementing strategies to mitigate overfitting
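The points above can be illustrated with a small experiment. The following is a minimal sketch, not part of the course materials: it assumes synthetic data (a noisy sine curve) and uses polynomial degree as a stand-in for model complexity. A degree-1 model is likely to underfit, while a degree-10 model has enough flexibility to chase noise in the training set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Fit models of increasing complexity and record (train MSE, test MSE).
results = {}
for degree in (1, 3, 10):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Training error can only fall as the degree rises, because each larger model contains the smaller ones. Test error carries no such guarantee: a widening gap between training and test error is the usual sign of overfitting, while high error on both sets suggests underfitting.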
Objective 2: Implement a train-test split
Learn how to properly split your data into training and testing sets to evaluate model performance.
- Understanding the importance of train-test splits
- Using scikit-learn's train_test_split function
- Setting a random state for reproducibility
- Choosing appropriate train-test proportions
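As a rough sketch of the split described above, the example below uses a tiny made-up array rather than the course dataset. The 80/20 proportion and the value 42 for random_state are illustrative choices, not requirements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 10 rows, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; fixing random_state makes
# the shuffle repeatable from run to run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# Calling the function again with the same random_state gives the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(np.array_equal(X_test, X_test_again))  # True
```

A larger test set gives a steadier estimate of performance but leaves less data for training, so the right proportion depends on how much data you have.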
Objective 3: Fit and evaluate a Multiple Linear Regression model
Extend your regression skills to models with multiple features.
- Understanding the differences between simple and multiple regression
- Selecting and engineering features for multiple regression
- Implementing multiple regression in scikit-learn
- Evaluating model performance on training and test sets
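The workflow above might look something like the following sketch. The data here is synthetic (two features with known coefficients 3 and -2 and intercept 5, an assumption chosen so the result is easy to check), so it stands in for whatever dataset you are working with.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): y depends linearly on two features plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Multiple regression uses the same estimator as simple regression;
# the only difference is that X has more than one column.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on both sets to see how well the model generalizes.
train_mae = mean_absolute_error(y_train, model.predict(X_train))
test_mae = mean_absolute_error(y_test, model.predict(X_test))
train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, model.predict(X_test))

print("coefficients:", model.coef_, "intercept:", model.intercept_)
print(f"train MAE={train_mae:.3f}  test MAE={test_mae:.3f}")
print(f"train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

Comparing the training and test metrics side by side is the point of the exercise: similar values suggest the model generalizes, while a much better training score hints at overfitting.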
Guided Project
Linear Regression II
The notebook for this guided project is JDS_SHR_212_guided_project_notes.ipynb in the GitHub repository.
Module Assignment
Linear Regression 2 Assignment
In this module assignment, found in the file LS_DS_212_assignment.ipynb in the GitHub repository, you'll apply your knowledge of advanced linear regression concepts:
Tasks:
- Import the CSV file using the wrangle function
- Conduct exploratory data analysis (EDA) and modify the wrangle function to engineer two new features
- Split data into feature matrix X and target vector y
- Split feature matrix X and target vector y into training and test sets
- Establish the baseline mean absolute error for your dataset
- Build and train a LinearRegression model
- Calculate the training and test mean absolute error for your model
- Calculate the training and test R² score for your model
- Stretch Goal: Determine the three most important features for your linear regression model
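For the baseline task listed above, one common approach is to predict the mean of the training target for every observation and measure the mean absolute error of that guess. The sketch below assumes a tiny made-up target vector, not the assignment data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up training target (an assumption for illustration).
y_train = np.array([10.0, 12.0, 14.0, 20.0])

# The baseline "model" always predicts the training mean.
baseline_pred = np.full_like(y_train, y_train.mean())
baseline_mae = mean_absolute_error(y_train, baseline_pred)

print(baseline_mae)  # 3.0
```

A trained model is only useful if its mean absolute error comes in below this number.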