Module 2: Linear Regression 2

Overview

In this module, you will dive deeper into linear regression. You'll learn about train-test splits, multiple regression, ordinary least squares, and the bias-variance tradeoff. These concepts will help you build more robust regression models and better understand model performance.

Learning Objectives

Objective 1: Understand overfitting, underfitting, and the bias-variance tradeoff

Learn about the fundamental concepts of model complexity and generalization.

  • Recognizing signs of overfitting and underfitting
  • Understanding bias and variance components of error
  • Balancing model complexity against generalization
  • Implementing strategies to mitigate overfitting
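One way to see the tradeoff is to fit polynomials of increasing degree to the same noisy data and compare training and test error. The sketch below is a minimal illustration, assuming synthetic sine-wave data (not a course dataset) and scikit-learn's PolynomialFeatures. A degree-1 model typically underfits (high bias: training and test error both high), while a degree-15 model typically overfits (high variance: very low training error but a worse test error). A moderate degree usually sits in between.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a sine wave plus Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

results = {}
for degree in (1, 4, 15):
    # Expand x into [1, x, x^2, ..., x^degree], then fit ordinary least squares.
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The gap between training and test error, rather than training error alone, is the signal to watch: a large gap suggests overfitting, and high error on both suggests underfitting.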

Objective 2: Implement a train-test split

Learn how to properly split your data into training and testing sets to evaluate model performance.

  • Understanding the importance of train-test splits
  • Using scikit-learn's train_test_split function
  • Setting a random state for reproducibility
  • Choosing appropriate train-test proportions
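A minimal example of train_test_split on a tiny made-up array: test_size=0.2 holds out 20% of the rows (a common choice, not a fixed rule), and fixing random_state makes the shuffle repeatable, so two calls with the same seed return identical splits. The rows of X and y stay aligned after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i of X is [2i, 2i + 1] and its target is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same random_state -> same shuffle -> same split.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # 8 training rows, 2 test rows
```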

Objective 3: Fit and evaluate a Multiple Linear Regression model

Extend your regression skills to models with multiple features.

  • Understanding the differences between simple and multiple regression
  • Selecting and engineering features for multiple regression
  • Implementing multiple regression in scikit-learn
  • Evaluating model performance on training and test sets
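A sketch of multiple linear regression in scikit-learn, assuming synthetic data with two features whose true coefficients are known (2.0 and -1.5, intercept 3.0). LinearRegression fits by ordinary least squares, so its intercept and coefficients should match a direct least-squares solve on a design matrix with a leading column of ones. The model is then scored with MAE and R² on both the training and test sets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): y = 3 + 2*x1 - 1.5*x2 + noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)

# Ordinary least squares by hand: prepend a column of ones for the intercept.
A = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Evaluate on both sets; similar scores suggest the model generalizes.
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X_part)
    mae = mean_absolute_error(y_part, pred)
    r2 = r2_score(y_part, pred)
    print(f"{name}: MAE={mae:.3f}  R2={r2:.3f}")
```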

Guided Project

Linear Regression II

The notebook for this guided project is JDS_SHR_212_guided_project_notes.ipynb in the GitHub repository.

Module Assignment

Linear Regression 2 Assignment

In this module assignment, found in the file LS_DS_212_assignment.ipynb in the GitHub repository, you'll apply your knowledge of advanced linear regression concepts:

Tasks:

  1. Import the CSV file using the wrangle function
  2. Conduct exploratory data analysis (EDA) and modify the wrangle function to engineer two new features
  3. Split the data into a feature matrix X and a target vector y
  4. Split the feature matrix X and target vector y into training and test sets
  5. Establish the baseline mean absolute error (MAE) for your dataset
  6. Build and train a LinearRegression model
  7. Calculate the training and test mean absolute error for your model
  8. Calculate the training and test R² score for your model
  9. Stretch Goal: Determine the three most important features for your linear regression model
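The baseline and stretch-goal steps can be sketched on made-up data; the feature names f1 to f5 and the data are hypothetical, not the assignment's dataset. The baseline predicts the training-set mean of the target for every row, so any useful model should beat its MAE. For feature importance, this sketch ranks features by the absolute value of their coefficients after standardizing with StandardScaler, so each coefficient means "change in the target per one standard deviation of the feature". That is only one reasonable approach, and it is most trustworthy when features are not strongly correlated; permutation importance is a common alternative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: five made-up features on very different scales.
rng = np.random.default_rng(1)
n = 300
scales = np.array([1.0, 10.0, 100.0, 0.1, 5.0])
Z = rng.normal(size=(n, 5))
X = pd.DataFrame(Z * scales, columns=["f1", "f2", "f3", "f4", "f5"])
# In this made-up example only f1, f3 and f5 drive the target.
y = pd.Series(4.0 * Z[:, 0] - 2.5 * Z[:, 2] + 1.0 * Z[:, 4]
              + rng.normal(scale=0.5, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: predict the training-set mean for every training observation.
baseline_pred = np.full(len(y_train), y_train.mean())
baseline_mae = mean_absolute_error(y_train, baseline_pred)

# Standardize, then fit OLS; scaling does not change predictions, only coefficient units.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
train_mae = mean_absolute_error(y_train, model.predict(X_train))
test_mae = mean_absolute_error(y_test, model.predict(X_test))

# Rank features by the magnitude of their standardized coefficients.
coefs = pd.Series(model.named_steps["linearregression"].coef_, index=X.columns)
top3 = coefs.abs().sort_values(ascending=False).head(3).index.tolist()

print(f"baseline MAE: {baseline_mae:.3f}")
print(f"train MAE:    {train_mae:.3f}")
print(f"test MAE:     {test_mae:.3f}")
print("top 3 features by |standardized coefficient|:", top3)
```

Ranking by raw (unscaled) coefficients here would be misleading, because a feature measured in large units gets a small coefficient regardless of how much it matters.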