Module 3: Permutation and Boosting

In this module, you'll learn about ensemble methods with a focus on bagging and boosting techniques. You'll understand how gradient boosting models work and how to interpret feature importances through both default and permutation methods. These techniques will help you build more powerful models and gain deeper insights into what drives your predictions.

Learning Objectives

1. Bagging vs. Boosting

Learn the differences between bagging and boosting approaches to ensemble learning and when to use each technique.

  • Understanding ensemble methods and their advantages
  • Comparing random forests (bagging) with gradient boosting
  • Identifying when bagging or boosting is more appropriate
  • Implementing both techniques with scikit-learn (see the sketch after this list)
  • Understanding how each approach handles bias and variance
  • Evaluating performance differences between methods
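
As a reference point, here is a minimal sketch comparing the two approaches with scikit-learn. The dataset is synthetic (make_classification) and the hyperparameter values are illustrative defaults, not tuned settings.

```python
# Compare bagging (random forest) and boosting (gradient boosting)
# on a synthetic classification task.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Bagging: many deep trees trained independently on bootstrap samples,
# then averaged — this primarily reduces variance.
bagging = RandomForestClassifier(n_estimators=200, random_state=42)

# Boosting: shallow trees trained sequentially, each one correcting the
# errors of the ensemble so far — this primarily reduces bias.
boosting = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42
)

for name, model in [("Random forest (bagging)", bagging),
                    ("Gradient boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note the contrast in tree depth: bagging averages many deep, low-bias trees to cut variance, while boosting stacks many shallow, high-bias trees to cut bias.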

2. Gradient Boosting Model

Learn how gradient boosting models work and how to implement them effectively.

  • Understanding the mathematical principles of gradient boosting
  • Implementing gradient boosting with libraries like XGBoost (see the sketch after this list)
  • Tuning key hyperparameters for optimal performance
  • Preventing overfitting in gradient boosting models
  • Handling categorical features with gradient boosting
  • Interpreting gradient boosting outputs
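
The sketch below shows one way to set the key XGBoost hyperparameters and use early stopping to limit overfitting. It assumes the xgboost package is installed; the dataset is synthetic and the values are illustrative starting points, not tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Key knobs. In xgboost >= 1.6, eval_metric and early_stopping_rounds are
# constructor arguments; older versions pass them to fit() instead.
model = XGBClassifier(
    n_estimators=500,        # upper bound on trees; early stopping trims it
    learning_rate=0.05,      # smaller steps generalize better, train slower
    max_depth=4,             # shallow trees keep each learner weak
    subsample=0.8,           # row subsampling adds randomness, curbs overfitting
    colsample_bytree=0.8,    # feature subsampling does the same per tree
    eval_metric="logloss",
    early_stopping_rounds=10,
)

# Stop adding trees once validation log-loss stops improving.
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print("Best iteration:", model.best_iteration)
print("Validation accuracy:", model.score(X_val, y_val))
```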

3. Feature Importances (default and permutation)

Learn how to interpret model outputs through feature importance metrics to understand what drives your predictions.

  • Understanding default feature importance calculations
  • Implementing permutation importance to measure feature impact (see the sketch after this list)
  • Comparing built-in vs. permutation importance methods
  • Using feature importances to guide feature selection
  • Visualizing feature importances effectively
  • Understanding the limitations of importance measures
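
Here is a minimal sketch contrasting a tree ensemble's built-in (impurity-based) importances with permutation importance from sklearn.inspection. The dataset is synthetic and the model settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Default importances: total impurity decrease during training.
# These can overstate continuous or high-cardinality features.
print("Built-in importances:   ", model.feature_importances_.round(3))

# Permutation importance: shuffle one column at a time on held-out data
# and measure how much the score drops. Model-agnostic, but slower.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=42)
print("Permutation importances:", result.importances_mean.round(3))
```

Because permutation importance is computed on held-out data using the model's actual score, it often disagrees with impurity-based importances, which are computed during training and carry known biases.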

Guided Project

Permutation and Boosting Guided Project


In this guided project, you'll work through a complete workflow for permutation feature importance and gradient boosting. Using a real-world dataset, you'll measure feature importance through permutation and implement XGBoost models to improve prediction performance.

Module Assignment

Feature Engineering and Model Selection for Your Portfolio Project

For this assignment, you'll continue working with your portfolio dataset from previous modules. You'll apply what you've learned to engineer meaningful features and select appropriate models for your specific problem.

Note: There is no video for this assignment as you will be working with your own dataset and defining your own machine learning problem.

Assignment Notebook Name: LS_DS_233_assignment.ipynb

Tasks:

  1. If you haven't completed assignment #1, please do so first.
  2. Continue to clean and explore your data. Make exploratory visualizations.
  3. Fit a model. Does it beat your baseline?
  4. Try XGBoost.
  5. Get your model's permutation importances.

Additional Resources