Module 2: Hypothesis Testing (chi-square tests)

Introduction

In Module 1, we focused on hypothesis testing with t-tests, which compare the means of continuous data. In this module, we'll explore hypothesis testing with chi-square tests, which work with counts of categorical data.

Chi-square tests allow us to determine whether there is a significant association between categorical variables or whether observed categorical data matches what we would expect under a certain hypothesis. These tests are essential when analyzing survey responses, demographic information, or any data where variables are measured in categories rather than continuous values.
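As a first illustration of checking whether observed categorical data matches what a hypothesis predicts, here is a minimal sketch of a goodness-of-fit test using `scipy.stats.chisquare`. The die-roll counts are made up for illustration, and the 0.05 significance level is an assumed convention rather than something this module prescribes.

```python
from scipy.stats import chisquare

# Hypothetical counts for faces 1-6 over 60 rolls (illustrative data only).
observed = [8, 9, 19, 5, 8, 11]

# Null hypothesis: the die is fair, so each face is expected 60 / 6 = 10 times.
expected = [sum(observed) / len(observed)] * len(observed)

# chisquare returns the test statistic and its p-value
# (degrees of freedom = number of categories - 1 = 5).
result = chisquare(f_obs=observed, f_exp=expected)
statistic, p_value = result.statistic, result.pvalue

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

A small p-value here would suggest the observed counts are unlikely under a fair die. It does not say which face is responsible.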

Learning Objectives

By the end of this module, you should be able to:

  • Explain the purpose of chi-square tests and identify situations where they apply
  • Distinguish between chi-square tests for goodness of fit and tests for independence
  • Set up and run a chi-square test for goodness of fit
  • Set up and run a chi-square test for independence
  • Interpret the results of chi-square tests
  • Describe the assumptions and limitations of chi-square tests
  • Apply appropriate post-hoc tests for chi-square analysis

Guided Project

Project Resources

Open DS_Unit1_Sprint2_Chi_Square_Tests.ipynb in the GitHub repository below to follow along with the guided project:

GitHub: Chi-Square Tests

Key Concepts

  • Chi-square distribution and test statistic
  • Chi-square test for goodness of fit: comparing a sample to a known distribution
  • Chi-square test for independence: testing relationships between categorical variables
  • Degrees of freedom in chi-square tests
  • Contingency tables and expected frequencies
  • Post-hoc analysis for chi-square tests
  • Using Python for chi-square analysis
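
To make the test statistic, expected frequencies, and degrees of freedom concrete, the following sketch computes them by hand for a test of independence, using only plain Python. The 2x3 contingency table is hypothetical, and the helper name `chi_square_independence` is introduced here for illustration only.

```python
def chi_square_independence(table):
    """Return (statistic, dof, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]

    # Test statistic: sum over all cells of (observed - expected)^2 / expected.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1) * (columns - 1).
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical survey: rows are two groups, columns are three response categories.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

statistic, dof, expected = chi_square_independence(observed)
print(f"chi-square = {statistic:.3f}, dof = {dof}")
print("expected counts:", expected)
```

In practice, `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value. Working through it by hand once makes the output easier to interpret.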

Module Project

Project Tasks

In this module's project, you will be asked to:

  • Formulate null and alternative hypotheses for chi-square tests
  • Create and analyze contingency tables for categorical data
  • Perform chi-square tests for goodness of fit
  • Perform chi-square tests for independence
  • Calculate expected frequencies and chi-square test statistics
  • Interpret p-values in the context of your chi-square tests
  • Conduct post-hoc analyses to identify specific categories driving significant chi-square results
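
One common way to approach the post-hoc task above is to inspect adjusted standardized residuals for each cell after a significant test of independence. The sketch below assumes that approach, with a Bonferroni-corrected cutoff. The notebook may use a different method, and the table is made-up data.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Hypothetical 3x2 table: rows are three groups, columns are two outcomes.
observed = np.array([
    [30, 10],
    [20, 20],
    [10, 30],
])

# Overall test of independence (correction only matters for 2x2 tables).
chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

# Adjusted standardized residuals:
# (O - E) / sqrt(E * (1 - row proportion) * (1 - column proportion)).
n = observed.sum()
row_prop = observed.sum(axis=1, keepdims=True) / n
col_prop = observed.sum(axis=0, keepdims=True) / n
adjusted_residuals = (observed - expected) / np.sqrt(
    expected * (1 - row_prop) * (1 - col_prop)
)

# Bonferroni correction: split the assumed alpha of 0.05 across all cells tested.
alpha = 0.05
critical_z = norm.ppf(1 - (alpha / observed.size) / 2)

# Cells whose residual exceeds the cutoff contribute most to the significant result.
flagged = np.abs(adjusted_residuals) > critical_z

print(f"chi-square = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.5f}")
print("adjusted residuals:\n", adjusted_residuals.round(2))
print("flagged cells:\n", flagged)
```

Residuals are interpreted like z-scores: a large positive value means the cell has more observations than independence predicts, and a large negative value means fewer.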

Complete all tasks in the Jupyter notebook provided in the GitHub repository.

Additional Resources