An introduction to data science
Learning objectives
This website serves as an introduction to data science for middle and high school students and teachers. Its main topics are:
- Why statistics is useful in the real world
- Statistics to understand genetic diversity
- Means, variances, correlations, and other summary measures from data
- Probability distributions
- Hypothesis tests (t-tests, ANOVA, chi-square tests)
For a list of specific learning objectives by lesson, visit this page.
How to use this website
This website
- is intended for middle and high school students learning to apply statistics for the first time,
- covers the basic statistical concepts and tests taught in an introductory statistics course,
- was developed to accompany the learning materials of Wisconsin Fast Plants, but the knowledge applies to any introductory statistics course.
Each lesson
- gives a basic overview of concepts and aims to provide the theoretical takeaways without getting bogged down in the math,
- has resources for students to learn more about the math behind each concept, but the main purpose of the website is to teach how to think about statistics.
For more statistical detail, check out the free digital library OpenStax and its textbook Introductory Business Statistics.
Table of Contents
| Topic | Key take-away points | Webpage |
|---|---|---|
| 1. Why data science? | The field of statistics helps us distinguish real patterns from random chance | page |
| 2. Genetic Diversity | DNA and its expression cause the wide variety of living things and physical traits we see in the world | page |
| 3. Averages and medians | Averages (or means) and medians are fundamental metrics that tell us about continuous variables in our data | page |
| 4. Variance and distributions | The spread of the data tells us whether its distribution is normal or skewed. Histograms help us visualize this | page |
| 5. Probability and z-scores | We can use probability and a standardized score, the z-score, to tell us how rare an observation is | page |
| 6. Hypothesis testing | We generate ideas about patterns in the world, called hypotheses, that we test with statistics to build evidence for our ideas | page |
| 7. Comparing 2 groups (t-tests) | We can test whether two groups differ using a statistical test called a t-test | page |
| 8. Comparing 2+ groups (ANOVA) | We can test whether two or more groups differ using a statistical test called an analysis of variance (ANOVA) | page |
| 9. Comparing frequencies (chi-square) | We can test whether groups differ in the frequency of observations or in a categorical variable (e.g., hair color) with a chi-square test | page |
| 10. Correlations | We can measure how strongly two continuous variables are related with their correlation, and describe how they relate with a regression equation | page |
| 11. Statistics in the real world | Data analysis is fast, but data cleaning can be lengthy and tricky, and factors like power, effect size, and normality must be considered | page |
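To give a feel for topics 3 to 5, here is a minimal sketch of a mean, a median, a standard deviation, and a z-score. It is written in Python using only the standard library and is not the website's own code. The plant heights are made-up numbers for illustration, and the helper name `z_score` is hypothetical.

```python
from statistics import NormalDist, mean, median, pstdev

# Made-up plant heights in centimeters (illustration only, not real data).
heights_cm = [10.0, 12.0, 14.0, 16.0, 18.0]

average = mean(heights_cm)    # topic 3: the mean
middle = median(heights_cm)   # topic 3: the median
spread = pstdev(heights_cm)   # topic 4: population standard deviation


def z_score(value, center, sd):
    """Number of standard deviations that value lies from center."""
    return (value - center) / sd


# Topic 5: how unusual is the tallest plant?
z = z_score(18.0, average, spread)

# Chance of seeing a z-score at least this large,
# ASSUMING the heights follow a normal distribution.
upper_tail = 1 - NormalDist().cdf(z)

print(f"mean={average}, median={middle}, sd={spread:.2f}")
print(f"z={z:.2f}, upper-tail probability={upper_tail:.3f}")
```

A larger z-score means a rarer observation. The probability in the last step is only meaningful if the normality assumption is reasonable for the data.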
AI-generated artwork within the website
Images on this website are a mix of original and AI-generated images. Images that have a row of multicolored squares are from DALL·E 2. Graphs and figures, unless noted, are generated with RStudio, which you can download here. You can recreate all the graphs and figures with this code, without needing any data. Feel free to adapt it for your own use!
