Consider a scenario where a data point is:
In this problem, you will be investigating the relationship between controlling and conditioning on a variable.
A datapoint will consist of:
Come up with a plausible model. Draw the causal graph and write down the data model equations. Generate fake data.
Using the fake data you generated, display \(P(\text{GPA}|\text{hours} \geq 8, \text{major} = \text{Math})\) and \(P(\text{GPA}|\text{hours} \geq 8, \text{major} = \text{Physics})\) (as histograms). Note the difference between the means. Now, set up a regression and predict the difference between those means using a regression.