Final Practice Set
A short warm-up, one mean test, and a longer regression block
Readme
Hey :)
This page is the last practice set. It brings together the main flow from the earlier practice pages in one place: a short warm-up, one mean comparison test, and then a longer regression block.
- In Exercise 1, you import data.xlsx yourself with readxl.
- From Exercise 2 onward, the file is already loaded and stored in df for you.
- In the later exercises, df is freshly reloaded behind the scenes, so one mistake will not break the next task.
- For the regression part, you will often return the full output first and then answer a multiple-choice question based on that output.
If you want the dataset on your own computer, download it here: data.xlsx.
Dataset description
The workbook data.xlsx contains data on customer support reps from one team. Each row is one rep.
| Variable | Meaning | Values / categories |
|---|---|---|
| rep_id | Rep ID | 1, 2, ..., 50 |
| practice_calls | Number of practice calls completed that week | numeric values |
| coaching | Received coaching that week | 0 = no, 1 = yes |
| script | Used the support script | 0 = no, 1 = yes |
| rating | Quality rating score | numeric values |
Functions you may need
You do not need all of these in every exercise, but these are the main functions used across the page.
library(readxl), read_excel(), head(), mean(), sum(), boxplot(), qt(), t.test(), lm(), summary(), resid(), jarque.bera.test()
Useful operators you will probably use:
$, ==, >, +, *, [ ], <-
Quick guide
- If the claim is that one group has a higher mean than the other, use alternative = "greater".
- For a full regression question, return summary(lm(…)) first.
- A dummy variable such as script or coaching compares one group with the reference group.
- An interaction term such as practice_calls * script checks whether the slope of practice_calls changes when script changes.
- For the Jarque-Bera test, a large p-value means the residuals do not clearly contradict normality.
Exercise 1 - Import the workbook and show the data
Import data.xlsx into an object called df, then display the dataset.
Typing either df or head(df) is completely fine here.
Load readxl, import data.xlsx into df, and then print df or head(df).
Start by making sure the file is loaded correctly and stored in df.
library(readxl)
df <- read_excel("data.xlsx")
head(df)
Exercise 2 - Mean number of practice calls
df is already loaded for you in this exercise.
Find the mean of practice_calls.
Use mean() on the practice_calls column.
This is a direct mean question, so mean(df$practice_calls) is enough.
mean(df$practice_calls)
Exercise 3 - Count reps who received coaching
df is already loaded for you in this exercise.
How many reps have coaching = 1?
Use sum(df$coaching == 1).
Because df$coaching == 1 gives TRUE/FALSE, sum(...) counts the TRUE values.
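If you want to see the counting trick in isolation, here is a small sketch on a made-up vector. The name coaching_toy and its values are hypothetical and are not taken from data.xlsx.

```r
# Hypothetical coaching indicator for six reps (not the real data)
coaching_toy <- c(1, 0, 1, 1, 0, 1)

# The comparison returns one TRUE/FALSE per rep
is_coached <- coaching_toy == 1

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_coached <- sum(is_coached)

# mean() on the same logical vector gives the share of coached reps
share_coached <- mean(is_coached)

c(count = n_coached, share = share_coached)
```

The same pattern works for any condition, for example a comparison with > instead of ==.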
sum(df$coaching == 1)
Exercise 4 - Make a boxplot of practice calls
df is already loaded for you in this exercise.
Create a boxplot of practice_calls.
Use a horizontal boxplot if you can.
Use boxplot(df$practice_calls, horizontal = TRUE).
This lets you check quickly whether there are any obvious outliers in practice_calls.
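If you would rather list the outliers than read them off the plot, base R's boxplot.stats() returns the numbers behind a boxplot without drawing it. This is an optional sketch on made-up values (calls_toy is hypothetical), not something the exercise asks for.

```r
# Hypothetical practice-call counts; the last value is deliberately extreme
calls_toy <- c(10, 12, 11, 13, 12, 40)

# boxplot.stats() computes the box, the whiskers, and the outliers
stats_toy <- boxplot.stats(calls_toy)

# $out holds the points beyond the whiskers, i.e. more than
# 1.5 times the box length away from the box
outliers_toy <- stats_toy$out
outliers_toy
```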
boxplot(df$practice_calls, horizontal = TRUE)
Exercise 5 - Choose the right test direction
You want to check whether reps who received coaching have a higher mean rating than reps who did not receive coaching.
Which alternative fits this claim?
Exercise 6 - Critical t-value
df is already loaded for you in this exercise.
Assume equal variances. Find the one-sided critical t value at the 5% level for the coaching-versus-no-coaching test.
Return just the number.
Use degrees of freedom n1 + n0 - 2, where n1 and n0 are the numbers of reps with and without coaching, then use qt(...) for the right tail.
The claim is “higher”, so you want the positive one-sided cutoff.
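To get a feel for what the right-tail cutoff means, here is a sketch with made-up group sizes. The sizes 20 and 30 are assumptions for illustration only. The real ones come from df$coaching.

```r
# Hypothetical group sizes (not the real ones)
n1_toy <- 20
n0_toy <- 30
dof <- n1_toy + n0_toy - 2

# Right-tail cutoff for a one-sided test at the 5% level:
# 95% of the t distribution lies below this value
crit_one_sided <- qt(0.95, df = dof)

# Equivalent call that asks for the upper tail directly
crit_upper <- qt(0.05, df = dof, lower.tail = FALSE)

# For comparison: a two-sided test at 5% puts 2.5% in each tail,
# so its cutoff is further out
crit_two_sided <- qt(0.975, df = dof)

c(one_sided = crit_one_sided, two_sided = crit_two_sided)
```

A t statistic above the one-sided cutoff leads to rejecting the null hypothesis in favour of "higher".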
n1 <- sum(df$coaching == 1)
n0 <- sum(df$coaching == 0)
qt(0.95, df = n1 + n0 - 2)
Exercise 7 - Run the one-sided t test
df is already loaded for you in this exercise.
Run the equal-variance two-sample t test to check whether reps who received coaching have a higher mean rating than reps who did not receive coaching.
Return the full t.test(…) output.
Use the coaching group as the first input, the no-coaching group as the second input, set alternative = "greater", and include var.equal = TRUE.
The order matters here because the claim is that the coaching group has the higher mean rating.
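Here is a small sketch of why the order matters, using simulated ratings. The vectors coached_toy and uncoached_toy are made up and have nothing to do with data.xlsx.

```r
set.seed(1)  # fixed seed so the made-up data are reproducible

# Hypothetical ratings for two groups
coached_toy   <- rnorm(20, mean = 75, sd = 5)
uncoached_toy <- rnorm(30, mean = 72, sd = 5)

# Claim: the coached group has the higher mean, so it goes first
p_right_order <- t.test(coached_toy, uncoached_toy,
                        alternative = "greater", var.equal = TRUE)$p.value

# Swapping the groups tests the opposite claim
p_swapped <- t.test(uncoached_toy, coached_toy,
                    alternative = "greater", var.equal = TRUE)$p.value

# The t statistic only changes sign, so the two p-values sum to 1
c(right_order = p_right_order, swapped = p_swapped,
  total = p_right_order + p_swapped)
```

So if the groups go in the wrong way round, you do not get an error. You get the p-value for a different claim.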
t.test(
df$rating[df$coaching == 1],
df$rating[df$coaching == 0],
alternative = "greater",
var.equal = TRUE
)
Exercise 8 - Read the t-test output
Based on the one-sided test from Exercise 7, which conclusion is correct at the 5% level?
Exercise 9 - Run the base regression
df is already loaded for you in this exercise.
Fit a regression where rating is explained by coaching, script, and practice_calls.
Return the full regression summary.
Use lm(rating ~ coaching + script + practice_calls, data = df) and wrap it in summary(...).
The full regression summary gives the coefficients, p-values, F-test, and adjusted R-squared in one output.
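If you want to pull single numbers out of the summary instead of reading the printed output, the pieces are stored in the summary object. The sketch below uses simulated data with the same column names as df. All values in toy are made up.

```r
set.seed(2)  # fixed seed so the made-up data are reproducible

# Hypothetical data with the same column names as df
toy <- data.frame(
  coaching       = rbinom(40, 1, 0.5),
  script         = rbinom(40, 1, 0.5),
  practice_calls = rpois(40, 8)
)
toy$rating <- 60 + 3 * toy$coaching + 2 * toy$script +
  0.5 * toy$practice_calls + rnorm(40, sd = 4)

toy_summary <- summary(lm(rating ~ coaching + script + practice_calls, data = toy))

# Coefficient table: estimate, standard error, t value, p-value
toy_summary$coefficients

# Adjusted R-squared
toy_summary$adj.r.squared

# Overall F-test: the statistic plus numerator and denominator df
f <- toy_summary$fstatistic
f_p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
f_p_value
```

The F-test p-value computed this way is the one printed on the last line of the regression summary.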
rating_model <- lm(rating ~ coaching + script + practice_calls, data = df)
summary(rating_model)
Exercise 10 - Read the overall F-test
Based on the base regression output from Exercise 9, which statement is correct?
Exercise 11 - Interpret the script coefficient
In the base regression, how should you interpret the coefficient on script?
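As a reminder of what a 0/1 dummy does, here is a minimal sketch on simulated data (toy is made up). It uses the simplest case, a regression with only the dummy, where the coefficient equals the gap between the two group means. In a model with more regressors, such as the base regression, the coefficient is that gap with the other variables held fixed.

```r
set.seed(3)  # fixed seed so the made-up data are reproducible

# Hypothetical data: 15 reps without the script, 15 with it
toy <- data.frame(script = rep(c(0, 1), each = 15))
toy$rating <- 70 + 4 * toy$script + rnorm(30, sd = 3)

# With only the dummy in the model, its coefficient is the gap in group means
b_script <- coef(lm(rating ~ script, data = toy))[["script"]]
mean_gap <- mean(toy$rating[toy$script == 1]) -
  mean(toy$rating[toy$script == 0])

c(coefficient = b_script, mean_gap = mean_gap)
```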
Exercise 12 - Run the interaction model
df is already loaded for you in this exercise.
Fit a regression where rating is explained by practice_calls, script, coaching, and the interaction between practice_calls and script.
Return the full regression summary.
Use practice_calls * script inside lm(...), then wrap the model in summary(...).
The interaction term checks whether the slope of practice_calls changes between the script = 0 and script = 1 groups.
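Here is a sketch of what the * operator does inside a formula, again on simulated data (toy and its values are made up). It shows that a * b is shorthand for a + b + a:b, and how to read the two slopes from the coefficients.

```r
set.seed(4)  # fixed seed so the made-up data are reproducible

# Hypothetical data with the same column names as df
toy <- data.frame(
  practice_calls = rpois(60, 8),
  script         = rep(c(0, 1), each = 30),
  coaching       = rbinom(60, 1, 0.5)
)
toy$rating <- 60 + 0.5 * toy$practice_calls + 2 * toy$script +
  0.8 * toy$practice_calls * toy$script + 3 * toy$coaching + rnorm(60, sd = 3)

# These two formulas describe the same model
fit_star  <- lm(rating ~ practice_calls * script + coaching, data = toy)
fit_colon <- lm(rating ~ practice_calls + script + practice_calls:script + coaching,
                data = toy)

# Slope of practice_calls in each script group
b <- coef(fit_star)
slope_script0 <- b[["practice_calls"]]
slope_script1 <- b[["practice_calls"]] + b[["practice_calls:script"]]

c(script0 = slope_script0, script1 = slope_script1)
```

The coefficient on practice_calls:script is the difference between the two slopes, so its p-value tells you whether that difference is statistically significant.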
interaction_model <- lm(rating ~ practice_calls * script + coaching, data = df)
summary(interaction_model)
Exercise 13 - Interpret the interaction term
Based on the interaction model, which statement is correct?
Exercise 14 - Jarque-Bera test for the base model
df is already loaded for you in this exercise.
Fit the base regression from Exercise 9, then run a Jarque-Bera normality test on its residuals.
Return the full jarque.bera.test(…) output.
Fit the model first, then use jarque.bera.test(resid(model_name)).
The test should be run on the residuals, not on the original rating column.
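If you are curious what the test measures, the sketch below computes the standard Jarque-Bera statistic by hand in base R on simulated values. The vector e is made up. In the exercise it would be resid(rating_model). This page does not say which package supplies jarque.bera.test(). A function with that name is provided by the tseries package, so treat that as an assumption if you run the code on your own computer.

```r
set.seed(5)  # fixed seed so the made-up data are reproducible

# Hypothetical residual-like values
e <- rnorm(50)
n <- length(e)

# Central moments (dividing by n, not n - 1)
m2 <- mean((e - mean(e))^2)
m3 <- mean((e - mean(e))^3)
m4 <- mean((e - mean(e))^4)

skewness <- m3 / m2^1.5
kurtosis <- m4 / m2^2

# JB statistic: large when skewness is far from 0 or kurtosis is far from 3
jb_stat <- n / 6 * (skewness^2 + (kurtosis - 3)^2 / 4)

# Under normality the statistic is approximately chi-squared with 2 df
jb_p_value <- pchisq(jb_stat, df = 2, lower.tail = FALSE)

c(statistic = jb_stat, p_value = jb_p_value)
```

A normal distribution has skewness 0 and kurtosis 3, which is why both terms are measured as distances from those values.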
rating_model <- lm(rating ~ coaching + script + practice_calls, data = df)
jarque.bera.test(resid(rating_model))
Exercise 15 - Jarque-Bera test for the interaction model
df is already loaded for you in this exercise.
Fit the interaction model from Exercise 12, then run a Jarque-Bera normality test on its residuals.
Return the full jarque.bera.test(…) output.
Fit the interaction model first, then use jarque.bera.test(resid(model_name)).
Again, the normality check is for the model residuals, not the original data.
interaction_model <- lm(rating ~ practice_calls * script + coaching, data = df)
jarque.bera.test(resid(interaction_model))
Exercise 16 - Read the Jarque-Bera results
Based on the Jarque-Bera tests for the two models, which statement is correct?