Final Practice Set – BC8 Practice Hub

Readme

Hey :)

This page is the last practice set. It brings together the main flow from the earlier practice pages in one place: a short warm-up, one mean comparison test, and then a longer regression block.

In Exercise 1, you import data.xlsx yourself with readxl.
From Exercise 2 onward, the file is already loaded and stored in df for you.
In the later exercises, df is freshly reloaded behind the scenes, so one mistake will not break the next task.
For the regression part, you will often return the full output first and then answer a multiple-choice question based on that output.

If you want the dataset on your own computer, download it here: data.xlsx.

Dataset description

The workbook data.xlsx contains data on customer support reps from one team. Each row is one rep.

Variable	Meaning	Values / categories
`rep_id`	Rep ID	`1, 2, ..., 50`
`practice_calls`	Number of practice calls completed that week	numeric values
`coaching`	Received coaching that week	0 = no, 1 = yes
`script`	Used the support script	0 = no, 1 = yes
`rating`	Quality rating score	numeric values

Functions you may need

You do not need all of these in every exercise, but these are the main functions used across the page.

library(readxl)
read_excel()
head()
mean()
sum()
boxplot()
qt()
t.test()
lm()
summary()
resid()
jarque.bera.test() (from the tseries package)

Useful operators you will probably use:

$
==
>
+
*
[ ]
<-

Quick guide

If the claim is that one group has a higher mean than the other, pass that group first in t.test() and use alternative = "greater".
For a full regression question, return summary(lm(…)) first.
A dummy variable such as script or coaching compares one group with the reference group.
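An interaction term checks whether the slope of practice_calls changes when script changes. In an R formula, practice_calls * script adds both main effects and the interaction term, which appears in the output as practice_calls:script.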
For the Jarque-Bera test, a large p-value means the residuals do not clearly contradict normality.
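
To make the interaction point concrete, here is a minimal sketch that lists the terms R builds from a formula. It uses only base R and needs no data. The formula reuses the variable names from the dataset description, and the object names f and term_labels are made up for this illustration.

```r
# Inspect how R expands the * operator in a model formula (no data needed).
f <- rating ~ practice_calls * script + coaching

# terms() parses the formula; "term.labels" holds the right-hand-side terms.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# The expansion contains both main effects plus the interaction,
# which R writes with a colon: practice_calls:script
```

The coefficient on the colon term is the one to read when a question asks whether the slope of practice_calls differs between the script groups.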

Exercise 1 - Import the workbook and show the data

Import data.xlsx into an object called df, then display the dataset.

Typing either df or head(df) is completely fine here.
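
If you try this on your own computer, a sketch like the following may help. It assumes data.xlsx has been downloaded into the current working directory, which is an assumption about your setup and not something the page guarantees. The guard around the import is only there so the snippet does not fail when the file is missing.

```r
# Assumption: data.xlsx sits in the current working directory (see getwd()).
path <- "data.xlsx"

if (file.exists(path)) {
  # readxl::read_excel() is the same function you get after library(readxl)
  df <- readxl::read_excel(path)
  print(head(df))  # first six rows
} else {
  message("data.xlsx not found in ", getwd())
}
```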

Exercise 2 - Mean number of practice calls

df is already loaded for you in this exercise.

Find the mean of practice_calls.

Exercise 3 - Count reps who received coaching

df is already loaded for you in this exercise.

How many reps have coaching equal to 1?
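
Counting rows that meet a condition usually comes down to summing a logical vector, because TRUE counts as 1 and FALSE as 0. The sketch below shows the pattern on a small made-up vector. toy_coaching is hypothetical and is not the real column.

```r
# Made-up 0/1 values, standing in for a dummy column such as df$coaching.
toy_coaching <- c(1, 0, 1, 1, 0, 0, 1)

# == compares each element with 1 and returns TRUE/FALSE.
is_coached <- toy_coaching == 1

# sum() treats TRUE as 1, so this counts the matches.
n_coached <- sum(is_coached)
print(n_coached)
```

Note the double equals sign: a single = would assign instead of compare.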

Exercise 4 - Make a boxplot of practice calls

df is already loaded for you in this exercise.

Create a boxplot of practice_calls.

Use a horizontal boxplot if you can.
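
As a rough sketch of the horizontal option, the snippet below draws a boxplot from made-up numbers. toy_calls is invented for illustration, and on the page you would pass the practice_calls column of df instead.

```r
# Made-up values, standing in for df$practice_calls.
toy_calls <- c(3, 5, 6, 6, 7, 8, 9, 12, 20)

# horizontal = TRUE turns the box on its side, so the values run along
# the bottom axis; xlab labels that axis.
bp <- boxplot(toy_calls, horizontal = TRUE, xlab = "Practice calls")

# boxplot() also returns the five summary numbers it drew.
print(bp$stats)
```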

Exercise 5 - Choose the right test direction

You want to check whether reps who received coaching have a higher mean rating than reps who did not receive coaching.

Which alternative fits this claim?

Exercise 6 - Critical t-value

df is already loaded for you in this exercise.

Assume equal variances. Find the one-sided critical t-value at the 5% level for the coaching-versus-no-coaching test.

Return just the number.
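
For an equal-variance two-sample t test, the degrees of freedom are n1 + n2 - 2, and a one-sided 5% test puts the whole 5% in one tail. Here is a minimal sketch with hypothetical group sizes. n_coached and n_not_coached are invented numbers, and on the page you would count the groups from df.

```r
# Hypothetical group sizes (not taken from the real data).
n_coached <- 22
n_not_coached <- 28

# Pooled (equal-variance) test: degrees of freedom = n1 + n2 - 2.
dof <- n_coached + n_not_coached - 2

# One-sided test at the 5% level: the critical value is the 95% quantile.
alpha <- 0.05
t_crit <- qt(1 - alpha, df = dof)  # here df is qt()'s argument name, not the data frame
print(t_crit)
```

With an upper-tail alternative, you reject the null hypothesis when the t statistic from the test is larger than this critical value.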

Exercise 7 - Run the one-sided t test

df is already loaded for you in this exercise.

Run the equal-variance two-sample t test to check whether reps who received coaching have a higher mean rating than reps who did not receive coaching.

Return the full t.test(…) output.

The order matters here because the claim is that the coaching group has the higher mean rating.

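t.test(
  df$rating[df$coaching == 1],  # x: coaching group (claimed to be higher)
  df$rating[df$coaching == 0],  # y: no-coaching group
  alternative = "greater",      # H1: mean of x is greater than mean of y
  var.equal = TRUE              # pooled, equal-variance version of the test
)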

Exercise 8 - Read the t-test output

Based on the one-sided test from Exercise 7, which conclusion is correct at the 5% level?

Exercise 9 - Run the base regression

df is already loaded for you in this exercise.

Fit a regression where rating is explained by coaching, script, and practice_calls.

Return the full regression summary.

The full regression summary gives the coefficients, p-values, F-test, and adjusted R-squared in one output.

rating_model <- lm(rating ~ coaching + script + practice_calls, data = df)
summary(rating_model)

Exercise 10 - Read the overall F-test

Based on the base regression output from Exercise 9, which statement is correct?

Exercise 11 - Interpret the script coefficient

In the base regression, how should you interpret the coefficient on script?

Exercise 12 - Run the interaction model

df is already loaded for you in this exercise.

Fit a regression where rating is explained by practice_calls, script, coaching, and the interaction between practice_calls and script.

Return the full regression summary.

The interaction term checks whether the slope of practice_calls changes between the script = 0 and script = 1 groups.

interaction_model <- lm(rating ~ practice_calls * script + coaching, data = df)
summary(interaction_model)

Exercise 13 - Interpret the interaction term

Based on the interaction model, which statement is correct?

Exercise 14 - Jarque-Bera test for the base model

df is already loaded for you in this exercise.

Fit the base regression from Exercise 9, then run a Jarque-Bera normality test on its residuals.

Return the full jarque.bera.test(…) output.
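
For intuition about what the test measures: the Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals as JB = n/6 * (S^2 + (K - 3)^2 / 4) and compares it with a chi-squared distribution with 2 degrees of freedom. The sketch below computes this by hand on simulated data so it runs without extra packages. toy_df and the coefficients used to simulate it are invented. This is the standard textbook formula, which jarque.bera.test() is understood to follow, so treat it as an illustration and not as a replacement for the function call the exercise asks for.

```r
set.seed(42)
n <- 50

# Simulated stand-in for df; all numbers here are made up.
toy_df <- data.frame(
  practice_calls = rpois(n, 8),
  coaching = rbinom(n, 1, 0.5),
  script = rbinom(n, 1, 0.5)
)
toy_df$rating <- 60 + 2 * toy_df$coaching + 1.5 * toy_df$script +
  0.8 * toy_df$practice_calls + rnorm(n, sd = 5)

# Same formula as the base regression, fitted on the toy data.
base_model <- lm(rating ~ coaching + script + practice_calls, data = toy_df)
e <- resid(base_model)  # the test is applied to the residuals

# Sample skewness and kurtosis from central moments.
m2 <- mean((e - mean(e))^2)
skew <- mean((e - mean(e))^3) / m2^1.5
kurt <- mean((e - mean(e))^4) / m2^2

# Jarque-Bera statistic and its chi-squared(2) p-value.
jb_stat <- length(e) / 6 * (skew^2 + (kurt - 3)^2 / 4)
jb_p <- 1 - pchisq(jb_stat, df = 2)
print(c(statistic = jb_stat, p.value = jb_p))
```

A statistic near zero means skewness is close to 0 and kurtosis is close to 3, which is what normally distributed residuals would show.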

Exercise 15 - Jarque-Bera test for the interaction model

df is already loaded for you in this exercise.

Fit the interaction model from Exercise 12, then run a Jarque-Bera normality test on its residuals.

Return the full jarque.bera.test(…) output.

Again, the normality check is for the model residuals, not the original data.

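# jarque.bera.test() comes from the tseries package; run library(tseries) first if it is not already loaded
interaction_model <- lm(rating ~ practice_calls * script + coaching, data = df)
jarque.bera.test(resid(interaction_model))  # test the residuals, not the raw ratings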

Exercise 16 - Read the Jarque-Bera results

Based on the Jarque-Bera tests for the two models, which statement is correct?