Inference Practice Set – BC8 Practice Hub

Readme

Hey :)

This page is an inference practice set built around one simple business-style dataset. The goal is to practise the full flow calmly: import the data, compare two groups, choose the right test direction, run the test, and interpret the output.

In Exercise 1, you import the Excel file yourself with readxl.
From Exercise 2 onward, the dataset is already loaded as df for you.
In every later exercise, df is freshly reloaded behind the scenes, so one mistake will not break the next task.
In the later part of the page, you will sometimes run the full test first and then answer a multiple-choice question based on its output.

Dataset description

The workbook contains data from a campus cafe. Each row is one customer order.

Variable	Meaning	Values / categories
`order_id`	Order ID	`1, 2, ..., 50`
`items`	Number of items in the order	whole numbers
`express_line`	Whether the order used the express line	`"yes"` or `"no"`
`mobile_order`	Whether the order was placed on mobile first	`"yes"` or `"no"`
`wait_time`	Total wait time in minutes	numeric values

Functions you may need

You do not need all of these in every exercise, but these are the main functions and operators used across the whole page.

library(readxl)
read_excel()
head()
mean()
sum()
boxplot()
length()
qt()
t.test()

Useful operators you will probably use:

$
==
-
[ ]
<-

Quick guide

If the claim is that the express-line group has a lower mean wait_time, pass that group as the first sample and use a one-sided test with alternative = "less".
If the claim is only that the two groups are different, use alternative = "two.sided".
If equal variances are assumed, the degrees of freedom are n1 + n2 - 2.
A p-value smaller than 0.05 means you reject H0 at the 5% level.
A p-value between 0.05 and 0.10 is not significant at 5%, but it is significant at 10%.

Exercise 1 - Import the workbook and show the data

Import data.xlsx into an object called df, then display the dataset.

Typing either df or head(df) is completely fine here.

In the exam, the very first step is often just getting the data in correctly. Once df exists, printing it lets you check that the file was imported the way you expected.

library(readxl)
df <- read_excel("data.xlsx")
head(df)

Exercise 2 - Count orders in the express line

df is already loaded for you in this exercise.

How many orders used the express line?
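
One way to approach the count, sketched on a small made-up data frame rather than the real workbook. The object toy and its values are hypothetical stand-ins for df; only the column names follow the dataset description.

```r
# Hypothetical stand-in for df (made-up values, not the cafe data).
toy <- data.frame(
  express_line = c("yes", "no", "yes", "no", "no", "yes"),
  wait_time    = c(3.5, 6.0, 4.0, 7.5, 5.5, 2.5)
)

# The comparison returns TRUE/FALSE for each row; sum() counts the TRUEs.
n_express <- sum(toy$express_line == "yes")

# Equivalent: keep the matching rows of one column and count them.
n_express_alt <- length(toy$wait_time[toy$express_line == "yes"])

n_express
```

On the real data, the same pattern with df in place of toy should give the count asked for here.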

Exercise 3 - Mean wait time for one group

Find the mean wait time for the orders that used the express line.

The mean should only be calculated for the express-line group. So first keep the wait_time values where express_line is "yes", and then average those values.

mean(df$wait_time[df$express_line == "yes"])

Exercise 4 - Mean wait time for the other group

Find the mean wait time for the orders that did not use the express line.

Exercise 5 - Difference between the two means

Use the two means from Exercises 3 and 4 and return their difference.

The difference here is mean_yes - mean_no, where mean_yes is the mean wait time for orders with express_line == "yes" and mean_no is the mean wait time for orders with express_line == "no". Writing the difference in this order makes the sign meaningful: a negative value means the express-line orders were faster on average in this dataset.

mean(df$wait_time[df$express_line == "yes"]) - mean(df$wait_time[df$express_line == "no"])
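
If the one-liner feels crowded, the same calculation can be split into named steps. This is a sketch on a made-up data frame: toy and its values are hypothetical, and diff_means is an illustrative name.

```r
# Hypothetical stand-in for df (made-up values, not the cafe data).
toy <- data.frame(
  express_line = c("yes", "no", "yes", "no", "no", "yes"),
  wait_time    = c(3.5, 6.0, 4.0, 7.5, 5.5, 2.5)
)

# Mean wait time within each group.
mean_yes <- mean(toy$wait_time[toy$express_line == "yes"])
mean_no  <- mean(toy$wait_time[toy$express_line == "no"])

# Express line minus the rest: a negative result means the
# express-line orders were faster on average in this toy data.
diff_means <- mean_yes - mean_no
diff_means
```

Storing the two means first also makes it easy to print them separately and check each one before subtracting.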

Exercise 6 - Boxplot of wait time by line

Create a boxplot of wait_time by express_line.

The formula wait_time ~ express_line tells R to draw one box for each express_line category, with each box summarising the wait_time distribution of that group. That is the quickest visual comparison of the two groups.

boxplot(wait_time ~ express_line, data = df)

Exercise 7 - Choose the right alternative

We want to test whether orders in the express line have a lower mean wait_time than the other orders.

Choose the correct alternative setting for t.test(...).

Exercise 8 - Degrees of freedom

Assume population variances are unknown but equal. What are the degrees of freedom for this two-sample t test?
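
A sketch of how the degrees of freedom could be computed by hand and cross-checked against t.test(), using a made-up data frame. The object toy, its values, and the names n1, n2, and dof are hypothetical.

```r
# Hypothetical stand-in for df (made-up values, not the cafe data).
toy <- data.frame(
  express_line = c("yes", "no", "yes", "no", "no", "yes"),
  wait_time    = c(3.5, 6.0, 4.0, 7.5, 5.5, 2.5)
)

# Group sizes.
n1 <- sum(toy$express_line == "yes")
n2 <- sum(toy$express_line == "no")

# Equal-variance (pooled) two-sample t test: df = n1 + n2 - 2.
dof <- n1 + n2 - 2

# Cross-check: t.test() stores the degrees of freedom in $parameter.
tt <- t.test(
  toy$wait_time[toy$express_line == "yes"],
  toy$wait_time[toy$express_line == "no"],
  var.equal = TRUE
)
unname(tt$parameter)
```

Without var.equal = TRUE, t.test() runs the Welch test instead, and its degrees of freedom generally do not equal n1 + n2 - 2.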

Exercise 9 - Critical t-value

At alpha = 0.05, what is the critical t value for the left-tailed one-sided test?
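
A minimal sketch of the lookup with qt(). It assumes all 50 orders listed in the dataset description are present with no missing wait times, so the degrees of freedom are 50 - 2 = 48. The names alpha, dof, and t_crit are illustrative.

```r
alpha <- 0.05
dof   <- 48   # assumption: 50 complete rows, so n1 + n2 - 2 = 48

# qt() returns the quantile with probability alpha to its left,
# which is the critical value for a left-tailed test.
t_crit <- qt(alpha, df = dof)
t_crit
```

For a left-tailed test the critical value is negative (roughly -1.68 here), and H0 is rejected when the t statistic falls below it.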

Exercise 10 - Run the one-sided hypothesis test

Run the equal-variance two-sample t test to check whether orders in the express line have a lower mean wait_time than orders not in the express line.

Return the full t.test(...) output.

The first vector should be the express-line group because alternative = "less" tests whether the mean of the first vector is lower than the mean of the second, and the claim is that the express-line group has the lower mean wait time. var.equal = TRUE matches the equal-variance assumption.

t.test(
  df$wait_time[df$express_line == "yes"],
  df$wait_time[df$express_line == "no"],
  alternative = "less",
  var.equal = TRUE
)

Exercise 11 - Read the one-sided output

Based on the output from Exercise 10, which statement is correct at the 5% level?

Exercise 12 - Change the significance level

If you keep the same one-sided test output from Exercise 10 but use alpha = 0.10, what changes?
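
The decision rule can be checked in code by pulling the p-value out of the test result and comparing it with each alpha. This is a sketch on made-up data: toy, its values, and the names result, p, reject_at_05, and reject_at_10 are hypothetical.

```r
# Hypothetical stand-in for df (made-up values, not the cafe data).
toy <- data.frame(
  express_line = c("yes", "no", "yes", "no", "no", "yes", "no", "yes"),
  wait_time    = c(3.5, 6.0, 4.0, 7.5, 5.5, 2.5, 6.5, 4.5)
)

result <- t.test(
  toy$wait_time[toy$express_line == "yes"],
  toy$wait_time[toy$express_line == "no"],
  alternative = "less",
  var.equal = TRUE
)

# The p-value is stored in the result object.
p <- result$p.value

# Same output, two significance levels.
reject_at_05 <- p < 0.05
reject_at_10 <- p < 0.10

print(p)
print(reject_at_05)
print(reject_at_10)
```

Changing alpha does not change the t statistic or the p-value; it only moves the threshold the p-value is compared against. Any result that is significant at 5% is therefore also significant at 10%.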

Exercise 13 - Run the two-sided version

Now run the equal-variance two-sample t test to check whether the two mean wait times are different.

Return the full t.test(...) output.

The groups stay the same, and the equal-variance assumption stays the same. Only the research question changes: now you test whether the mean wait times differ in either direction.

t.test(
  df$wait_time[df$express_line == "yes"],
  df$wait_time[df$express_line == "no"],
  alternative = "two.sided",
  var.equal = TRUE
)

Exercise 14 - Read the two-sided output

Based on the output from Exercise 13, which statement is correct at the 5% level?
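
When comparing the two outputs, it can help to see how the one-sided and two-sided p-values relate. The sketch below uses made-up data: toy, its values, and the names yes, no, one, and two are hypothetical.

```r
# Hypothetical stand-in for df (made-up values, not the cafe data).
toy <- data.frame(
  express_line = c("yes", "no", "yes", "no", "no", "yes", "no", "yes"),
  wait_time    = c(3.5, 6.0, 4.0, 7.5, 5.5, 2.5, 6.5, 4.5)
)

yes <- toy$wait_time[toy$express_line == "yes"]
no  <- toy$wait_time[toy$express_line == "no"]

one <- t.test(yes, no, alternative = "less",      var.equal = TRUE)
two <- t.test(yes, no, alternative = "two.sided", var.equal = TRUE)

# Same data and same assumptions give the same t statistic.
all.equal(unname(one$statistic), unname(two$statistic))

# With a negative t statistic, the two-sided p-value is twice
# the left-tailed one.
all.equal(two$p.value, 2 * one$p.value)
```

This is why a result can be significant in the one-sided test but not in the two-sided test at the same alpha: the two-sided p-value is larger whenever the data point in the direction of the one-sided claim.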