Exercise List 5 - Interactive

Hypothesis testing and inference between groups


Hey you :)

This list is longer, so take it step by step:

  • one small task per code cell
  • one final output per cell
  • read the task wording carefully (tail direction matters)
  • use hints if stuck, then retry

Quick guide: hypothesis testing flow

Step 1: write hypotheses correctly

  • Null includes equality (=, <=, >=)
  • Alternative is strict (<, >, !=)
  • Use population parameter symbols (μ, p), not sample statistics

Step 2: pick the correct tail

  • Ha: μ < value -> left tail
  • Ha: μ > value -> right tail
  • Ha: μ != value -> two-sided

Step 3: test + p-value

  • Mean, σ known: z test
  • Mean, σ unknown: t test
  • Proportion: z test
  • Difference in means: z test if both population variances are known, otherwise t test

Step 4: decision

  • If p-value < alpha: reject H0
  • If p-value >= alpha: do not reject H0
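
If it helps to see steps 2 to 4 as code, here is a minimal R sketch. The function names p_value_from_z and decide and the tail argument are made up for this illustration; they are not part of the exercise cells.

```r
# Convert a z statistic into a p-value for the chosen tail.
# `tail` is a hypothetical argument name: "two.sided", "left", or "right".
p_value_from_z <- function(z, tail = c("two.sided", "left", "right")) {
  tail <- match.arg(tail)
  switch(tail,
    left      = pnorm(z, lower.tail = TRUE),
    right     = pnorm(z, lower.tail = FALSE),
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE)
  )
}

# Step 4: compare the p-value with alpha.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "do not reject H0"
}

# Example: a left-tailed test with z = -1.96.
p_left <- p_value_from_z(-1.96, tail = "left")
decide(p_left, alpha = 0.05)
```

The same z value gives a different p-value depending on the tail, which is why reading the direction of Ha comes before any calculation.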

9.1 Introduction to Hypothesis Testing

Exercise 1

Explain why the following hypotheses are not constructed correctly

Exercise 1a

H0: μ <= 10; Ha: μ >= 10

Choose the best explanation.

Check whether the two hypotheses overlap. A valid pair should not both include the same boundary value.

Correct choice: the second option.

This pair is invalid because both hypotheses include the boundary value μ = 10. A valid null and alternative must not overlap. One side should contain the equality, and the other side should exclude it.

Exercise 1b

H0: μ != 500; Ha: μ = 500

Choose the best explanation.

In standard hypothesis testing, the equality sign belongs in the null hypothesis.

Correct choice: the second option.

The equality sign belongs in the null hypothesis. In standard hypothesis testing, H0 contains the benchmark value being tested, and Ha states the competing direction or difference.

Exercise 1c

H0: p <= 0.40; Ha: p > 0.42

Choose the best explanation.

Ask whether every possible value of p is covered by one of the two hypotheses.

Correct choice: the second option.

These hypotheses leave a gap between 0.40 and 0.42, so they do not cover all possible values of p. A valid pair must cover every possible value without gaps and without overlap.

Exercise 1d

H0: X <= 128; Ha: X > 128

Choose the best explanation.

Hypotheses should be about a population parameter such as μ or p, not a sample outcome.

Correct choice: the second option.

The problem is not the number 128. The problem is the symbol X. Hypotheses should be written about a population parameter such as μ or p, not about a sample observation.

Exercise 2

Which of the following statements are valid null and alternative hypotheses? If they are invalid hypotheses, explain why.

For each item below, choose whether the hypotheses are valid or invalid.

Exercise 2a

H0: X <= 210; Ha: X > 210

Choose one answer.

Check whether the symbol in the hypotheses is a population parameter or just a sample quantity.

Correct choice: Invalid.

This pair uses X, which is not the population parameter of interest. A proper hypothesis statement should be about a parameter such as μ or p.

Exercise 2b

H0: μ = 120; Ha: μ != 120

Choose one answer.

This is the standard two-sided setup: equality in H0, not-equal in Ha.

Correct choice: Valid.

This is a standard two-sided hypothesis pair. The null keeps the equality, and the alternative checks whether the mean is different from that value.

Exercise 2c

H0: p <= 0.24; Ha: p > 0.24

Choose one answer.

Check whether the null and alternative cover all possible values and whether equality is included in H0.

Correct choice: Valid.

This is a valid one-sided setup. The null includes the equality case, and together the null and alternative cover all possible values of p.

Exercise 2d

H0: μ < 252; Ha: μ > 252

Choose one answer.

Look for the equality case μ = 252. If that value is missing, the pair is incomplete.

Correct choice: Invalid.

The value μ = 252 is not included anywhere, so the hypotheses are incomplete. In a valid pair, the equality case must be part of the null hypothesis.

Exercise 7

Construct the null and alternative hypotheses for the following claims:

Exercise 7a

“I am going to get the majority of the votes to win this election”

Choose the best hypothesis pair.

The claim is about a proportion, and “majority” means more than 50%, so the alternative should point to p > 0.50.

Correct choice: the first option.

The claim is about a proportion because it concerns the share of votes. “Majority” means more than 0.50, so the alternative must be p > 0.50. That makes the corresponding null p <= 0.50.

Exercise 7b

“I suspect that your 10-inch pizzas are, on average, less than 10 inches in size”

Choose the best hypothesis pair.

The claim says “on average, less than 10”, so think mean μ and put the less-than direction into the alternative.

Correct choice: the first option.

This is a claim about an average pizza size, so the parameter is μ. The phrase “less than 10 inches” gives a left-tailed alternative, so Ha must be μ < 10.

Exercise 7c

“I will have to fine the company since its tablets do not contain an average of 250 mg of ibuprofen as advertised”

Choose the best hypothesis pair.

“Do not contain an average of 250” means the mean could be either below or above 250, so this is two-sided.

Correct choice: the first option.

The statement says the average is not 250 mg, which means the mean could be either below or above 250. That makes this a two-sided test with H0: μ = 250 and Ha: μ != 250.

Exercise 11

The screening process for detecting a rare disease is not perfect. Researchers have developed a blood test that is considered fairly reliable. It gives a positive reaction in 98% of the people who have that disease. However, it erroneously gives a positive reaction in 3% of the people who do not have the disease. Consider the null hypothesis “the individual does not have the disease” to answer the following questions.

Exercise 11a

What is the probability of a Type I error?

With H0 = “the person does not have the disease”, a Type I error means calling a healthy person positive.

A Type I error means rejecting H0 even though H0 is true. Here H0 says the person does not have the disease, so a Type I error is a false positive. The test gives a positive result to healthy people 3% of the time.

0.03

Exercise 11b

What is the probability of a Type II error?

Type II error here means the person has the disease but the test misses it, so use the complement of the 98% true-positive rate.

A Type II error means failing to reject H0 even though it is false. Here that means the person really has the disease, but the test misses it. Since the true-positive rate is 0.98, the false-negative rate is 1 - 0.98 = 0.02.

0.02
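
The two answers in 11a and 11b can be written as a short R sketch. The variable names are chosen here for illustration; the two rates come straight from the exercise text.

```r
sensitivity         <- 0.98  # P(test positive | disease), from the exercise
false_positive_rate <- 0.03  # P(test positive | no disease), from the exercise

type_1_error <- false_positive_rate  # reject H0 although the person is healthy
type_2_error <- 1 - sensitivity      # do not reject H0 although the person is ill

c(type_1 = type_1_error, type_2 = type_2_error)
```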

Exercise 11c

Choose whether this summary is correct: “Type I: healthy person tests positive. Type II: diseased person tests negative.”

Type I is a false positive, and Type II is a false negative.

Correct choice: Correct summary.

A Type I error is a false positive: the person is healthy, but the test says positive. A Type II error is a false negative: the person has the disease, but the test says negative. So the summary matches both definitions correctly.

Exercise 11d

What is wrong with the nurse’s analysis, “The blood test result has proved that the individual is free of disease”?

Choose the best explanation.

Even a negative result can still be wrong if the test sometimes misses diseased people.

Correct choice: the second option.

A negative test result does not prove with certainty that the person is disease-free. The reason is Type II error: sometimes a person with the disease still tests negative.
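
To see that a negative result still leaves some chance of disease, here is a small sketch. It needs a prevalence, which the exercise does not give, so the value 0.01 below is purely an assumption for illustration.

```r
prevalence <- 0.01  # ASSUMPTION: hypothetical value, not given in the exercise

p_neg_given_disease <- 1 - 0.98  # false-negative rate (Type II error)
p_neg_given_healthy <- 1 - 0.03  # true-negative rate

# Overall probability of a negative result.
p_negative <- prevalence * p_neg_given_disease +
  (1 - prevalence) * p_neg_given_healthy

# Probability that a person who tested negative has the disease anyway.
p_disease_given_negative <- prevalence * p_neg_given_disease / p_negative
p_disease_given_negative
```

Under this assumed prevalence the probability is small but not zero, so the word "proved" in the nurse's statement is too strong.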

9.2 Hypothesis test for the population mean when σ is known

Exercise 29

(Hourly_Wage) The data accompanying this exercise shows hourly wages (in $) for 50 employees. An economist wants to test if the average hourly wage is less than $22. Assume that the population standard deviation is $6.

Quick dataset note: in the code cells below, the file Hourly_Wage.xlsx is loaded into df. It contains the columns Wage, EDUC, EXPER, AGE, and Male. For this exercise, you only need the Wage column.

Exercise 29a

State the null and alternative hypotheses.

Choose the best hypothesis pair.

The economist’s claim is “less than $22”, so the alternative must point below 22.

Correct choice: the first option.

The claim is that the average wage is less than $22, so this is a left-tailed test. The alternative must therefore be Ha: μ < 22, and the null must contain the equality case. That gives H0: μ >= 22.

Exercise 29b

Find the value of the test statistic.

This is a one-sample z test with known σ = 6. Take the sample mean, subtract the hypothesized value, and divide by the standard error σ / sqrt(n).

For a one-sample z test with known σ, the test statistic is

  • sample mean minus hypothesized mean
  • divided by σ / sqrt(n)

So here you compare the sample mean wage with 22 and scale that difference by the known standard error.

(mean(df$Wage) - 22) / (6 / sqrt(nrow(df)))
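
To check the formula by hand, you can run it on a tiny made-up vector first. The four wage values below are hypothetical stand-ins for df$Wage, not the real data.

```r
# Hypothetical stand-in for df$Wage; the real values come from Hourly_Wage.xlsx.
wage <- c(20, 22, 24, 18)

mu0   <- 22  # hypothesized mean from H0
sigma <- 6   # known population standard deviation

std_error <- sigma / sqrt(length(wage))  # 6 / sqrt(4) = 3
z <- (mean(wage) - mu0) / std_error      # (21 - 22) / 3
z
```

With numbers this small you can verify each piece on paper before applying the same expression to the 50 real observations.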

Exercise 29c

Find the p-value.

Because Ha says “less than”, this is a left-tailed test. Use the lower-tail probability for your test statistic.

The alternative is “less than 22”, so this is a left-tailed test. Once you have the z statistic, the p-value is the probability of getting a value that small or smaller under the null.

z <- (mean(df$Wage) - 22) / (6 / sqrt(nrow(df)))
pnorm(z, lower.tail = TRUE)

Exercise 29d

At alpha = 0.05, what is the conclusion? Is the average hourly wage less than $22?

Choose one answer.

Use the p-value from 29c and compare it with 0.05.

Correct choice: Reject H0.

Once you have the p-value from the previous step, the decision is mechanical: compare it with 0.05. Here the p-value is smaller than 0.05, so the evidence is strong enough to reject the null hypothesis at the 5% level.

9.3 Hypothesis test for the population mean when σ is unknown

Exercise 50

(MPG) The data accompanying this exercise shows miles per gallon (MPG) for 25 “supergreen” cars.

Quick dataset note: in the code cells below, the file MPG.xlsx is loaded into df. It has one column called MPG, which stores the miles per gallon values.

Exercise 50a

State the null and the alternative hypotheses in order to test whether the average MPG differs from 95.

Choose the best hypothesis pair.

“Differs from” means the mean could be either above or below 95, so this is two-sided.

Correct choice: the first option.

The phrase “differs from 95” means the true mean could be above or below 95. That is a two-sided test, so the null uses equality and the alternative uses not-equal.

Exercise 50b

Run the full one-sample t test for whether the average MPG differs from 95.

Return the full t.test(...) output.

Use the MPG column, test against mu = 95, and keep the default two-sided setup.

Because the question asks whether the mean MPG differs from 95, this is a one-sample two-sided t test. You give the sample data, the hypothesized mean, and let t.test(...) return the full output.

t.test(df$MPG, mu = 95)
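
If you want to see what t.test(...) does under the hood, the sketch below reproduces its statistic and p-value by hand. The mpg vector is simulated and only stands in for df$MPG, so the numbers will not match the exercise output.

```r
# Hypothetical stand-in for df$MPG (25 values); the real data are in MPG.xlsx.
set.seed(1)
mpg <- rnorm(25, mean = 96, sd = 10)

res <- t.test(mpg, mu = 95)

# Manual version of the same statistic and two-sided p-value.
n <- length(mpg)
t_manual <- (mean(mpg) - 95) / (sd(mpg) / sqrt(n))
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

c(t = unname(res$statistic), t_manual = t_manual)
c(p = res$p.value, p_manual = p_manual)
```

The result of t.test(...) is a list, so res$p.value gives you the p-value as a number if you prefer not to read it off the printed output.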

Exercise 50c

Based on the output from 50b, which statement about the p-value is correct?

Read the p-value directly from the t.test(...) output. Then compare it with both 0.05 and 0.10.

Correct choice: the third option.

The p-value is about 0.484, which is much larger than both 0.05 and 0.10. So the result is not statistically significant at either of those common levels.

Exercise 50d

At alpha = 0.05, can you conclude that the average MPG differs from 95?

Choose one answer.

Use the p-value from the test output in 50b. Then compare it with 0.05.

Correct choice: Do not reject H0.

At this step you only compare the p-value with 0.05. Because the p-value is larger than 0.05, the result is not statistically significant at the 5% level, so you do not reject the null hypothesis.

9.4 Hypothesis test for the population proportion

Exercise 64

An economist is concerned that more than 20% of American households have raided their retirement accounts to endure financial hardships such as unemployment and medical emergencies. He randomly surveys 190 households with retirement accounts and finds that 50 are borrowing against them.

Exercise 64a

Set up the null and alternative hypotheses to test the economist’s concern.

Choose the best hypothesis pair.

The economist is worried that the true proportion is above 20%, so the alternative must point to the right.

Correct choice: the first option.

The economist is worried that the true proportion is above 0.20, so the alternative must point to the right: Ha: p > 0.20. The null keeps the equality case and everything on the other side.

Exercise 64b

Calculate the value of the test statistic.

For a one-sample proportion test, start with p̂ = 50 / 190. Then compare to the null value 0.20 using the standard error built from p0.

A one-sample proportion z statistic compares the sample proportion with the null value p0. The denominator uses the null standard error sqrt(p0(1-p0)/n), not a standard error built from the sample proportion p̂.

phat <- 50 / 190
(phat - 0.20) / sqrt(0.20 * 0.80 / 190)

Exercise 64c

Calculate the p-value.

This is a right-tailed proportion test, so use the upper-tail probability for the z statistic.

The economist’s claim is right-tailed, so once you have the z statistic you take the upper-tail probability. That gives the chance of seeing a result at least this large if the true proportion were really 0.20.

phat <- 50 / 190
z <- (phat - 0.20) / sqrt(0.20 * 0.80 / 190)
pnorm(z, lower.tail = FALSE)
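
As a cross-check, the same test can be run with R's built-in prop.test(...). This is a sketch of an alternative route, not the expected answer for the cell. With correct = FALSE the reported X-squared statistic is the square of the z statistic, and the one-sided p-value matches the manual one.

```r
x  <- 50    # households borrowing against their accounts
n  <- 190   # households surveyed
p0 <- 0.20  # null value

z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_manual <- pnorm(z, lower.tail = FALSE)

# correct = FALSE switches off the continuity correction so the results match.
res <- prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)

c(z_squared = z^2, x_squared = unname(res$statistic))
c(p_manual = p_manual, p_prop_test = res$p.value)
```

If you leave correct at its default of TRUE, prop.test(...) applies a continuity correction and the numbers will differ slightly from the hand calculation.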

Exercise 64d

Determine if the economist’s concern is justifiable at alpha = 0.05.

Choose one answer.

Use the p-value from 64c and compare it with 0.05.

Correct choice: Reject H0.

You use the p-value from the previous step and compare it with 0.05. Since the p-value is below 0.05, the result is statistically significant, so the concern is supported at the 5% level.

10.1 Inference concerning the difference between two means

Exercise 17

(Longevity) A consumer advocate researches the length of life between two brands of refrigerators, Brand A and Brand B. He collects data (measured in years) on the longevity of 40 refrigerators for Brand A and repeats the sampling for Brand B. A portion of the data is shown in the accompanying table.

Quick dataset note: in the code cells below, the file Longevity.xlsx is loaded into df. It has two columns: Brand A and Brand B, each containing the observed lifetimes in years.

Exercise 17a

Specify the competing hypotheses to test whether the average length of life differs between the two brands.

Choose the best hypothesis pair.

“Differs” means the difference could be positive or negative, so the alternative should be two-sided around zero.

Correct choice: the first option.

The question asks whether the two mean lifetimes differ, without saying one should be larger than the other. That makes this a two-sided test for the difference in means, centered at zero.

Exercise 17b

Calculate the value of the test statistic. Assume that σ²A = 4.4 and σ²B = 5.2.

This is a two-sample test for a difference in means with known variances. First compute the difference in sample means, then divide by the standard error for that difference.

Here you are standardizing the observed difference in sample means.

The top of the formula is:

  • sample mean of Brand A
  • minus sample mean of Brand B

The bottom of the formula is the standard error for the difference in two means when the population variances are known:

  • sqrt(σ²A / nA + σ²B / nB)

That is why the numbers 4.4 and 5.2 appear inside the square root, each divided by its sample size. One possible R answer is:

(mean(df$`Brand A`) - mean(df$`Brand B`)) / sqrt(4.4 / nrow(df) + 5.2 / nrow(df))
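
The same calculation can be wrapped in a small helper, shown here as a sketch. The name two_sample_z and the toy lifetimes are made up for illustration; only the variances 4.4 and 5.2 come from the exercise.

```r
# Two-sample z statistic for H0: mu_a - mu_b = 0 with known population variances.
# `two_sample_z` is a hypothetical helper name.
two_sample_z <- function(a, b, var_a, var_b) {
  se <- sqrt(var_a / length(a) + var_b / length(b))
  (mean(a) - mean(b)) / se
}

# Hypothetical toy lifetimes, not the Longevity data.
brand_a <- c(10, 12)
brand_b <- c(9, 11)

z <- two_sample_z(brand_a, brand_b, var_a = 4.4, var_b = 5.2)
z  # (11 - 10) / sqrt(4.4 / 2 + 5.2 / 2)
```

Taking the length of each group separately means the helper also works when the two samples have different sizes, which nrow(df) alone would not handle.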

Exercise 17c

Calculate the p-value.

Because the alternative is two-sided, your p-value must include both tails.

Because the alternative is two-sided, you must count both tails. So you first take the upper-tail probability beyond |z|, then multiply by 2.

z <- (mean(df$`Brand A`) - mean(df$`Brand B`)) / sqrt(4.4 / nrow(df) + 5.2 / nrow(df))
2 * pnorm(abs(z), lower.tail = FALSE)

Exercise 17d

At the 5% significance level, what is the conclusion?

Choose one answer.

Use the p-value from 17c and compare it with 0.05.

Correct choice: Do not reject H0.

The decision depends on whether the p-value is smaller than 0.05. Here it is not, so the data do not provide enough evidence of a difference between the two mean lifetimes at the 5% level.

Exercise 20

(Tractor_Times) The production department at Greenside Corporation, a manufacturer of lawn equipment, has devised a new manual assembly method for its lawn tractors. Now it wishes to determine if it is reasonable to conclude that the mean assembly time of the new method is less than the old method. Accordingly, they have randomly sampled assembly time (in minutes) from the 40 tractors using the old method and 32 tractors using the new method. A portion of the data is shown in the accompanying table.

Quick dataset note: in the code cells below, the file Tractor_Times.xlsx is loaded into df. It has two columns: Old for the old assembly method and New for the new assembly method.

Exercise 20a

Set up the hypotheses.

Choose the best hypothesis pair.

If the new method is faster, its mean assembly time should be lower than the old method’s mean.

Correct choice: the first option.

The claim is that the new method is faster, which means the new mean assembly time should be lower than the old mean. Written as a difference, the alternative is Ha: μ_new - μ_old < 0, and the null is H0: μ_new - μ_old >= 0.

Exercise 20b

Run the full unequal-variance t test for whether the new method has a lower mean assembly time than the old method.

Return the full t.test(...) output.

Use the new-method times as the first input, the old-method times as the second input, set alternative = "less", and keep var.equal = FALSE.

Because the claim is that the new method is faster, the new-method times should be the first group and the old-method times the second group. The test is one-sided ("less") and uses unequal variances, so this is the Welch version of the two-sample t test.

t.test(df$New, df$Old, alternative = "less", var.equal = FALSE)
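
The order of the two inputs and the alternative have to agree with each other. The sketch below uses simulated times (hypothetical, not the Tractor_Times data) to show that swapping the groups and flipping the direction gives the same p-value.

```r
set.seed(2)
old <- rnorm(40, mean = 32, sd = 4)  # hypothetical old-method times
new <- rnorm(32, mean = 30, sd = 3)  # hypothetical new-method times

# Ha: mean(new) - mean(old) < 0
res_less <- t.test(new, old, alternative = "less", var.equal = FALSE)

# Same claim written the other way round: Ha: mean(old) - mean(new) > 0
res_greater <- t.test(old, new, alternative = "greater", var.equal = FALSE)

c(less = res_less$p.value, greater = res_greater$p.value)
```

Swapping the groups without flipping the alternative would test the opposite claim, which is a common way to end up with a p-value close to 1.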

Exercise 20c

Based on the output from 20b, which statement about the p-value is correct?

Read the p-value directly from the t.test(...) output. Then compare it with both 0.05 and 0.10.

Correct choice: the second option.

The p-value is larger than 0.05, so the result is not significant at the 5% level. But it is still smaller than 0.10, so it does become significant at the 10% level.

Exercise 20d

At the 5% significance level, what is the conclusion?

Choose one answer.

This conclusion depends only on whether the p-value from 20c is smaller than 0.05.

Correct choice: Do not reject H0.

At the 5% significance level, the p-value is still too large to reject the null hypothesis. So there is not enough evidence, at this level, to conclude that the new method is faster.

Exercise 20e

What if the significance level is 10%?

Choose one answer.

Use the same p-value as before, but now compare it with 0.10.

Correct choice: Reject H0.

The p-value itself has not changed. What changes is the cutoff. At the 10% level, that same p-value is now small enough to reject the null hypothesis.
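
Here is a small sketch of that idea. The p-value 0.07 is a hypothetical number between 0.05 and 0.10, chosen only for illustration; use the value from your own 20b output.

```r
p_value <- 0.07  # ASSUMPTION: hypothetical value between 0.05 and 0.10
alphas  <- c(0.05, 0.10)

# One p-value, two cutoffs.
decisions <- ifelse(p_value < alphas, "reject H0", "do not reject H0")
names(decisions) <- paste0("alpha = ", alphas)
decisions
```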

Exercise 21

(Nicknames) Baseball has always been a favorite pastime in America and is rife with statistics and theories. One study found that major league players who have nicknames live an average of 2 1/2 years longer than those without them. You do not believe in this result and decide to collect data on the lifespan of 30 baseball players along with a nickname variable that equals 1 if the player had a nickname and 0 otherwise. A portion of the data is shown in the accompanying table.

Quick dataset note: in the code cells below, the file Nicknames.xlsx is loaded into df. It contains Years for lifespan and Nickname, where 1 means the player had a nickname and 0 means the player did not.

Exercise 21a

Create two subsamples and return the average longevity for players with nicknames (Nickname == 1).

First isolate the players with Nickname == 1, then take the mean of their Years.

The task asks for the mean lifetime only for players with nicknames. So you first filter to Nickname == 1, then take the mean of the Years variable.

mean(df$Years[df$Nickname == 1])

Exercise 21b

Return the average longevity for players without nicknames (Nickname == 0).

Now isolate the players with Nickname == 0, then take the mean of their Years.

This is the same idea as 21a, but now for players without nicknames. So you filter to Nickname == 0 and then compute the mean of Years.

mean(df$Years[df$Nickname == 0])
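
If you want both group means in one step, tapply(...) is one option. The sketch below uses a tiny made-up data frame named toy with the same column names as the real file.

```r
# Hypothetical toy data with the same column names as Nicknames.xlsx.
toy <- data.frame(
  Years    = c(70, 80, 60, 80),
  Nickname = c(1, 1, 0, 0)
)

# Mean of Years within each value of Nickname.
group_means <- tapply(toy$Years, toy$Nickname, mean)
group_means         # names are "0" and "1"
group_means[["1"]]  # mean for players with nicknames
```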

Exercise 21c

Specify hypotheses to contradict the original claim.

Choose the best hypothesis pair.

The original claim is a specific difference of 2.5 years, so the null should keep that claimed value and the alternative should test whether the true difference is different.

Correct choice: the first option.

The original claim is a very specific one: players with nicknames live 2.5 years longer on average. To test that exact claim, the null must keep 2.5 as the benchmark difference, and the alternative asks whether the true difference is different from 2.5.

Exercise 21d

Run the full equal-variance two-sample t test for the claim about a 2.5-year difference.

Return the full t.test(...) output.

Use the two groups you created, keep mu = 2.5, and set var.equal = TRUE.

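The original claim is about a difference of 2.5 years, so that value stays in mu. Because the exercise asks for the equal-variance test, you include var.equal = TRUE and return the full t.test(...) output. Here with_nick and without_nick are the Years values for Nickname == 1 and Nickname == 0, the same subsamples as in 21a and 21b.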

t.test(with_nick, without_nick, mu = 2.5, var.equal = TRUE)
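
To see what mu = 2.5 changes, the sketch below rebuilds the pooled-variance t statistic by hand. The lifespans are hypothetical toy values, not the Nicknames data.

```r
# Hypothetical lifespans, not the Nicknames data.
with_nick    <- c(70, 75, 80, 85)
without_nick <- c(68, 72, 76)

res <- t.test(with_nick, without_nick, mu = 2.5, var.equal = TRUE)

# Manual pooled-variance version of the same statistic.
n1 <- length(with_nick)
n2 <- length(without_nick)
sp2 <- ((n1 - 1) * var(with_nick) + (n2 - 1) * var(without_nick)) / (n1 + n2 - 2)
t_manual <- (mean(with_nick) - mean(without_nick) - 2.5) /
  sqrt(sp2 * (1 / n1 + 1 / n2))

c(t = unname(res$statistic), t_manual = t_manual)
```

The only place 2.5 enters is the numerator: the observed difference in means is compared with 2.5 instead of with 0.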

Exercise 21e

Based on the output from 21d, which statement about the p-value is correct?

Read the p-value directly from the t.test(...) output. Then compare it with both 0.05 and 0.10.

Correct choice: the third option.

The p-value is about 0.532, which is larger than both 0.05 and 0.10. So the result is not statistically significant at either level.

Exercise 21f

What is the conclusion of the test using 5% level of significance?

Choose one answer.

Use the p-value from the test output in 21d and compare it with 0.05.

Correct choice: Do not reject H0.

You compare the p-value from the previous step with 0.05. Because the p-value is larger than 0.05, there is not enough evidence against the original 2.5-year claim at the 5% level.