Exercise List 6 - Interactive

Hey you :)

This list covers paired differences, two proportions, one variance, and two variances. Take it one small step at a time:

one code cell or one choice at a time
one final result per task
read the tail direction carefully
if you see a full test output, use it to answer the question below

Packages used on this page: readxl and EnvStats.

Quick guide: which method do I need?

Paired before/after data

Same people measured twice -> use a paired t test
In R: t.test(df$After, df$Before, paired = TRUE)
The confidence interval is about the mean of the differences

Difference between two proportions

Start with p̂1 and p̂2
If the null difference is 0, use the pooled proportion in the standard error
If the null difference is a nonzero value such as 0.10, subtract that value in the numerator and build the standard error from the two separate sample proportions (no pooling)

One population variance

Test statistic: χ² = (n - 1)s² / σ0²
Use pchisq(...) for the p-value
Use qchisq(...) for a confidence interval for the variance

Ratio of two variances

Test statistic: F = s1² / s2²
In R, var.test(...) gives the full test output
Use qf(...) for a confidence interval for σ1² / σ2²

10.2 Inference concerning mean differences

Exercise 39 (Smoking)

It is fairly common for people to put on weight when they quit smoking. While a small weight gain is normal, excessive weight gain can create new health concerns that erode the benefits of not smoking. The accompanying table shows a portion of the weight data for 50 women before quitting and six months after quitting.

Woman	Before	After
1	140	155
2	144	142
3	138	153
4	145	146
5	118	129
6	150	149

Quick dataset note: in the code cells below, the file Smoking.xlsx is loaded into df. It has the columns Woman, Before, and After.

Exercise 39a

Construct the 95% confidence interval for the mean gain in weight.

Return the full paired t.test(...) output.

Because the same women appear in both columns, this is a paired t test. That output gives you the estimated mean difference and the 95% confidence interval for that mean difference.

t.test(df$After, df$Before, paired = TRUE)
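If you want to see what the paired test does under the hood, here is a small sketch. It uses only the six rows printed in the table above, not the full file of 50 women, so the numbers will not match the real answer. The names before, after, d, manual_ci, and paired_ci are illustration names and are not part of the exercise.

```r
# Only the six rows printed in the table, NOT the full Smoking.xlsx data
before <- c(140, 144, 138, 145, 118, 150)
after  <- c(155, 142, 153, 146, 129, 149)

# A paired t test works on the within-person differences
d <- after - before
n <- length(d)

# Manual 95% confidence interval for the mean difference
margin <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)
manual_ci <- c(mean(d) - margin, mean(d) + margin)

# The paired t.test(...) reports the same interval
paired_ci <- as.numeric(t.test(after, before, paired = TRUE)$conf.int)

manual_ci
paired_ci
```

Both printed intervals should agree, which shows that the paired test is a one-sample t procedure on After minus Before.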

Exercise 39b

Which interval matches the 95% confidence interval for the mean gain in weight?

Exercise 39c

Use the confidence interval to decide whether the mean gain in weight differs from 5 pounds.
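A 95% confidence interval and a two-sided test at the 5% level always lead to the same decision. The sketch below checks this on the six rows from the table in Exercise 39 only, so it illustrates the logic and is not the answer for the full data set. The names inside_ci and reject_by_p are made up for this example.

```r
# Only the six rows printed in the table, NOT the full Smoking.xlsx data
before <- c(140, 144, 138, 145, 118, 150)
after  <- c(155, 142, 153, 146, 129, 149)

# Two-sided paired test of H0: mean difference = 5
res <- t.test(after, before, paired = TRUE, mu = 5)

# Decision from the confidence interval: is 5 inside the interval?
ci <- res$conf.int
inside_ci <- ci[1] <= 5 && 5 <= ci[2]

# Decision from the p-value at the 5% level
reject_by_p <- res$p.value < 0.05

inside_ci
reject_by_p
```

If 5 is inside the interval, the test does not reject, and the other way around.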

10.3 Inference concerning the difference between two proportions

Exercise 57

A report suggests that business majors spend less time on course work than all other college students. A university provost decides to conduct a survey in which students are asked if they study hard, defined as spending at least 20 hours per week on course work. Of 120 business majors included in the survey, 20 said they studied hard, as compared to 48 out of 150 nonbusiness majors. At the 5% significance level, can we conclude that the proportion of business majors who study hard is less than that of nonbusiness majors? Provide the details.

Exercise 57a

Choose the correct hypotheses.

Exercise 57b

Calculate the value of the z test statistic.

A difference-in-proportions z test compares p̂1 - p̂2 with the null value 0. Because the null difference is zero, the standard error uses the pooled proportion from the two samples.

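# Sample proportions who study hard (1 = business majors, 2 = nonbusiness majors)
p1_hat <- 20 / 120
p2_hat <- 48 / 150
# Pooled proportion, used because the null difference is 0
p_pool <- (20 + 48) / (120 + 150)
# z statistic: observed difference divided by the pooled standard error
(p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / 120 + 1 / 150))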
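One way to double-check the manual z value is base R's prop.test(...). With correct = FALSE it reports a chi-square statistic, labeled X-squared, that equals z squared for two groups. This is only a cross-check sketch and not necessarily the route the solution list takes. The names z, check, and z_squared are illustration names.

```r
# Manual pooled z statistic for Exercise 57
p_pool <- (20 + 48) / (120 + 150)
z <- (20 / 120 - 48 / 150) / sqrt(p_pool * (1 - p_pool) * (1 / 120 + 1 / 150))

# Same test without the continuity correction
check <- prop.test(x = c(20, 48), n = c(120, 150),
                   alternative = "less", correct = FALSE)

# X-squared from prop.test equals z squared
z_squared <- unname(check$statistic)
c(z = z, z_times_z = z^2, X_squared = z_squared)
```

Note that prop.test(...) drops the sign, so you still need the manual z to see the direction.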

Exercise 57c

Find the p-value.
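The tail direction decides which pnorm(...) call you need. Here is a small sketch that recomputes the pooled z from 57b and shows all three tail choices side by side. Which one applies depends on the alternative hypothesis you chose in 57a. The names p_left, p_right, and p_two are illustration names.

```r
# Pooled z statistic from Exercise 57b
p_pool <- (20 + 48) / (120 + 150)
z <- (20 / 120 - 48 / 150) / sqrt(p_pool * (1 - p_pool) * (1 / 120 + 1 / 150))

p_left  <- pnorm(z)                      # alternative: p1 - p2 < 0
p_right <- pnorm(z, lower.tail = FALSE)  # alternative: p1 - p2 > 0
p_two   <- 2 * pnorm(-abs(z))            # alternative: p1 - p2 != 0

c(left = p_left, right = p_right, two_sided = p_two)
```

The left and right areas always add up to 1, and the two-sided value is twice the smaller of the two.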

Exercise 57d

At the 5% significance level, what is the correct conclusion?

Exercise 58

Many believe that it is not feasible for men and women to be just friends, while others argue that this belief may no longer hold, since gone are the days when men worked and women stayed at home, and the only way they could get together was for romance. In a recent survey, 200 heterosexual college students were asked if it was feasible for male and female students to be just friends. Thirty-two percent of females and 57% of males reported that it was not feasible for men and women to be just friends. Suppose the study consisted of 100 female and 100 male students. At the 5% significance level, can we conclude that there is a greater than 10 percentage point difference between the proportion of male and female students with this view? Provide the details.

Exercise 58a

Choose the correct hypotheses.

Exercise 58b

Calculate the value of the z test statistic.

Because the null value for the difference is 0.10, the numerator is (p̂_male - p̂_female) - 0.10. The standard error is built from the two sample proportions and the two sample sizes.

((0.57 - 0.32) - 0.10) / sqrt((0.57 * (1 - 0.57) / 100) + (0.32 * (1 - 0.32) / 100))
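Exercises 57 and 58 differ only in the null difference and in how the standard error is built. The sketch below puts both cases into one function. The helper two_prop_z and its argument names are hypothetical and are not part of readxl, EnvStats, or the solution list.

```r
# Hypothetical helper: z statistic for H0: p1 - p2 = d0
two_prop_z <- function(p1_hat, p2_hat, n1, n2, d0 = 0) {
  if (d0 == 0) {
    # Null difference of 0: pooled proportion in the standard error
    p_pool <- (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  } else {
    # Nonzero null difference: standard error from the two sample proportions
    se <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
  }
  ((p1_hat - p2_hat) - d0) / se
}

z58 <- two_prop_z(0.57, 0.32, 100, 100, d0 = 0.10)  # Exercise 58 setup
z57 <- two_prop_z(20 / 120, 48 / 150, 120, 150)     # Exercise 57 setup
c(z58 = z58, z57 = z57)
```

Both values should match the one-line formulas shown in 57b and 58b.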

Exercise 58c

Find the p-value.

Exercise 58d

At the 5% significance level, what is the correct conclusion?

11.1 Inference concerning the population variance

Exercise 17 (MPG)

The data accompanying this exercise show miles per gallon (mpg) for 25 cars.

Quick dataset note: in the code cells below, the file MPG.xlsx is loaded into df. It has one column called MPG.

Exercise 17a

State the null and the alternative hypotheses in order to test whether the variance differs from 62 mpg².

Exercise 17b

Assuming that MPG is normally distributed, calculate the value of the test statistic.

For one population variance, the solution list uses varTest(...) from EnvStats. The test statistic is the chi-square value inside that output. The manual formula gives the same result.

# Method 1: use the EnvStats test function
varTest(df$MPG, alternative = "two.sided", sigma.squared = 62, conf.level = 0.99)$statistic

# Method 2: calculate the same statistic directly
x <- df$MPG
(length(x) - 1) * var(x) / 62

Exercise 17c

Find the p-value.

The one-sample variance test output already contains the p-value, and the manual chi-square route gives the same value.

# Method 1: use the EnvStats test function
varTest(df$MPG, alternative = "two.sided", sigma.squared = 62, conf.level = 0.99)$p.value

# Method 2: compute the same p-value manually
x <- df$MPG
stat <- (length(x) - 1) * var(x) / 62
2 * min(pchisq(stat, df = length(x) - 1), 1 - pchisq(stat, df = length(x) - 1))

Exercise 17d

Make a conclusion at α = 0.01.

Exercise 17e1

Calculate the lower bound of the 95% confidence interval for the population variance.

The confidence interval for a variance comes from the chi-square distribution. The lower bound divides by the larger chi-square cutoff (the 0.975 quantile), which makes the fraction smaller.

x <- df$MPG
(length(x) - 1) * var(x) / qchisq(0.975, df = length(x) - 1)

Exercise 17e2

Calculate the upper bound of the 95% confidence interval for the population variance.

The upper endpoint uses the smaller chi-square cutoff in the denominator. That makes the fraction larger and gives the upper bound of the interval.

x <- df$MPG
(length(x) - 1) * var(x) / qchisq(0.025, df = length(x) - 1)
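Both endpoints can also be computed in one go. The sketch below wraps the two formulas in a small function and tries it on simulated numbers, since the real MPG.xlsx values are not shown here. The function var_ci and the vector x_sim are hypothetical names, and the simulated data are made up for illustration only.

```r
# Simulated stand-in for 25 mpg values (NOT the real MPG.xlsx data)
set.seed(1)
x_sim <- rnorm(25, mean = 30, sd = 8)

# Hypothetical helper: chi-square confidence interval for a variance
var_ci <- function(x, conf.level = 0.95) {
  alpha <- 1 - conf.level
  nu <- length(x) - 1  # degrees of freedom
  c(lower = nu * var(x) / qchisq(1 - alpha / 2, df = nu),
    upper = nu * var(x) / qchisq(alpha / 2, df = nu))
}

ci <- var_ci(x_sim)
ci
```

If EnvStats is loaded, varTest(x_sim, conf.level = 0.95)$conf.int should show the same two numbers.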

11.2 Inference concerning the ratio of two population variances

Exercise 26

Consider the following measures based on independently drawn samples from normally distributed populations:

Sample 1: s1² = 220 and n1 = 20

Sample 2: s2² = 196 and n2 = 15

Exercise 26a1

Construct the 95% interval estimate for the ratio of the population variances.

Return the lower bound.

Exercise 26a2

Construct the 95% interval estimate for the ratio of the population variances.

Return the upper bound.
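For the ratio of two variances, the interval comes from the F distribution with n1 - 1 and n2 - 1 degrees of freedom. Here is a minimal sketch using qf(...) and the summary values given in this exercise. The variable names are illustration names.

```r
# Summary statistics from Exercise 26
s1_sq <- 220; n1 <- 20
s2_sq <- 196; n2 <- 15

# Point estimate of sigma1^2 / sigma2^2
ratio <- s1_sq / s2_sq

# Divide by the upper F cutoff for the lower bound, by the lower cutoff for the upper bound
lower <- ratio / qf(0.975, df1 = n1 - 1, df2 = n2 - 1)
upper <- ratio / qf(0.025, df1 = n1 - 1, df2 = n2 - 1)

c(lower = lower, upper = upper)
```

As with the one-variance interval, the larger cutoff in the denominator gives the lower endpoint.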

Exercise 26b

Using the confidence interval from part (a), test if the ratio of the population variances differs from 1 at the 5% significance level.

Exercise 38 (Rentals)

The data accompanying this exercise include monthly rents for a two-bedroom apartment in two campus towns. At the 5% significance level, test if the variance of rent in campus town 1 is less than the variance of rent in campus town 2. State your assumptions clearly.

Quick dataset note: in the code cells below, the file Rentals.xlsx is loaded into df. It has two columns called Town1 and Town2.

Exercise 38a

Which statement gives the right setup and assumptions?

Exercise 38b

Run the full variance-ratio test in R.

Return the full var.test(...) output.

The claim is that town 1 has the smaller variance, so an equivalent way to code the test is to place Town2 first and Town1 second and test whether the ratio is greater than 1. Returning the full var.test(...) output lets you read both the F statistic and the p-value directly.

var.test(df$Town2, df$Town1, alternative = "greater")
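To convince yourself that swapping the order really is equivalent, you can compare both ways of coding the test. The sketch below uses simulated rents, because the real Rentals.xlsx values are not shown here. The vectors town1_sim and town2_sim are made-up illustration data.

```r
# Simulated rents (NOT the real Rentals.xlsx data)
set.seed(42)
town1_sim <- rnorm(30, mean = 1200, sd = 100)
town2_sim <- rnorm(30, mean = 1200, sd = 150)

# Town2 first, ratio greater than 1
swapped <- var.test(town2_sim, town1_sim, alternative = "greater")

# Town1 first, ratio less than 1
direct <- var.test(town1_sim, town2_sim, alternative = "less")

# Same p-value, and the two F statistics are reciprocals of each other
c(swapped = swapped$p.value, direct = direct$p.value)
c(F_swapped = unname(swapped$statistic),
  one_over_F_direct = 1 / unname(direct$statistic))
```

Either coding leads to the same conclusion, so pick the one whose tail direction you find easier to read.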

Exercise 38c

Based on the output from 38b, which statement about the p-value is correct?

Exercise 38d

At the 5% significance level, what is the correct conclusion?