Exercise List 4 - Interactive

Hey you :)

How this list works:

- one small task per code cell
- one final output per code cell
- hints + solutions available for each task
- native R errors are shown when code fails

Quick guide: which formula do I need?

Sampling distribution of sample mean (`x̄`)

Mean of x̄: μ
If population σ is known:
- SE = σ / sqrt(n)
- probability: pnorm(cutoff, mean = μ, sd = σ / sqrt(n))

Sampling distribution of sample proportion (`p̂`)

E(p̂) = p
SE(p̂) = sqrt(p * (1 - p) / n)
probability: pnorm(cutoff, mean = p, sd = sqrt(p * (1 - p) / n))

Confidence interval for mean

σ known:
- x̄ +/- z* * σ / sqrt(n)
σ unknown:
- x̄ +/- t* * s / sqrt(n) (or use t.test(...))

Confidence interval for proportion

p̂ +/- z* * sqrt(p̂ * (1 - p̂) / n)

Required sample size

Mean (σ known): n = (z* * σ / E)^2
Proportion (no prior estimate): n = z*^2 * 0.25 / E^2
Always round up
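
If you wonder where the 0.25 in the proportion formula comes from: it is the largest value that p * (1 - p) can take, and it is reached at p = 0.5. Here is a small sketch that checks this on a grid of candidate values (the names p_grid and variance_term are only for illustration):

```r
# Candidate population proportions from 0 to 1
p_grid <- seq(0, 1, by = 0.01)

# The term that drives the standard error of a proportion
variance_term <- p_grid * (1 - p_grid)

# Which p makes the term largest, and how large is it?
p_at_max <- p_grid[which.max(variance_term)]
max_term <- max(variance_term)

p_at_max
max_term
```

So planning with 0.25 can only overestimate the sample size you need, never underestimate it.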

7.2 The Sampling Distribution of the Sample Mean

Exercise 10

According to a survey, high school girls average 100 text messages daily. Assume that the population standard deviation is 20 text messages. Suppose a random sample of 50 high school girls is taken.

Exercise 10a

What is the probability that the sample mean is more than 105?

The sample mean x̄ is approximately normal with center 100 and standard error 20 / sqrt(50). Because the question asks for “more than 105”, you need the upper tail to the right of 105.

1 - pnorm(105, mean = 100, sd = 20 / sqrt(50))

Exercise 10b

What is the probability that the sample mean is less than 95?

This is again the sampling distribution of x̄, so the mean stays 100 but the spread becomes 20 / sqrt(50). Because the wording says “less than 95”, you use the lower tail at 95.

pnorm(95, mean = 100, sd = 20 / sqrt(50))

Exercise 10c

What is the probability that the sample mean is between 95 and 105?

“Between 95 and 105” means the area between two cutoffs. So first find the probability below 105, then subtract the probability below 95.

pnorm(105, mean = 100, sd = 20 / sqrt(50)) -
  pnorm(95, mean = 100, sd = 20 / sqrt(50))

Exercise 20

Suppose that IQ scores are normally distributed with a mean of 100 and a standard deviation of 16.

Exercise 20a

What is the probability that a randomly selected person will have an IQ score of less than 90?

This question is about one randomly selected person, not about an average. So you stay with the original normal distribution with mean 100 and standard deviation 16, and take the lower tail at 90.

pnorm(90, mean = 100, sd = 16)

Exercise 20b

What is the probability that the average IQ score of four randomly selected people is less than 90?

Now the question is about the average of 4 people, so the center stays 100 but the spread becomes smaller: 16 / sqrt(4). Then you again take the lower tail at 90.

pnorm(90, mean = 100, sd = 16 / sqrt(4))

Exercise 20c

If four people are randomly selected, what is the probability that all of them have an IQ score of less than 90?
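
One possible way to think about it, assuming the four people are selected independently: each person has the single-person probability from 20a, and "all four" means multiplying that probability four times. The names p_one and p_all are only for illustration.

```r
# Probability that ONE randomly selected person scores below 90 (same as 20a)
p_one <- pnorm(90, mean = 100, sd = 16)

# Assumption: the four selections are independent,
# so the probabilities multiply
p_all <- p_one^4

p_all
```

Note that this is not the same question as 20b: there the average had to be below 90, here every single score has to be below 90.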

7.3 The Sampling Distribution of the Sample Proportion

Exercise 25

A recent survey found that 82% of college graduates believe that their degree was a good investment (cnbc.com, February 27, 2020). Suppose a random sample of 100 college graduates is taken.

Exercise 25a-1

What is the expected value for the sampling distribution of the sample proportion?

Exercise 25a-2

What is the standard error for the sampling distribution of the sample proportion?
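
A minimal sketch for 25a-1 and 25a-2, using the formulas from the quick guide. The check at the end is the common rule of thumb that n * p and n * (1 - p) should both be at least 5 before you rely on the normal approximation; the variable names are only for illustration.

```r
p <- 0.82   # population proportion from the survey
n <- 100    # sample size

# 25a-1: expected value of the sample proportion
expected_phat <- p

# 25a-2: standard error of the sample proportion
se_phat <- sqrt(p * (1 - p) / n)

# Rule-of-thumb check for the normal approximation
normal_ok <- (n * p >= 5) && (n * (1 - p) >= 5)

expected_phat
se_phat
normal_ok
```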

Exercise 25b

What is the probability that the sample proportion is less than 0.80?

You treat p̂ as approximately normal with mean 0.82 and standard error from 25a-2. Since the question asks for “less than 0.80”, you use the lower tail.

pnorm(0.80, mean = 0.82, sd = sqrt(0.82 * 0.18 / 100))

Exercise 25c

What is the probability that the sample proportion is within +/- 0.02 of the population proportion?

“Within +/- 0.02” means from 0.80 to 0.84. So the probability you want is the area between those two bounds: probability below 0.84 minus probability below 0.80.

pnorm(0.84, mean = 0.82, sd = sqrt(0.82 * 0.18 / 100)) -
  pnorm(0.80, mean = 0.82, sd = sqrt(0.82 * 0.18 / 100))

Exercise 28

At an exhibit in the Museum of Science, people are asked to choose between 50 and 100 random draws from a machine. The machine is known to have 60 green balls and 40 red balls. After each draw, the color of the ball is noted, and the ball is put back for the next draw. You win a prize if more than 70% of the draws result in a green ball. Would you choose 50 or 100 draws for the game? Explain.

Choose the better option.
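
One way to compare the two options is to compute the probability of winning for each sample size. The sketch below uses the normal approximation for p̂ with p = 0.60, and adds an exact binomial cross-check, which is an extra step beyond the quick guide. The helper win_prob is a made-up name for illustration.

```r
p <- 0.60  # 60 green balls out of 100, drawn with replacement

# Normal approximation: P(p-hat > 0.70) for a given number of draws
win_prob <- function(n) {
  1 - pnorm(0.70, mean = p, sd = sqrt(p * (1 - p) / n))
}

p50 <- win_prob(50)
p100 <- win_prob(100)

# Exact cross-check: more than 70% green means more than 35 of 50
# or more than 70 of 100 green draws
p50_exact <- pbinom(35, size = 50, prob = p, lower.tail = FALSE)
p100_exact <- pbinom(70, size = 100, prob = p, lower.tail = FALSE)

c(p50 = p50, p100 = p100)
c(p50_exact = p50_exact, p100_exact = p100_exact)
```

The smaller sample has the larger standard error, so p̂ lands far above 0.60 more often. Since you need an unusually high share of green to win, the option with more spread gives you the better chance.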

8.1 Confidence interval for the population mean when `σ` is known

Exercise 15

(Highway_Speeds) A safety officer is concerned about speeds on a certain section of the New Jersey Turnpike. The accompanying file contains the speeds of 40 cars on a Saturday afternoon. Assume that the population standard deviation is 5 mph. Construct the 95% confidence interval for the mean speed of all cars on that section of the turnpike. Are the safety officer’s concerns valid if the speed limit is 55 mph? Explain.

Quick dataset note: in the code cells below, the file Highway_Speeds.xlsx is loaded into df. It contains one column called Highway Speeds, which stores the observed car speeds.

Exercise 15a

Return the lower bound of the 95% confidence interval.

Because σ is known, this is a z-based confidence interval for the mean. First compute the sample mean, then subtract the margin of error z* × 5 / sqrt(n) to get the lower endpoint.

mean(df[[1]]) - qnorm(0.975) * 5 / sqrt(nrow(df))

Exercise 15b

Return the upper bound of the 95% confidence interval.

This is the same confidence interval as in 15a. The only difference is that for the upper endpoint you add the margin of error instead of subtracting it.

mean(df[[1]]) + qnorm(0.975) * 5 / sqrt(nrow(df))
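
If you prefer to see both endpoints at once, here is a sketch that builds the whole z interval in one go. Careful: speeds is a small made-up vector so the code runs on its own. It is not the Highway_Speeds data, so the numbers it prints say nothing about the real exercise.

```r
# Hypothetical speeds in mph (NOT the Highway_Speeds file)
speeds <- c(62, 58, 65, 70, 61, 59, 66, 64)

sigma <- 5                 # known population standard deviation
z_star <- qnorm(0.975)     # critical value for 95% confidence

# Margin of error: z* * sigma / sqrt(n)
margin <- z_star * sigma / sqrt(length(speeds))

# Lower and upper endpoint together
ci <- c(lower = mean(speeds) - margin, upper = mean(speeds) + margin)
ci

# For a question like 15c: is the whole interval above the speed limit?
limit <- 55
interval_above_limit <- ci[["lower"]] > limit
interval_above_limit
```

With the real data you would replace speeds by df[[1]] and read the answer to 15c from the last line.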

Exercise 15c

Are the safety officer’s concerns valid if the speed limit is 55 mph?

Choose one answer.

8.2 Confidence interval for the population mean when `σ` is unknown

Exercise 36

(Economics) An associate dean of a university wishes to compare the means on the standardized final exams in microeconomics and macroeconomics. He has access to a random sample of 40 scores from each of these two courses. A portion of the data is shown in the accompanying table.

Quick dataset note: in the code cells below, the file Economics.xlsx is loaded into df. It has two score columns: Micro for microeconomics and Macro for macroeconomics.

Exercise 36a

Construct the 95% confidence interval lower bound for the mean score in microeconomics.

Here σ is unknown, so you use a one-sample t interval. The function t.test(df$Micro) returns the full interval in conf.int, and the first element is the lower endpoint.

t.test(df$Micro)$conf.int[1]
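
If you want to see what t.test(...) is doing, you can rebuild the interval by hand with x̄ +/- t* * s / sqrt(n) and compare. The vector scores below is made up for illustration and is not taken from Economics.xlsx.

```r
# Hypothetical exam scores (NOT the Economics data)
scores <- c(85, 78, 90, 92, 70, 88, 76, 95, 81, 84)

n <- length(scores)
t_star <- qt(0.975, df = n - 1)          # critical value for 95% confidence
margin <- t_star * sd(scores) / sqrt(n)  # t* * s / sqrt(n)

# Interval built by hand: lower endpoint first, upper endpoint second
manual_ci <- c(mean(scores) - margin, mean(scores) + margin)

# Interval from t.test, stripped of its attributes for comparison
builtin_ci <- as.numeric(t.test(scores)$conf.int)

manual_ci
builtin_ci
```

Both vectors should agree. The second element, conf.int[2], is the upper endpoint, which is what the "upper bound" tasks ask for.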

Exercise 36b

Construct the 95% confidence interval upper bound for the mean score in microeconomics.

Exercise 36c

Construct the 95% confidence interval lower bound for the mean score in macroeconomics.

Exercise 36d

Construct the 95% confidence interval upper bound for the mean score in macroeconomics.

Exercise 36e

Explain why the widths of the two intervals are different.

Choose the statement that fits best.

8.3 Confidence interval for the population proportion

Exercise 54

One in five 18-year-old Americans has not graduated from high school. A mayor of a Northeastern city comments that its residents do not have the same graduation rate as the rest of the country. An analyst from the Department of Education decides to test the mayor’s claim. In particular, she draws a random sample of 80 18-year-olds in the city and finds that 20 of them have not graduated from high school.

Exercise 54a

Compute the point estimate for the proportion of 18-year-olds who have not graduated from high school in this city.

Exercise 54b

Use this point estimate to derive the 95% confidence interval lower bound for the population proportion.

Start with p̂ = 20 / 80. Then use the confidence-interval formula for a proportion and subtract the margin of error to get the lower endpoint.

phat <- 20 / 80
phat - qnorm(0.975) * sqrt(phat * (1 - phat) / 80)

Exercise 54c

Use this point estimate to derive the 95% confidence interval upper bound for the population proportion.

Exercise 54d

Can the mayor’s comment be justified at 95% confidence?

Choose one answer.
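
One way to approach this is to check whether the national value 0.20 (one in five) falls inside the 95% interval for the city. This is a sketch; the variable names are only for illustration.

```r
phat <- 20 / 80                                       # point estimate
margin <- qnorm(0.975) * sqrt(phat * (1 - phat) / 80) # margin of error

lower <- phat - margin
upper <- phat + margin

# Does the interval contain the national proportion?
national <- 0.20
contains_national <- lower <= national && national <= upper

c(lower = lower, upper = upper)
contains_national
```

If the interval contains 0.20, the sample does not show at 95% confidence that the city's rate differs from the national rate.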

8.4 Selecting the required sample size

Exercise 64

An analyst would like to construct 95% confidence intervals for the mean stock returns in two industries. Industry A is a high-risk industry with a known population standard deviation of 20.6%, whereas Industry B is a low-risk industry with a known population standard deviation of 12.8%.

Exercise 64a

What is the minimum sample size required by the analyst if she wants to restrict the margin of error to 4% for Industry A?

Use the sample-size formula for a mean with known σ: n = (z* × σ / E)^2. Then round up, because sample size must be a whole number and you need at least that many observations.

ceiling((qnorm(0.975) * 20.6 / 4)^2)

Exercise 64b

What is the minimum sample size required by the analyst if she wants to restrict the margin of error to 4% for Industry B?
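
A sketch that applies the same formula n = (z* × σ / E)^2 to both industries, so you can compare them side by side. The helper required_n is a made-up name for illustration.

```r
# Minimum sample size for a mean with known sigma, rounded up
required_n <- function(sigma, E, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)
  ceiling((z_star * sigma / E)^2)
}

n_a <- required_n(sigma = 20.6, E = 4)  # Industry A (high risk)
n_b <- required_n(sigma = 12.8, E = 4)  # Industry B (low risk)

c(A = n_a, B = n_b)
```

Only σ changes between the two calls, so any difference in the required sample size comes from the different population standard deviations.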

Exercise 64c

Why do the results differ if they use the same margin of error?

Choose the statement that fits best.

Exercise 71

A business student is interested in estimating the 99% confidence interval for the proportion of students who bring laptops to campus. He wants a precise estimate and is willing to draw a large sample that will keep the sample proportion within five percentage points of the population proportion. What is the minimum sample size required by this student, given that no prior estimate of the population proportion is available?

Because there is no prior estimate of the population proportion, you use the conservative choice p = 0.5. That gives the largest required sample size, so it is the safe planning choice.

ceiling((qnorm(0.995)^2 * 0.5 * 0.5) / 0.05^2)
