Exercise List 7 - Interactive
Chi-square tests and Jarque-Bera normality
Hey you :)
This list covers chi-square tests and one normality test. Take it one step at a time:
- use the full test output when the task asks for it
- read the tail direction carefully
- for chi-square questions, keep track of the null distribution
- for normality, focus on what the p-value says about the data
Packages used on this page: readxl, plus tseries for jarque.bera.test.
Quick guide: which method do I need?
Goodness-of-fit for one multinomial distribution
- Use chisq.test(observed, p = expected_proportions)
- The null says the category proportions match the claimed distribution
- The degrees of freedom are k - 1
Chi-square test for independence
- Build a contingency table first
- Then use chisq.test(table(...))
- The null says the two variables are independent
Jarque-Bera test for normality
- The null says the variable is normally distributed
- The test statistic uses skewness and kurtosis
- A large p-value means the data do not contradict normality
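The three calls in the quick guide can be sketched on toy data. The numbers below are made up purely to show the shape of each call (and jarque.bera.test assumes the tseries package is available):

```r
# Goodness-of-fit: observed counts vs. claimed proportions (made-up numbers)
obs <- c(50, 30, 20)
chisq.test(obs, p = c(0.5, 0.3, 0.2))   # df = k - 1 = 2

# Independence: two categorical vectors -> contingency table (made-up data)
colour <- c("red", "red", "blue", "blue", "red", "blue")
size   <- c("S",   "L",   "S",    "L",    "L",   "S")
chisq.test(table(colour, size))         # df = (rows - 1) * (cols - 1)

# Normality: Jarque-Bera on a numeric vector (needs the tseries package)
# library(tseries)
# jarque.bera.test(rnorm(100))
```

Note that chisq.test warns when expected counts are small, as they are in this tiny toy table; with real sample sizes like the exercises below, that is not an issue.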
12.1 Goodness-of-Fit test for a Multinomial Experiment
Exercise 9
In 2003, the distribution of the world’s people worth $1 million or more was as follows:
- Europe: 35.7%
- North America: 31.4%
- Asia Pacific: 22.9%
- Latin America: 4.3%
- Middle East: 4.3%
- Africa: 1.4%
A recent sample of 500 global millionaires produces the following results:
- Europe: 153
- North America: 163
- Asia Pacific: 139
- Latin America: 20
- Middle East: 20
- Africa: 5

a. Test whether the distribution of millionaires today is different from the distribution in 2003 at α = 0.05.
b. Would the conclusion change if we tested it at α = 0.10?
Exercise 9a
Choose the correct hypotheses.
This is a goodness-of-fit test. The null says the current category proportions still match the 2003 distribution.
Correct choice: the first option.
In a goodness-of-fit test, the null states the full claimed distribution. The alternative says at least one category proportion is different.
Exercise 9b
Run the chi-square goodness-of-fit test and return the full output.
Put the six observed counts into one vector and the six 2003 proportions into another vector. Then use chisq.test(...) with the p = argument.
This is a chi-square goodness-of-fit test because you compare one observed categorical sample with a claimed distribution.
observed <- c(153, 163, 139, 20, 20, 5)
proportion <- c(0.357, 0.314, 0.229, 0.043, 0.043, 0.014)
chisq.test(observed, p = proportion)

Exercise 9c
What is the correct conclusion at α = 0.05?
Compare the p-value from 9b with 0.05.
Correct choice: the second option.
The p-value is about 0.0783, which is larger than 0.05. So at the 5% level you do not reject the null hypothesis.
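If you want to see where that 0.0783 comes from, you can rebuild the statistic by hand. This uses the same numbers as 9b and the df = k - 1 rule from the quick guide, so nothing new is assumed:

```r
# Rebuild the goodness-of-fit statistic from 9b by hand
observed <- c(153, 163, 139, 20, 20, 5)
proportion <- c(0.357, 0.314, 0.229, 0.043, 0.043, 0.014)
expected <- sum(observed) * proportion           # 500 times each 2003 share
stat <- sum((observed - expected)^2 / expected)  # chi-square statistic, about 9.90
pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)  # df = k - 1 = 5, about 0.0783
```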
Exercise 9d
Would the conclusion change at α = 0.10?
Use the same p-value from 9b, but now compare it with 0.10 instead.
Correct choice: the first option.
Now the p-value 0.0783 is below 0.10, so at the 10% level you reject the null hypothesis.
12.2 Chi-Square test for independence
Exercise 24 (Happiness)
There have been numerous attempts to relate happiness to income. In a recent survey, 290 individuals were asked to evaluate their state of happiness (Happy or Not Happy) and income (Low, Medium, or High). The accompanying table shows a portion of the data.
a. Use the data to construct a contingency table.
b. Specify the competing hypotheses to determine whether happiness is related to income.
c. Conduct the test at the 5% significance level and make a conclusion.
Quick dataset note: in the code cells below, the file Happiness.xlsx is loaded into df. It has the columns Individual, Income, and Happy?.
Exercise 24a
Construct the contingency table.
Use table(...) with happiness status in one margin and income in the other.
A contingency table counts how many observations fall in each combination of the two categorical variables.
table(df$`Happy?`, df$Income)

Exercise 24b
Choose the correct hypotheses.
A chi-square test for independence asks whether the two categorical variables are independent or related.
Correct choice: the first option.
The null says income and happiness are independent. The alternative says they are dependent.
Exercise 24c
Run the chi-square test for independence and return the full output.
First build the contingency table. Then pass that table into chisq.test(...).
This is a chi-square test for independence because you want to know whether two categorical variables are related.
chisq.test(table(df$`Happy?`, df$Income))

Exercise 24d
What is the correct conclusion at the 5% level?
Compare the p-value from 24c with 0.05.
Correct choice: the second option.
The p-value is about 0.1915, which is above 0.05. So you do not reject the null hypothesis.
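The degrees of freedom for an independence test are (rows - 1) × (cols - 1). A toy 2×3 table (made-up counts, not the Happiness data) shows the mechanics:

```r
# Made-up 2x3 table, only to illustrate df = (2 - 1) * (3 - 1) = 2
tab <- matrix(c(30, 45, 25,
                40, 35, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(Happy = c("Yes", "No"),
                              Income = c("Low", "Medium", "High")))
res <- chisq.test(tab)
res$parameter  # degrees of freedom: 2
res$expected   # expected counts under independence
```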
12.3 Chi-Square tests for normality
Exercise 35 (MPG)
The accompanying data file shows miles per gallon (MPG) for a sample of 25 cars.
a. Using the Jarque-Bera test, state the competing hypotheses in order to determine whether or not MPG follows the normal distribution.
b. Calculate the value of the Jarque-Bera test statistic and the p-value.
c. At α = 0.05, can you conclude that MPG is not normally distributed?
Quick dataset note: in the code cells below, the file MPG.xlsx is loaded into df. It has one column called MPG.
Exercise 35a
Choose the correct hypotheses.
For a normality test, the null says the data come from a normal distribution.
Correct choice: the first option.
The null says MPG is normally distributed. The alternative says it is not normally distributed.
Exercise 35b
Calculate the Jarque-Bera test statistic.
Use the formulas for skewness S and excess kurtosis K, then plug them into (n / 6) * (S^2 + K^2 / 4).
Jarque-Bera uses the sample skewness and excess kurtosis. Once you compute those, plug them into the formula.
x <- df$MPG
n <- length(x)
m <- mean(x)
m2 <- mean((x - m)^2)
S <- mean((x - m)^3) / (m2^(3/2))
K <- mean((x - m)^4) / (m2^2) - 3
(n / 6) * (S^2 + K^2 / 4)

Exercise 35c
Calculate the p-value.
You can get the same p-value in two ways: either run jarque.bera.test(df$MPG) directly on this page, or use the Jarque-Bera statistic with the chi-square distribution with 2 degrees of freedom.
Both methods give the same p-value.
# Method 1: directly from the Jarque-Bera test
jarque.bera.test(df$MPG)$p.value
# Method 2: from the Jarque-Bera statistic
x <- df$MPG
n <- length(x)
m <- mean(x)
m2 <- mean((x - m)^2)
S <- mean((x - m)^3) / (m2^(3/2))
K <- mean((x - m)^4) / (m2^2) - 3
jb <- (n / 6) * (S^2 + K^2 / 4)
pchisq(jb, df = 2, lower.tail = FALSE)

Exercise 35d
At α = 0.05, what is the correct conclusion?
Compare the p-value from 35c with 0.05.
Correct choice: the second option.
The p-value is about 0.7755, which is much larger than 0.05. So you do not reject the null hypothesis.
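As a closing sanity check of the 35c claim that both methods agree: the sketch below runs the statistic-plus-pchisq route end to end on simulated normal data (not the MPG file). With the tseries package installed, jarque.bera.test on the same vector should report the same statistic and p-value.

```r
# Simulated normal data, just to exercise the statistic/pchisq route
set.seed(1)
x <- rnorm(200)
n <- length(x)
m <- mean(x)
m2 <- mean((x - m)^2)
S <- mean((x - m)^3) / m2^(3/2)   # sample skewness
K <- mean((x - m)^4) / m2^2 - 3   # excess kurtosis
jb <- (n / 6) * (S^2 + K^2 / 4)   # Jarque-Bera statistic
p <- pchisq(jb, df = 2, lower.tail = FALSE)
c(statistic = jb, p.value = p)
# With tseries installed, this should match:
# library(tseries); jarque.bera.test(x)
```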