Descriptive Practice Set

One exam-style dataset using the main functions from Lists 1 and 2


Readme

Hey :)

You do not need to get everything right on the first try. This page is a descriptive practice set built around one exam-style dataset.

It is not a direct copy of Lists 1 and 2. Instead, it brings together the main functions and question types from both lists, so you can practise the core descriptive tools in one place.

  • In Exercise 1, you import the Excel file yourself with readxl.
  • From Exercise 2 onward, the dataset is already loaded as df for you.
  • For the later exercises, df is freshly reloaded each time, so if something goes wrong in one exercise, the next one still starts clean.
  • In the real exam, you would still need to do the import yourself, so Exercise 1 is good practice.

Dataset description

The workbook contains data on hotel guest experiences. Each row represents one guest. Take a quick breath before you start and have a look at the variables below :)

Variable     | Meaning                    | Values / categories
guest        | Guest ID                   | 1, 2, ..., 42
cleanliness  | Rating of room cleanliness | 1 to 5
staff        | Rating of staff service    | 1 to 5
weekend      | Stayed during a weekend    | "yes" or "no"
breakfast    | Bought breakfast package   | "yes" or "no"
satisfaction | Overall hotel satisfaction | 1 to 10
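If you want to try the functions before you have the real file, here is a tiny stand-in with the same six column names. The values are made up for illustration and are not the real 42 guests, so your answers on the real df will be different. read_excel() gives you a tibble rather than a plain data.frame, but $ works the same way on both.

```r
# Made-up mini dataset with the same columns as df (invented values, not the real data)
toy <- data.frame(
  guest        = 1:6,
  cleanliness  = c(4, 5, 2, 3, 5, 1),                       # 1 to 5
  staff        = c(5, 4, 3, 5, 2, 4),                       # 1 to 5
  weekend      = c("yes", "no", "yes", "yes", "no", "no"),
  breakfast    = c("no", "yes", "no", "yes", "no", "yes"),
  satisfaction = c(8, 9, 4, 7, 6, 3),                       # 1 to 10
  stringsAsFactors = FALSE
)

# Look at it the same way you would look at df
head(toy)
toy$satisfaction   # pick one column with $
```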

Functions you may need

You do not need all of these in every exercise, but these are the main functions and operators used across the whole assignment. If you make yourself a small cheat sheet, these are good ones to include.

  • library(readxl)
  • read_excel()
  • head()
  • mean()
  • median()
  • IQR()
  • min()
  • max()
  • sd()
  • sum()
  • nrow()
  • boxplot()

Useful operators you will probably use:

  • $
  • ==
  • >
  • >=
  • <=
  • &
  • |

Exercise 1 - Import the workbook and take a first look

Import data.xlsx into an object called df, then show the dataset.

Typing either df or head(df) is completely fine here.

Load the Excel package first, import data.xlsx into an object called df, and then print df or head(df) so you can see that the import worked.

library(readxl)
df <- read_excel("data.xlsx")
head(df)

Exercise 2 - Mean satisfaction

df is already loaded for you in this exercise.

What is the mean of satisfaction?

Pick the satisfaction column and apply the function that gives the arithmetic average.

mean(df$satisfaction)
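The mean is just the sum of the values divided by how many there are. A quick check on a made-up vector (not the real satisfaction column):

```r
# Made-up satisfaction scores, for illustration only
x <- c(7, 8, 5, 9, 6)

mean(x)              # arithmetic average: 35 / 5 = 7
sum(x) / length(x)   # the same thing, written out by hand
```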

Exercise 3 - Median staff

df is already loaded for you in this exercise.

What is the median of staff?

Use the staff column and apply the function for the middle value, not the average.

median(df$staff)
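median() sorts the values and takes the middle one. With an even number of values it averages the two middle ones. On made-up vectors:

```r
# Odd number of values: the middle one after sorting
median(c(5, 1, 2, 5, 2))   # sorted: 1 2 2 5 5 -> 2

# Even number of values: the average of the two middle ones
median(c(1, 5, 2, 4))      # sorted: 1 2 4 5 -> (2 + 4) / 2 = 3

# Same first vector, but the mean is 3, so mean and median can differ
mean(c(5, 1, 2, 5, 2))
```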

Exercise 4 - IQR of cleanliness

df is already loaded for you in this exercise.

What is the interquartile range (IQR) of cleanliness?

Use the cleanliness column and apply the function for the interquartile range, not the full range.

IQR(df$cleanliness)
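IQR() is the distance between the third quartile and the first quartile, while the full range is max minus min. A made-up example, using R's default quantile method (type 7):

```r
x <- c(1, 2, 3, 4, 5)   # made-up values

IQR(x)                                  # 4 - 2 = 2
quantile(x, 0.75) - quantile(x, 0.25)   # same number, printed with a "75%" label
diff(range(x))                          # full range: 5 - 1 = 4, not what the exercise asks for
```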

Exercise 5 - Minimum satisfaction

df is already loaded for you in this exercise.

What is the minimum value of satisfaction?

Use the satisfaction column and return the smallest observed value.

min(df$satisfaction)

Exercise 6 - Maximum staff

df is already loaded for you in this exercise.

What is the maximum value of staff?

Use the staff column and return the largest observed value.

max(df$staff)
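min() and max() each return one value. range() returns both at once, which is a handy sanity check that ratings stay inside their scale. With made-up values:

```r
x <- c(4, 2, 5, 3)   # made-up staff ratings

min(x)     # 2
max(x)     # 5
range(x)   # 2 5: min and max together
```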

Exercise 7 - Standard deviation of cleanliness

df is already loaded for you in this exercise.

What is the standard deviation of cleanliness?

Use the cleanliness column and apply the function for standard deviation, not variance.

sd(df$cleanliness)
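sd() uses the sample formula with n - 1 in the denominator, and it is the square root of var(). On made-up values:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values, mean is 5

var(x)         # variance: 32 / 7, about 4.571
sd(x)          # standard deviation: sqrt(32 / 7), about 2.138
sqrt(var(x))   # same as sd(x)

# Written out by hand with the n - 1 denominator
sqrt(sum((x - mean(x))^2) / (length(x) - 1))
```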

Exercise 8 - Weekend stays

df is already loaded for you in this exercise.

How many guests have weekend == "yes"?

Create a logical condition for weekend == "yes" and then count how many TRUE values you get.

sum(df$weekend == "yes")
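The comparison gives one TRUE or FALSE per guest, and sum() counts TRUE as 1 and FALSE as 0. With a made-up weekend column:

```r
w <- c("yes", "no", "yes", "yes")   # made-up weekend answers

w == "yes"          # TRUE FALSE TRUE TRUE
sum(w == "yes")     # 3, the number of TRUEs
"Yes" == "yes"      # FALSE: the match is case-sensitive
```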

Exercise 9 - Percentage of weekend stays

df is already loaded for you in this exercise.

What percentage of guests have weekend == "yes"?

You may return either a proportion from 0 to 1 or a percentage from 0 to 100.

First create the logical condition weekend == "yes". Then either take the mean of that logical condition or divide the count by the number of rows.

100 * sum(df$weekend == "yes") / nrow(df)
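The hint mentions two routes. The mean of a logical vector is the share of TRUE values, so both routes give the same answer. On the real data, length(df$weekend) and nrow(df) are the same number. With made-up data:

```r
w <- c("yes", "no", "yes", "yes")   # made-up weekend answers

sum(w == "yes") / length(w)   # count divided by number of guests: 0.75
mean(w == "yes")              # same proportion in one step: 0.75
100 * mean(w == "yes")        # as a percentage: 75
```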

Exercise 10 - Proportion with breakfast package

df is already loaded for you in this exercise.

What share of guests bought the breakfast package?

You may return either a proportion from 0 to 1 or a percentage from 0 to 100.

Use the breakfast column with the condition "yes", then turn that into a share of the full dataset.

100 * sum(df$breakfast == "yes") / nrow(df)

Exercise 11 - Weekend and high staff score

df is already loaded for you in this exercise.

How many guests satisfy both of these conditions at the same time?

  • weekend == "yes"
  • staff >= 4

This is a good place to use &.

Build both conditions first, then join them with & because the guest must satisfy both at the same time.

sum(df$weekend == "yes" & df$staff >= 4)
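& compares the two logical vectors position by position, so a guest is only counted when both entries in their row are TRUE. Use a single &: the double && is meant for single TRUE/FALSE values and does not work element by element. With made-up columns:

```r
weekend <- c("yes", "yes", "no", "yes")   # made-up values
staff   <- c(5, 3, 4, 4)                  # made-up values

weekend == "yes"                     # TRUE  TRUE FALSE TRUE
staff >= 4                           # TRUE FALSE  TRUE TRUE
weekend == "yes" & staff >= 4        # TRUE FALSE FALSE TRUE
sum(weekend == "yes" & staff >= 4)   # 2
```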

Exercise 12 - Breakfast or low cleanliness

df is already loaded for you in this exercise.

How many guests satisfy at least one of these conditions?

  • breakfast == "yes"
  • cleanliness <= 2

This is a good place to use |.

Build both conditions first, then join them with | because a guest should be counted if at least one condition is true.

sum(df$breakfast == "yes" | df$cleanliness <= 2)
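With |, a guest who meets both conditions is still counted only once. That is why adding the two separate counts can give too much. With made-up columns:

```r
breakfast   <- c("yes", "no", "no", "yes")   # made-up values
cleanliness <- c(2, 1, 5, 4)                 # made-up values

breakfast == "yes" | cleanliness <= 2        # TRUE TRUE FALSE TRUE
sum(breakfast == "yes" | cleanliness <= 2)   # 3

# Adding the counts separately gives 4, because guest 1 meets both conditions
sum(breakfast == "yes") + sum(cleanliness <= 2)
```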

Exercise 13 - Guests with high staff rating

df is already loaded for you in this exercise.

How many guests have staff > 4?

Use the staff column, make the condition > 4, and count how many rows satisfy it.

sum(df$staff > 4)
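> is strict, so on a 1 to 5 scale staff > 4 only keeps the 5s, while >= 4 also keeps the 4s. With made-up ratings:

```r
staff <- c(5, 4, 3, 5, 4)   # made-up ratings

sum(staff > 4)    # 2: only the 5s
sum(staff >= 4)   # 4: the 4s and the 5s
sum(staff == 5)   # 2: same as staff > 4 for whole-number ratings
```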

Exercise 14 - Make a boxplot of satisfaction

df is already loaded for you in this exercise.

Create a boxplot of satisfaction.

Use a horizontal boxplot if you can. In a real exam, this is the kind of plot you would then interpret visually to discuss possible outliers.

Use the satisfaction column in boxplot(). If you can, make it horizontal so the result is easier to read.

boxplot(df$satisfaction, horizontal = TRUE)
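If you want the numbers behind the picture, boxplot.stats() returns the five values used to draw the whiskers and box, plus any points flagged as outliers (by default, points more than 1.5 box lengths beyond the box). The box edges come from fivenum() hinges, which can differ slightly from quantile() results. A made-up example with one very low score:

```r
x <- c(6, 7, 7, 8, 8, 8, 9, 9, 10, 1)   # made-up satisfaction scores

boxplot(x, horizontal = TRUE)   # the 1 is drawn as a separate point

b <- boxplot.stats(x)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker: 6 7 8 9 10
b$out     # flagged outlier(s): 1
```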