Week 3 Practice Problems: Solutions


Alex Kaizer

University of Colorado-Anschutz Medical Campus

This week's extra practice exercises focus on subsetting data objects and identifying relationships between the various assumptions in power calculations.

Exercise 1


Run the following code to combine a few of R’s built-in state-related data sets:

states <- data.frame(state.x77, state.region, state.abb)
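As a quick check of the combined object, we can look at its dimensions and column names. By default `data.frame()` uses `check.names = TRUE`, which turns the `state.x77` column names `"Life Exp"` and `"HS Grad"` into the syntactic names `Life.Exp` and `HS.Grad`; that is why the later code refers to them with periods. A minimal sketch:

```r
# Combine the built-in state data sets, as in the exercise
states <- data.frame(state.x77, state.region, state.abb)

dim(states)    # rows are states, columns are variables
names(states)  # spaces in the original names have been replaced by periods
```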


Calculate the mean (SD) life expectancy by state region.


region_vec <- unique( states$state.region ) # create vector of unique regions in our data

region_sum <- matrix( nrow=length(region_vec), ncol=2, dimnames=list(region_vec, c('mean','sd'))) # empty matrix to hold the results

for( i in region_vec ){
  region_sum[i, 'mean'] <- mean(states[ which(states$state.region==i), 'Life.Exp'])
  region_sum[i, 'sd'] <- sd(states[ which(states$state.region==i), 'Life.Exp'])
}

region_sum # print the summary table
                  mean        sd
South         69.70625 1.0221994
West          71.23462 1.3519715
Northeast     71.26444 0.7438769
North Central 71.76667 1.0367285
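The loop works, but base R's `tapply()` can compute a statistic for each group of a factor in one call. The sketch below is an alternative to the loop, not the approach used above; the name `region_sum2` is ours. Rows come out in the order of the factor levels of `state.region` rather than in order of first appearance:

```r
states <- data.frame(state.x77, state.region, state.abb)

# tapply() splits Life.Exp by region and applies the function to each group
region_mean <- tapply(states$Life.Exp, states$state.region, mean)
region_sd   <- tapply(states$Life.Exp, states$state.region, sd)

# Bind the two per-region results into a two-column summary table
region_sum2 <- cbind(mean = region_mean, sd = region_sd)
region_sum2
```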


Subset the Four Corners states (Utah, Colorado, Arizona, and New Mexico) by row name. Which state has the largest population? Which state has the lowest high school graduation rate?


states[c('Utah','Colorado','Arizona','New Mexico'),]
           Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Utah             1203   4022        0.6    72.90    4.5    67.3   137  82096
Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
New Mexico       1144   3601        2.2    70.32    9.7    55.2   120 121412
           state.region state.abb
Utah               West        UT
Colorado           West        CO
Arizona            West        AZ
New Mexico         West        NM

Colorado has the largest population (2541, in thousands), whereas New Mexico has the lowest high school graduation rate (55.2%).
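With only four rows, reading the answer off the printout is easy. For larger subsets, `which.max()` and `which.min()` return the position of the extreme value, which we can use to pull out the matching row name. A minimal sketch (the object names are ours):

```r
states <- data.frame(state.x77, state.region, state.abb)
four_corners <- states[c('Utah','Colorado','Arizona','New Mexico'), ]

# Position of the largest population and of the lowest graduation rate
largest_pop <- rownames(four_corners)[which.max(four_corners$Population)]
lowest_hs   <- rownames(four_corners)[which.min(four_corners$HS.Grad)]

largest_pop
lowest_hs
```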


Subset states that either have an area greater than 90,000 miles\(^2\) or a percent of high school graduates below 50%. How many states meet these criteria?


states_sub1 <- states[which(states$Area>90000 | states$HS.Grad<50),]
dim( states_sub1 ) # check the number of rows and columns
[1] 23 10

23 states meet at least one of these criteria.
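If we only need the count, we can skip building the subset entirely. The condition is a logical vector with one `TRUE`/`FALSE` per state, and `sum()` treats `TRUE` as 1 and `FALSE` as 0. A short sketch (the object names are ours):

```r
states <- data.frame(state.x77, state.region, state.abb)

# One TRUE/FALSE per state: large area OR low graduation rate
meets_either <- states$Area > 90000 | states$HS.Grad < 50

# Counting the TRUE values gives the number of qualifying states
n_either <- sum(meets_either)
n_either
```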


Subset states that have an area greater than 90,000 miles\(^2\) and a percent of high school graduates below 50%. How many states meet both criteria?


states_sub2 <- states[which(states$Area>90000 & states$HS.Grad<50),]
dim( states_sub2 ) # check the number of rows and columns
[1]  1 10
states_sub2 # print the subset to see which state meets both criteria
      Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Texas      12237   4188        2.2     70.9   12.2    47.4    35 262134
      state.region state.abb
Texas        South        TX

Only one state, Texas, meets both criteria.
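Base R's `subset()` gives an alternative way to write the same filter. It evaluates the condition inside the data frame, so we can name columns directly, and rows where the condition is `NA` are treated as false. R's documentation describes `subset()` as a convenience for interactive use and recommends `[` for programming, so this is a sketch of an option rather than a replacement for the code above:

```r
states <- data.frame(state.x77, state.region, state.abb)

# Keep states with a large area AND a low graduation rate,
# and show only the columns relevant to the question
both <- subset(states, Area > 90000 & HS.Grad < 50,
               select = c(Area, HS.Grad, state.abb))
both
```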


Are there any states where the mean number of days with a minimum temperature below freezing is above 100, the murder rate is greater than 10 per 100,000 population, and the rate of illiteracy is below 1%?


states[ which(states$Frost>100 & states$Murder>10 & states$Illiteracy<1),]
         Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Illinois      11197   5107        0.9    70.14   10.3    52.6   127  55748
Michigan       9111   4751        0.9    70.63   11.1    52.8   125  56817
Nevada          590   5149        0.5    69.03   11.5    65.2   188 109889
          state.region state.abb
Illinois North Central        IL
Michigan North Central        MI
Nevada            West        NV

Yes, Illinois, Michigan, and Nevada all meet the criteria.
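The logical subsets in this exercise wrap the condition in `which()`. One practical reason is missing data: indexing with a logical vector that contains `NA` returns an `NA` element (or, for a data frame, a row of `NA`s), whereas `which()` returns only the positions that are `TRUE`. The state data has no missing values, so this is a safeguard here; the toy vector below is hypothetical:

```r
# A toy vector with a missing value (hypothetical, not from the state data)
x <- c(1, NA, 3)

x[x > 2]         # logical index: the NA comparison produces an NA element
x[which(x > 2)]  # which() keeps only the TRUE positions, dropping the NA
```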

Exercise 2

This exercise examines how changing one part of a power calculation affects the others, for the case where the standard deviation is known. You may find the website https://rpsychologist.com/d3/nhst/ helpful for visualizing the different scenarios.

For each of these scenarios, assume that all parameters not mentioned are held fixed, then identify how each listed quantity changes.
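One common way to write these relationships is shown below. It assumes a one-sample, two-sided \(z\)-test at level \(\alpha\) with known \(\sigma\), so the course notation may differ slightly. Here \(\Phi\) is the standard normal CDF, \(z_q\) is its \(q\)th quantile, and \(1-\beta\) is the power. The power expression ignores the negligible chance of rejecting in the wrong tail, and the last two lines are the same relationship solved for \(n\) and for \(|\mu_0 - \mu_1|\):

\[
1-\beta \approx \Phi\left(\frac{|\mu_0-\mu_1|\sqrt{n}}{\sigma} - z_{1-\alpha/2}\right)
\]

\[
n = \left(\frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{|\mu_0-\mu_1|}\right)^2
\]

\[
|\mu_0-\mu_1| = \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\sqrt{n}}
\]

Each direction below can be read off these expressions. \(n\) and \(|\mu_0-\mu_1|\) appear in the numerator inside \(\Phi\), and \(\sigma\) appears in the denominator. Increasing \(\alpha\) shrinks \(z_{1-\alpha/2}\), and increasing the target power increases \(z_{1-\beta}\).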

As sample size \(n\) increases:

  1. Power: increases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): decreases
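We can check both directions numerically. The sketch below assumes a one-sample, two-sided \(z\)-test with known \(\sigma\). It computes the exact two-sided power (both rejection tails) and the normal-approximation detectable difference at 80% power for a few sample sizes. All fixed values are hypothetical choices for illustration:

```r
# Assumed setup: one-sample, two-sided z-test with known sigma
alpha <- 0.05; delta <- 0.5; sigma <- 1; target <- 0.8  # hypothetical fixed values
z_a <- qnorm(1 - alpha / 2)   # two-sided critical value
z_b <- qnorm(target)          # quantile for the target power

n_values <- c(10, 25, 50, 100)                   # hypothetical sample sizes
shift <- delta * sqrt(n_values) / sigma
pow <- pnorm(shift - z_a) + pnorm(-shift - z_a)  # power, both rejection tails
dd  <- (z_a + z_b) * sigma / sqrt(n_values)      # detectable difference at 80% power

round(cbind(n = n_values, power = pow, detectable = dd), 3)
```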

As the difference to be detected, \(|\mu_0 - \mu_1|\), increases:

  1. Power: increases
  2. Required Sample Size: decreases
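A similar sketch varies the difference to be detected, under the same assumed one-sample, two-sided \(z\)-test with known \(\sigma\). Power is computed at a fixed \(n\), and the required \(n\) for 80% power uses the normal-approximation formula, rounded up. The fixed values are hypothetical:

```r
alpha <- 0.05; n <- 30; sigma <- 1; target <- 0.8  # hypothetical fixed values
z_a <- qnorm(1 - alpha / 2)
z_b <- qnorm(target)

deltas <- c(0.25, 0.5, 1)                           # hypothetical |mu0 - mu1|
shift <- deltas * sqrt(n) / sigma
pow   <- pnorm(shift - z_a) + pnorm(-shift - z_a)   # power at fixed n
n_req <- ceiling(((z_a + z_b) * sigma / deltas)^2)  # n needed for 80% power

rbind(delta = deltas, power = round(pow, 3), n_required = n_req)
```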

As desired power increases:

  1. Required Sample Size: increases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): increases
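Raising the target power increases \(z_{1-\beta}\), which pushes up both the required sample size (with the difference fixed) and the detectable difference (with \(n\) fixed). A sketch under the same assumed \(z\)-test setup, with hypothetical fixed values:

```r
z_a <- qnorm(1 - 0.05 / 2)          # alpha = 0.05, two-sided (assumed)
target_power <- c(0.7, 0.8, 0.9)    # hypothetical target powers
z_b <- qnorm(target_power)

sigma <- 1; delta <- 0.5; n <- 40   # hypothetical fixed values
n_req <- ceiling(((z_a + z_b) * sigma / delta)^2)  # required n, delta fixed
dd    <- (z_a + z_b) * sigma / sqrt(n)             # detectable difference, n fixed

rbind(power = target_power, n_required = n_req, detectable = round(dd, 3))
```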

As \(\sigma\), the population SD, increases:

  1. Power: decreases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): increases
  3. Required Sample Size: increases
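Because \(\sigma\) divides the standardized shift \(|\mu_0-\mu_1|\sqrt{n}/\sigma\), a larger SD lowers power and raises both the detectable difference and the required \(n\). A sketch under the assumed one-sample, two-sided \(z\)-test, with hypothetical fixed values:

```r
sigmas <- c(0.5, 1, 2)                              # hypothetical population SDs
alpha <- 0.05; n <- 30; delta <- 0.5; target <- 0.8 # hypothetical fixed values
z_a <- qnorm(1 - alpha / 2)
z_b <- qnorm(target)

shift <- delta * sqrt(n) / sigmas
pow   <- pnorm(shift - z_a) + pnorm(-shift - z_a)   # power, both rejection tails
dd    <- (z_a + z_b) * sigmas / sqrt(n)             # detectable difference
n_req <- ceiling(((z_a + z_b) * sigmas / delta)^2)  # required sample size

rbind(sigma = sigmas, power = round(pow, 3), detectable = round(dd, 3), n_required = n_req)
```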

As \(\alpha\), the significance level of the test, increases:

  1. Power: increases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): decreases
  3. Required Sample Size: decreases
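A larger \(\alpha\) lowers the critical value \(z_{1-\alpha/2}\), so the test rejects more easily. Power rises, and both the detectable difference and the required \(n\) fall, at the cost of more Type I errors. A sketch under the assumed one-sample, two-sided \(z\)-test, with hypothetical fixed values:

```r
alphas <- c(0.01, 0.05, 0.10)                      # hypothetical significance levels
n <- 30; delta <- 0.5; sigma <- 1; target <- 0.8   # hypothetical fixed values
z_a <- qnorm(1 - alphas / 2)                       # critical values shrink as alpha grows
z_b <- qnorm(target)

shift <- delta * sqrt(n) / sigma
pow   <- pnorm(shift - z_a) + pnorm(-shift - z_a)  # power, both rejection tails
dd    <- (z_a + z_b) * sigma / sqrt(n)             # detectable difference
n_req <- ceiling(((z_a + z_b) * sigma / delta)^2)  # required sample size

rbind(alpha = alphas, power = round(pow, 3), detectable = round(dd, 3), n_required = n_req)
```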