Week 3 Practice Problems: Solutions

Author: Alex Kaizer

Affiliation: University of Colorado-Anschutz Medical Campus

This page includes the solutions to the optional practice problems for the given week. A version without solutions is also available. Data sets, if needed, are provided on the BIOS 6618 Canvas page for students registered for the course.

This week’s extra practice exercises focus on subsetting data objects and identifying relationships between the various assumptions in power calculations.

Exercise 1

1a

Run the following code to combine a few state-related data sets that are part of R’s available data sets:

states <- data.frame(state.x77, state.region, state.abb)

1b

Calculate the mean (SD) life expectancy by state region.

Solution:

region_vec <- as.character(unique(states$state.region)) # vector of unique region names in our data

# empty matrix to hold the mean and SD of life expectancy for each region
region_sum <- matrix(nrow=length(region_vec), ncol=2, dimnames=list(region_vec, c('mean','sd')))

for( i in region_vec ){
  in_region <- states$state.region == i # TRUE for the states in region i
  region_sum[i, 'mean'] <- mean(states[in_region, 'Life.Exp'])
  region_sum[i, 'sd'] <- sd(states[in_region, 'Life.Exp'])
}

region_sum
                  mean        sd
South         69.70625 1.0221994
West          71.23462 1.3519715
Northeast     71.26444 0.7438769
North Central 71.76667 1.0367285
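
The loop above can also be written without explicit iteration. The following is a minimal sketch using `tapply()` from base R; the object names `region_mean`, `region_sd`, and `region_sum2` are chosen here for illustration and are not part of the original solution.

```r
# Rebuild the combined data set from 1a so this block runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# tapply() applies a function to Life.Exp within each level of state.region
region_mean <- tapply(states$Life.Exp, states$state.region, mean)
region_sd   <- tapply(states$Life.Exp, states$state.region, sd)

# Combine into one table; rows follow the factor level order,
# not the order in which regions first appear in the data
region_sum2 <- cbind(mean = region_mean, sd = region_sd)
round(region_sum2, 2)
```

The values should match the loop-based table, although the rows may appear in a different order.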

1c

Subset the Four Corners states (Utah, Colorado, Arizona, and New Mexico) by row name. Which state has the largest population? Which state has the lowest high school graduation rate?

Solution:

states[c('Utah','Colorado','Arizona','New Mexico'),]
           Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Utah             1203   4022        0.6    72.90    4.5    67.3   137  82096
Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
New Mexico       1144   3601        2.2    70.32    9.7    55.2   120 121412
           state.region state.abb
Utah               West        UT
Colorado           West        CO
Arizona            West        AZ
New Mexico         West        NM

Colorado has the largest population, whereas New Mexico has the lowest high school graduation rate.
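
Reading the answer off the printed table works for four rows; for larger subsets it may be easier to ask R directly. A small sketch using `which.max()` and `which.min()`, with `corners`, `largest_pop`, and `lowest_grad` as illustrative names:

```r
# Rebuild the combined data set from 1a so this block runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# Subset the four states by row name
corners <- states[c('Utah', 'Colorado', 'Arizona', 'New Mexico'), ]

# which.max()/which.min() return the row position of the extreme value,
# which we use to look up the corresponding row name
largest_pop <- rownames(corners)[which.max(corners$Population)]
lowest_grad <- rownames(corners)[which.min(corners$HS.Grad)]

largest_pop
lowest_grad
```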

1d

Subset the states that have either an area greater than 90,000 miles\(^2\) or a percent of high school graduates below 50%. How many states meet these criteria?

Solution:

states_sub1 <- states[which(states$Area>90000 | states$HS.Grad<50),]
dim( states_sub1 ) # check the number of rows and columns
[1] 23 10

23 states meet these criteria.

1e

Subset the states that have both an area greater than 90,000 miles\(^2\) and a percent of high school graduates below 50%. How many states meet these criteria?

Solution:

states_sub2 <- states[which(states$Area>90000 & states$HS.Grad<50),]
dim( states_sub2 ) # check the number of rows and columns
[1]  1 10
states_sub2
      Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Texas      12237   4188        2.2     70.9   12.2    47.4    35 262134
      state.region state.abb
Texas        South        TX

1 state (Texas) meets these criteria.
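
If only the counts are needed, the logical conditions can be summed directly, since `TRUE` counts as 1. The sketch below also checks how the "or" and "and" counts from 1d and 1e relate to each other; the names `big_area`, `low_grad`, `n_or`, and `n_and` are illustrative.

```r
# Rebuild the combined data set from 1a so this block runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# One logical value per state for each condition
big_area <- states$Area > 90000
low_grad <- states$HS.Grad < 50

n_or  <- sum(big_area | low_grad)  # states meeting at least one condition (1d)
n_and <- sum(big_area & low_grad)  # states meeting both conditions (1e)

# Inclusion-exclusion: the "or" count equals the two individual counts
# minus the overlap, so this comparison should be TRUE
n_or == sum(big_area) + sum(low_grad) - n_and
```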

1f

Are there any states where the mean number of days with a minimum temperature below freezing is above 100, the murder rate is greater than 10 per 100,000 population, and the rate of illiteracy is below 1%?

Solution:

states[ which(states$Frost>100 & states$Murder>10 & states$Illiteracy<1),]
         Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Illinois      11197   5107        0.9    70.14   10.3    52.6   127  55748
Michigan       9111   4751        0.9    70.63   11.1    52.8   125  56817
Nevada          590   5149        0.5    69.03   11.5    65.2   188 109889
          state.region state.abb
Illinois North Central        IL
Michigan North Central        MI
Nevada            West        NV

Yes, Illinois, Michigan, and Nevada all meet the criteria.
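
The same filter can be written with base R's `subset()`, which lets the column names be used without the `states$` prefix. This is an alternative sketch rather than the approach used in the solution, and `cold_violent_literate` is an illustrative name.

```r
# Rebuild the combined data set from 1a so this block runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# subset() evaluates the condition inside the data frame and keeps rows where it is TRUE
cold_violent_literate <- subset(states, Frost > 100 & Murder > 10 & Illiteracy < 1)

rownames(cold_violent_literate)
```

`subset()` is convenient for interactive work; inside functions, bracket indexing as in the solution is generally the safer choice.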

Exercise 2

This exercise examines how the quantities in a power calculation change as we vary one of them, for the case of a known standard deviation. You may find the website https://rpsychologist.com/d3/nhst/ helpful for visualizing the different scenarios.

For each of these scenarios, assume that all parameters not discussed are fixed at some value, then identify what happens to the different quantities.
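
As a reference point for the scenarios, it can help to write the relationships down explicitly. The expressions below assume a one-sided, one-sample z-test with known \(\sigma\); that choice is an assumption made here for illustration (a two-sided test would use \(z_{1-\alpha/2}\) in place of \(z_{1-\alpha}\), to a close approximation). Here \(\Phi\) is the standard normal CDF, \(z_p\) is its \(p\)th quantile, and \(1-\beta\) is the power.

\[ 1 - \beta = \Phi\left( \frac{|\mu_0 - \mu_1| \sqrt{n}}{\sigma} - z_{1-\alpha} \right) \]

\[ n = \frac{\sigma^2 \left( z_{1-\alpha} + z_{1-\beta} \right)^2}{(\mu_0 - \mu_1)^2} \]

\[ |\mu_0 - \mu_1| = \frac{\sigma \left( z_{1-\alpha} + z_{1-\beta} \right)}{\sqrt{n}} \]

Each answer can be read off by seeing whether the quantity being changed sits in a numerator or a denominator.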

As sample size \(n\) increases:

  1. Power: increases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): decreases

As the difference to be detected, \(|\mu_0 - \mu_1|\), increases:

  1. Power: increases
  2. Required Sample Size: decreases
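
The effect of \(n\) and \(|\mu_0 - \mu_1|\) on power can be checked numerically. The sketch below assumes a one-sided, one-sample z-test with known \(\sigma\); the helper `z_power()` and the specific input values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: power of a one-sided, one-sample z-test with known sigma
# delta is the difference to detect, |mu_0 - mu_1|
z_power <- function(n, delta, sigma = 1, alpha = 0.05) {
  pnorm(abs(delta) * sqrt(n) / sigma - qnorm(1 - alpha))
}

# Power as sample size grows, holding delta at 0.5
power_by_n <- sapply(c(10, 20, 40, 80), z_power, delta = 0.5)

# Power as the difference grows, holding n at 20
power_by_delta <- sapply(c(0.25, 0.5, 0.75, 1), function(d) z_power(n = 20, delta = d))

round(power_by_n, 3)
round(power_by_delta, 3)
```

Both vectors should be increasing, in line with the answers above.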

As desired power increases:

  1. Required Sample Size: increases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): increases

As \(\sigma\), the population SD, increases:

  1. Power: decreases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): increases
  3. Required Sample Size: increases

As \(\alpha\), the significance level of the test, increases:

  1. Power: increases
  2. Detectable Difference \(|\mu_0 - \mu_1|\): decreases
  3. Required Sample Size: decreases
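
The last three scenarios can be explored the same way. This is a sketch under the assumption of a one-sided, one-sample z-test with known \(\sigma\); the helpers `z_power()`, `z_n()`, and `z_delta()` and the baseline values are hypothetical and not taken from the course materials.

```r
# Hypothetical helpers for a one-sided, one-sample z-test with known sigma
z_power <- function(n, delta, sigma, alpha) {
  pnorm(delta * sqrt(n) / sigma - qnorm(1 - alpha))
}
z_n <- function(target_power, delta, sigma, alpha) {
  ceiling((sigma * (qnorm(1 - alpha) + qnorm(target_power)) / delta)^2)
}
z_delta <- function(target_power, n, sigma, alpha) {
  sigma * (qnorm(1 - alpha) + qnorm(target_power)) / sqrt(n)
}

# Baseline values (arbitrary, for illustration only)
n <- 20; delta <- 0.5; sigma <- 1; alpha <- 0.05; target_power <- 0.80

# Desired power increases: one column per power value
powers <- c(0.80, 0.90, 0.95)
power_effect <- rbind(n     = z_n(powers, delta, sigma, alpha),
                      delta = z_delta(powers, n, sigma, alpha))

# Population SD increases: one column per sigma value
sigmas <- c(1, 1.5, 2)
sigma_effect <- rbind(power = z_power(n, delta, sigmas, alpha),
                      delta = z_delta(target_power, n, sigmas, alpha),
                      n     = z_n(target_power, delta, sigmas, alpha))

# Significance level increases: one column per alpha value
alphas <- c(0.01, 0.05, 0.10)
alpha_effect <- rbind(power = z_power(n, delta, sigma, alphas),
                      delta = z_delta(target_power, n, sigma, alphas),
                      n     = z_n(target_power, delta, sigma, alphas))

power_effect
sigma_effect
alpha_effect
```

Reading each row from left to right shows the direction of change listed in the corresponding scenario.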