---
title: "Week 3 Practice Problems: Solutions"
author: 
  name: Alex Kaizer
  roles: "Instructor"
  affiliation: University of Colorado-Anschutz Medical Campus
toc: true
toc_float: true
toc-location: left
format:
  html:
    code-fold: show
    code-overflow: wrap 
    code-tools: true
---
```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(kableExtra)
library(dplyr)
```
This page includes the solutions to the optional practice problems for the given week. A version [without solutions is available here](/labs/prac3/index.qmd). Data sets, if needed, are provided on the BIOS 6618 Canvas page for students registered for the course.
This week's extra practice exercises focus on subsetting data objects and identifying relationships between the various assumptions in power calculations. 
# Exercise 1
## 1a 
Run the following code to combine a few state-related data sets that are part of R's available data sets:
```{r}
states <- data.frame(state.x77, state.region, state.abb)
```
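As a quick check that is not part of the original exercise, the sketch below inspects the combined object. `state.x77` is a matrix whose column names contain spaces (`Life Exp`, `HS Grad`); by default `data.frame()` converts them to syntactic names, which is why the later solutions refer to `Life.Exp` and `HS.Grad`.

```{r}
# Rebuild the data frame so this chunk runs on its own
states <- data.frame(state.x77, state.region, state.abb)

dim(states)          # number of rows (states) and columns
colnames(state.x77)  # original matrix column names, some with spaces
names(states)        # names after data.frame() makes them syntactic
```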
## 1b 
Calculate the mean (SD) life expectancy by state region.
**Solution:**
```{r}
region_vec <- unique( states$state.region ) # create vector of unique regions in our data
region_sum <- matrix( nrow=length(region_vec), ncol=2, dimnames=list(region_vec, c('mean','sd')))
for( i in region_vec ){
  region_sum[i, 'mean'] <- mean(states[ which(states$state.region==i), 'Life.Exp'])
  region_sum[i, 'sd'] <- sd(states[ which(states$state.region==i), 'Life.Exp'])  
}
region_sum
```
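The loop above is one way to get the summaries. As an alternative sketch, base R's `tapply()` can compute the same quantities without an explicit loop; the object name `region_sum2` is introduced here only for illustration.

```{r}
# Rebuild the data frame so this chunk runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# tapply() splits Life.Exp by region and applies the function to each group
region_mean <- tapply(states$Life.Exp, states$state.region, mean)
region_sd   <- tapply(states$Life.Exp, states$state.region, sd)

# Combine into one matrix with regions as row names
region_sum2 <- cbind(mean = region_mean, sd = region_sd)
region_sum2
```

The rows follow the factor level order of `state.region` rather than the order of first appearance used by `unique()`, so the rows may be arranged differently from `region_sum` even though the values agree.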
## 1c
Subset the four corner states (Utah, Colorado, Arizona, and New Mexico) **by row name**. Which state has the largest population? Which state has the lowest high school graduation rate?
**Solution:**
```{r}
states[c('Utah','Colorado','Arizona','New Mexico'),]
```
Colorado has the largest population, whereas New Mexico has the lowest high school graduation rate.
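Reading the answer off the printed table works for four rows. For larger subsets, a sketch like the following finds the same answers programmatically with `which.max()` and `which.min()`; the object names `corners`, `largest_pop`, and `lowest_hs` are illustrative.

```{r}
# Rebuild the data frame so this chunk runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# Subset by row name, as in the solution above
corners <- states[c('Utah', 'Colorado', 'Arizona', 'New Mexico'), ]

# which.max()/which.min() return the position of the extreme value;
# indexing the row names with that position gives the state
largest_pop <- rownames(corners)[which.max(corners$Population)]
lowest_hs   <- rownames(corners)[which.min(corners$HS.Grad)]

largest_pop
lowest_hs
```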
## 1d
Subset states that either have an area greater than 90,000 miles$^2$ *or* a percentage of high school graduates below 50\%. How many states meet these criteria?
**Solution:**
```{r}
states_sub1 <- states[which(states$Area>90000 | states$HS.Grad<50),]
dim( states_sub1 ) # check the number of rows and columns
```
23 states meet this criteria.
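If only the count is needed, the condition can be summed directly, because `TRUE` counts as 1. The sketch below also shows `subset()` as an alternative to bracket indexing. These data have no missing values, so `which()` is optional here; when a condition can evaluate to `NA`, `which()` and `subset()` drop those rows, whereas plain logical indexing would return rows of `NA`.

```{r}
# Rebuild the data frame so this chunk runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# Logical vector with one entry per state
either <- states$Area > 90000 | states$HS.Grad < 50

sum(either)  # number of states meeting at least one condition

# subset() evaluates the condition inside the data frame
nrow(subset(states, Area > 90000 | HS.Grad < 50))
```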
## 1e
Subset states that have an area greater than 90,000 miles$^2$ *and* a percentage of high school graduates below 50\%. How many states meet these criteria?
**Solution:**
```{r}
states_sub2 <- states[which(states$Area>90000 & states$HS.Grad<50),]
dim( states_sub2 ) # check the number of rows and columns
states_sub2
```
One state (Texas) meets these criteria.
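The "or" count from 1d and the "and" count here are linked by inclusion-exclusion: the number of states meeting either condition equals the sum of the two individual counts minus the number meeting both. A minimal sketch of that check, with illustrative object names:

```{r}
# Rebuild the data frame so this chunk runs on its own
states <- data.frame(state.x77, state.region, state.abb)

big_area <- states$Area > 90000   # condition 1
low_grad <- states$HS.Grad < 50   # condition 2

n_or  <- sum(big_area | low_grad) # states meeting at least one condition
n_and <- sum(big_area & low_grad) # states meeting both conditions

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_or == sum(big_area) + sum(low_grad) - n_and
```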
## 1f
Are there any states where the mean number of days with a minimum temperature below freezing is above 100, the murder rate is greater than 10 per 100,000 population, and the rate of illiteracy is below 1\%?
**Solution:**
```{r}
states[ which(states$Frost>100 & states$Murder>10 & states$Illiteracy<1),]
```
Yes, Illinois, Michigan, and Nevada all meet the criteria.
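Because the question is a yes/no one, `any()` answers it directly. The sketch below also uses `with()` to avoid repeating `states$` in each condition; the object name `meets` is illustrative.

```{r}
# Rebuild the data frame so this chunk runs on its own
states <- data.frame(state.x77, state.region, state.abb)

# with() evaluates the expression using the columns of states
meets <- with(states, Frost > 100 & Murder > 10 & Illiteracy < 1)

any(meets)              # is there at least one such state?
rownames(states)[meets] # which states they are
```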
# Exercise 2
This exercise examines what happens to a power calculation as we change its different components, for the case of a **known standard deviation**. You may find the website <https://rpsychologist.com/d3/nhst/> helpful for visualizing the different scenarios.
For each of these scenarios, assume that all parameters not discussed are fixed at some value, then identify what happens to the different quantities.
**As sample size $n$ increases:**
1. Power: *increases*
2. Detectable Difference $|\mu_0 - \mu_1|$: *decreases*
**As the difference to be detected, $|\mu_0 - \mu_1|$, increases:**
1. Power: *increases*
2. Required Sample Size: *decreases*
**As desired power increases:**
1. Required Sample Size: *increases*
2. Detectable Difference $|\mu_0 - \mu_1|$: *increases*
**As $\sigma$, the population SD, increases:**
1. Power: *decreases*
2. Detectable Difference $|\mu_0 - \mu_1|$: *increases*
3. Required Sample Size: *increases*
**As $\alpha$, the significance level of the test, increases:**
1. Power: *increases*
2. Detectable Difference $|\mu_0 - \mu_1|$: *decreases*
3. Required Sample Size: *decreases*
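The directions listed above can be checked numerically. The sketch below assumes a two-sided, one-sample z-test with known $\sigma$ and uses the common approximation that ignores the rejection probability in the far tail: power $\approx \Phi\left(-z_{1-\alpha/2} + |\mu_0 - \mu_1|\sqrt{n}/\sigma\right)$, which rearranges to $n = \left[(z_{1-\alpha/2} + z_{1-\beta})\,\sigma / |\mu_0 - \mu_1|\right]^2$, where $1-\beta$ is the power. The function names `z_power()`, `z_n()`, and `z_delta()` and the baseline values are hypothetical choices for illustration, not taken from the course materials.

```{r}
# Approximate power (assumed two-sided one-sample z-test, known sigma)
z_power <- function(n, delta, sigma, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(delta) * sqrt(n) / sigma)
}

# Required sample size for a target power, rounded up
z_n <- function(power, delta, sigma, alpha = 0.05) {
  ceiling(((qnorm(1 - alpha / 2) + qnorm(power)) * sigma / abs(delta))^2)
}

# Detectable difference for a given sample size and power
z_delta <- function(n, power, sigma, alpha = 0.05) {
  (qnorm(1 - alpha / 2) + qnorm(power)) * sigma / sqrt(n)
}

# Illustrative baseline: n = 32, delta = 0.5, sigma = 1, power = 0.8, alpha = 0.05.
# Each comparison changes one input and should evaluate to TRUE.

# As n increases: power increases, detectable difference decreases
z_power(64, 0.5, 1) > z_power(32, 0.5, 1)
z_delta(64, 0.8, 1) < z_delta(32, 0.8, 1)

# As the difference increases: power increases, required n decreases
z_power(32, 1, 1) > z_power(32, 0.5, 1)
z_n(0.8, 1, 1) < z_n(0.8, 0.5, 1)

# As desired power increases: required n and detectable difference increase
z_n(0.9, 0.5, 1) > z_n(0.8, 0.5, 1)
z_delta(32, 0.9, 1) > z_delta(32, 0.8, 1)

# As sigma increases: power decreases, detectable difference and n increase
z_power(32, 0.5, 2) < z_power(32, 0.5, 1)
z_delta(32, 0.8, 2) > z_delta(32, 0.8, 1)
z_n(0.8, 0.5, 2) > z_n(0.8, 0.5, 1)

# As alpha increases: power increases, detectable difference and n decrease
z_power(32, 0.5, 1, alpha = 0.10) > z_power(32, 0.5, 1, alpha = 0.05)
z_delta(32, 0.8, 1, alpha = 0.10) < z_delta(32, 0.8, 1, alpha = 0.05)
z_n(0.8, 0.5, 1, alpha = 0.10) < z_n(0.8, 0.5, 1, alpha = 0.05)
```

Under these assumptions each direction follows from the formula: power depends on the inputs only through $|\mu_0 - \mu_1|\sqrt{n}/\sigma$ and $z_{1-\alpha/2}$, so anything that raises the first or lowers the second raises power.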