Permutation Test p-values

Author
Affiliation

Alex Kaizer

University of Colorado-Anschutz Medical Campus

This page is part of the University of Colorado-Anschutz Medical Campus’ BIOS 6618 Recitation collection. To view other questions, you can view the BIOS 6618 Recitation collection page or use the search bar to look for keywords.

Permutation Test p-values

Let’s revisit the example from lab on Tuesday with a comparison of a vaccine for Celiac disease to more explicitly walk through how we calculate the p-values and what happens if something goes wrong. We assumed under the null scenario that the placebo group’s tissue transglutaminase IgA antibody (tTG-IgA) distribution was \(Y_P \sim \text{Gamma}(\text{shape}=10,\text{scale}=3)\) and that the treatment group’s tTG-IgA was \(Y_T \sim \text{Gamma}(\text{shape}=10,\text{scale}=\sqrt{18})\).

Code
set.seed(612)
placebo <- rgamma(n=50, shape=10, scale=3)
vaccine <- rgamma(n=50, shape=5, scale=sqrt(18))

var(placebo) # calculate the sample variances
[1] 101.8625
Code
var(vaccine) # calculate the sample variances
[1] 93.42304
Code
obs_ratio <- var(placebo)/var(vaccine) # calculate the ratio of the variances
obs_ratio
[1] 1.090336
Code
B <- 10^4 - 1 #set number of times to complete permutation sampling
result <- numeric(B)

nP <- length(placebo)

obs_ratio <- var(placebo)/var(vaccine) # calculate the ratio of the variances

set.seed(612) #set seed for reproducibility
pool_dat <- c(placebo, vaccine)

for(j in 1:B){
    index <- sample(x=1:length(pool_dat), size=nP, replace=F)
    placebo_permute <- pool_dat[index]
    vaccine_permute <- pool_dat[-index]
    result[j] <- var(placebo_permute) / var(vaccine_permute)
}

# Histogram
hist( result, xlab='Var(Placebo) / Var(Vaccine)', 
      main='Permutation Distribution for Ratio of Variances')
abline( v=obs_ratio, lty=2, col='blue', lwd=2)
abline( v=1/obs_ratio, lty=4, col='orangered2', lwd=2)

legend('topright', lty=c(2,4), lwd=c(2,2), col=c('blue','orangered2'), legend=c('Observed Ratio','Inverse of Observed Ratio'), bty='n')

We see in the histogram above that we have both the observed ratio (obs_ratio) and the inverse of our observed ratio (1/obs_ratio) indicated by the vertical lines. From our data we can both calculation the proportion of our permutation sample that is \(\leq\) or \(\geq\), and how that corresponds to our permutation test p-value.

First, let’s focus on the calculation of our observed ratio. We know that for a permutation test p-value we need to calculate the proportion that is as or more extreme than what we observed. Visually, this would represent:

Code
# Calculate histogram, but do not draw it
my_hist <- hist(result , plot=F)
 
# Color vector
my_color= ifelse(my_hist$breaks > obs_ratio, 'lightblue' , ifelse (my_hist$breaks < (1/obs_ratio)-0.2, 'orange','white' ))
 
# Final plot
plot(my_hist, col=my_color, border=T, xlab='Var(Placebo) / Var(Vaccine)', main='Permutation Distribution for Ratio of Variances')
rect(xleft=0.8, ybottom=0, xright=1/obs_ratio, ytop=2461, col='orange')
rect(xleft=obs_ratio, ybottom=0, xright=1.2, ytop=2042, col='lightblue')
abline( v=obs_ratio, lty=2, col='blue', lwd=2)
abline( v=1/obs_ratio, lty=4, col='orangered2', lwd=2)
legend('topright', lty=c(2,4), lwd=c(2,2), col=c('blue','orangered2'), legend=c('Observed Ratio','Inverse of Observed Ratio'), bty='n')
text(x=2, y=1000, 'result >= obs_ratio', col='blue')
text(x=0.3, y=1000, 'result <= 1/obs_ratio', col='orangered2', srt=60)

Numerically, these values are:

Code
# The correct calculations
((sum(result >= obs_ratio) + 1)/(B+1))
[1] 0.3992
Code
((sum(result <= (1/obs_ratio)) + 1)/(B+1))
[1] 0.4009

For the two-sided p-value we then double our largest value to use for our conclusion:

Code
2 * max(((sum(result >= obs_ratio) + 1)/(B+1)), ((sum(result <= (1/obs_ratio)) + 1)/(B+1)))
[1] 0.8018

If we happen to forget to check the order of our problem and we flip the direction of our inequalities we arrive at:

Code
# The incorrect calculations
((sum(result <= obs_ratio) + 1)/(B+1))
[1] 0.6009
Code
((sum(result >= (1/obs_ratio)) + 1)/(B+1))
[1] 0.5992
Code
2 * max( ((sum(result <= obs_ratio) + 1)/(B+1)), ((sum(result >= (1/obs_ratio)) + 1)/(B+1)))
[1] 1.2018

We see in this case we have a nonsensical p-value that is larger than 1!! This helps us realize we may need to double check our observed statistic and how it relates to our distribution.

One-Sided p-value Calculation

In the case of a one-sided p-value the approach is a little more straight forward, but we still need to consider the context of our problem!!

Code
# Calculate histogram, but do not draw it
my_hist <- hist(result , plot=F)
 
# Color vector
my_color= ifelse(my_hist$breaks > obs_ratio, 'lightblue' , 'orange' )
 
# Final plot
plot(my_hist, col=my_color, border=T, xlab='Var(Placebo) / Var(Vaccine)', main='Permutation Distribution for Ratio of Variances')
rect(xleft=obs_ratio, ybottom=0, xright=1.2, ytop=2042, col='lightblue')
abline( v=obs_ratio, lty=2, col='blue', lwd=2)
text(x=2, y=1000, 'result >= obs_ratio', col='blue')
text(x=0.3, y=1000, 'result <= obs_ratio', col='orangered2', srt=60)

Here our interpretation then depends on our a priori specified null hypothesis (i.e., that we expect the ratio to be larger than 1 or small than 1 based on our context).

Numerically, these values are:

Code
# The correct calculations
((sum(result >= obs_ratio) + 1)/(B+1)) # H0 that the expected ratio > 1
[1] 0.3992
Code
((sum(result <= obs_ratio) + 1)/(B+1)) # H0 that the expected ratio < 1
[1] 0.6009