Week 14 Practice Problems: Solutions


Alex Kaizer

University of Colorado-Anschutz Medical Campus

This page includes the solutions to the optional practice problems for the given week. If you want to see a version without solutions, please click here. Data sets, if needed, are provided on the BIOS 6618 Canvas page for students registered for the course.

This week’s extra practice exercises focus on implementing and interpreting a linear regression model using Bayesian approaches. We leverage the same data set from the MLR practice problems to help compare results with our frequentist approaches.

Dataset Background

The following code can load the Licorice_Gargle.csv file into R (adjust the relative path to match where the file is saved on your machine):

Code
dat_all <- read.csv('../../.data/Licorice_Gargle.csv')

# remove records with missing data for the outcome or any predictors:
complete_case_vec <- complete.cases(dat_all[,c('pacu30min_throatPain','preOp_gender','preOp_age','treat')]) # creates a vector of TRUE or FALSE for each row if the cases are complete (i.e., no NA values)
dat <- dat_all[complete_case_vec,]

The dataset represents a randomized controlled trial of 236 adult patients undergoing elective thoracic surgery requiring a double-lumen endotracheal tube, comparing whether a licorice gargle versus a sugar-water gargle prevents postoperative sore throat. For our exercises below we will focus on the following variables:

  • pacu30min_throatPain: sore throat pain score at rest 30 minutes after arriving in the post-anesthesia care unit (PACU), measured on an 11-point Likert scale (0=no pain to 10=worst pain)
  • preOp_gender: an indicator variable for gender (0=Male, 1=Female)
  • preOp_age: age (in years)
  • treat: the randomized treatment in the trial where 0=sugar 5g and 1=licorice 0.5g

Exercise 1: Multiple Linear Regression Example

1a: Fitting the Model (Frequentist)

Fit the frequentist multiple linear regression model for the outcome of throat pain (i.e., dependent variable) and independent variables for gender, age, and treatment status. Print the summary table output for reference in the following questions.

Solution:

Our (frequentist) linear regression model is \(Y_{\text{pain}} = \beta_0 + \beta_1 \times \text{treat} + \beta_2 \times \text{gender} + \beta_3 \times \text{age} + \epsilon\), with \(\epsilon \sim N(0,\sigma^2)\), which we can fit with the glm function:

Code
# Fit with glm function
glm_full <- glm(pacu30min_throatPain ~ treat + preOp_gender + preOp_age, data=dat)
summary(glm_full)

Call:
glm(formula = pacu30min_throatPain ~ treat + preOp_gender + preOp_age, 
    data = dat)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.229559   0.317234   3.876 0.000139 ***
treat        -0.744559   0.156179  -4.767 3.32e-06 ***
preOp_gender -0.259238   0.159345  -1.627 0.105134    
preOp_age    -0.001819   0.005051  -0.360 0.719128    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.415709)

    Null deviance: 361.14  on 232  degrees of freedom
Residual deviance: 324.20  on 229  degrees of freedom
AIC: 748.19

Number of Fisher Scoring iterations: 2

The 95% confidence intervals (to compare with credible intervals later) are:

Code
# Print confidence intervals to use for later responses
confint(glm_full)
                   2.5 %       97.5 %
(Intercept)   0.60779299  1.851325830
treat        -1.05066508 -0.438452912
preOp_gender -0.57154892  0.053072117
preOp_age    -0.01171743  0.008080349

1b: Fitting the Model (Bayesian)

Fit the Bayesian multiple linear regression model for the outcome of throat pain (i.e., dependent variable) and independent variables for gender, age, and treatment status. Assume priors of \(\beta_i \sim N(\mu=0,\sigma=10000)\) for all beta coefficients (note that brms/Stan parameterize the normal distribution by its standard deviation) and use the default prior for \(\sigma\) in your chosen package/software. Compare your results to the frequentist linear regression in 1a. Provide an interpretation of the slope for the treatment effect and its 95% credible interval.

Solution:

We will use the brms package for our Bayesian implementation:

Code
library(brms)

set.seed(515) # set seed for reproducibility
bayes1 <- brm( pacu30min_throatPain ~ treat + preOp_gender + preOp_age, 
               data = dat,
               family = gaussian(),
               prior = c(set_prior("normal(0,10000)", class='b'),
                         set_prior("normal(0,10000)", class='Intercept')) )
Code
summary(bayes1)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: pacu30min_throatPain ~ treat + preOp_gender + preOp_age 
   Data: dat (Number of observations: 233) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept        1.24      0.32     0.61     1.84 1.00     5777     3483
treat           -0.75      0.16    -1.05    -0.44 1.00     5223     3217
preOp_gender    -0.26      0.16    -0.58     0.05 1.00     5248     3078
preOp_age       -0.00      0.01    -0.01     0.01 1.00     5645     3494

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.19      0.06     1.09     1.31 1.00     4529     3210

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
posterior_summary(bayes1) # get results to place into table, 95% CrI specifically
                    Estimate   Est.Error          Q2.5         Q97.5
b_Intercept     1.235084e+00 0.318260922    0.60580350    1.83967321
b_treat        -7.455838e-01 0.156281746   -1.04588405   -0.44296585
b_preOp_gender -2.568938e-01 0.158918083   -0.57933482    0.05474165
b_preOp_age    -1.919861e-03 0.005008127   -0.01145995    0.00804481
sigma           1.194846e+00 0.056497367    1.09124113    1.31046288
Intercept       6.482126e-01 0.079092495    0.49225522    0.80525722
lprior         -4.188818e+01 0.013426863  -41.91641877  -41.86429702
lp__           -4.133291e+02 1.557275698 -417.19347046 -411.25006983
Code
prior_summary(bayes1) # check priors
                prior     class         coef group resp dpar nlpar lb ub
      normal(0,10000)         b                                         
      normal(0,10000)         b    preOp_age                            
      normal(0,10000)         b preOp_gender                            
      normal(0,10000)         b        treat                            
      normal(0,10000) Intercept                                         
 student_t(3, 0, 2.5)     sigma                                     0   
       source
         user
 (vectorized)
 (vectorized)
 (vectorized)
         user
      default

For those in the treatment group with the licorice gargle, throat pain at rest 30 minutes after arriving in the PACU is 0.75 points lower on average (95% CrI: 0.44 to 1.05 points lower) than for those in the control group with sugar water, after adjusting for age and gender.

In general, our beta coefficients and their uncertainty estimates (frequentist standard errors vs. posterior SDs) are quite similar, and the 95% confidence intervals and 95% credible intervals are also fairly similar. This suggests that using “non-informative” priors may result in similar findings to those of the standard frequentist analysis, although we would also want to consider other priors that may be meaningful. We can also note that our Rhat estimates are all 1.00, suggesting good convergence for our MCMC.
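This close agreement is expected. With a very diffuse (“non-informative”) prior, the posterior is driven almost entirely by the likelihood:

\[ p(\beta \mid \text{data}) \propto p(\text{data} \mid \beta) \, p(\beta) \approx p(\text{data} \mid \beta) \]

so each marginal posterior is approximately the frequentist sampling distribution, \(\beta_j \mid \text{data} \;\dot\sim\; N(\hat{\beta}_j, \widehat{SE}(\hat{\beta}_j)^2)\), which is why the credible and confidence intervals nearly coincide here.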

1c: MCMC Diagnostic Plots

Create the density and trace plots for your intercept and slope parameters from 1b. Briefly describe them and note any potential issues. Note, it is okay if additional parameters are included in your plots, just focus on the beta coefficients for this problem.

Solution:

Code
plot(bayes1)

We see that the posterior distribution for each beta coefficient is approximately normal, although there may be slight deviations from normality in the tails. The trace plots show good mixing across the sample space for all chains, suggesting convergence and adequate sampling of the posterior.
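For reference, the Rhat statistic in the summary output quantifies what these trace plots show visually. With \(W\) the average within-chain variance and \(B\) the between-chain variance of a parameter (computed on split chains of length \(N\)), the potential scale reduction factor is

\[ \hat{R} = \sqrt{\frac{\widehat{\text{var}}^{+}}{W}}, \qquad \widehat{\text{var}}^{+} = \frac{N-1}{N} W + \frac{1}{N} B \]

so values near 1 indicate the chains are exploring the same distribution.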

1d: More Informative Priors

Let’s assume that we a priori believed that the treatment should result in a 3-point reduction with a standard deviation of 0.0625 (i.e., \(\beta_{treat} \sim N(\mu=-3,\sigma=0.0625)\)). Now fit the Bayesian model with this informative prior for treatment group, leaving all other priors unchanged from 1b. How do the beta coefficients change? Briefly discuss the potential impact of sample size on the informativeness of the prior.

Solution:

Code
set.seed(515) # set seed for reproducibility
bayes2 <- brm( pacu30min_throatPain ~ treat + preOp_gender + preOp_age, 
               data = dat,
               family = gaussian(),
               prior = c(set_prior("normal(-3,0.0625)", coef='treat'),
                         set_prior("normal(0,10000)", class='b'),
                         set_prior("normal(0,10000)", class='Intercept')) )
Code
summary(bayes2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: pacu30min_throatPain ~ treat + preOp_gender + preOp_age 
   Data: dat (Number of observations: 233) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept        2.40      0.41     1.62     3.20 1.00     5189     3414
treat           -2.81      0.06    -2.93    -2.69 1.00     4308     3044
preOp_gender    -0.17      0.21    -0.59     0.23 1.00     4853     3095
preOp_age       -0.00      0.01    -0.02     0.01 1.00     5285     3276

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.59      0.08     1.45     1.74 1.00     4021     3336

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
posterior_summary(bayes2) # get results to place into table, 95% CrI specifically
                    Estimate  Est.Error          Q2.5         Q97.5
b_Intercept     2.404429e+00 0.40697609    1.62054091  3.202742e+00
b_treat        -2.811520e+00 0.06188197   -2.93446188 -2.690691e+00
b_preOp_gender -1.745726e-01 0.20811488   -0.58916498  2.295389e-01
b_preOp_age    -4.780098e-03 0.00673394   -0.01792011  8.209132e-03
sigma           1.585061e+00 0.07569206    1.44556915  1.742159e+00
Intercept       6.492187e-01 0.10306522    0.44754935  8.461912e-01
lprior         -3.504734e+01 3.04133539  -42.25318392 -3.054857e+01
lp__           -4.718568e+02 1.55714624 -475.61981036 -4.697922e+02
Code
prior_summary(bayes2) # check priors
                prior     class         coef group resp dpar nlpar lb ub
      normal(0,10000)         b                                         
      normal(0,10000)         b    preOp_age                            
      normal(0,10000)         b preOp_gender                            
    normal(-3,0.0625)         b        treat                            
      normal(0,10000) Intercept                                         
 student_t(3, 0, 2.5)     sigma                                     0   
       source
         user
 (vectorized)
 (vectorized)
         user
         user
      default

The most meaningful changes in our posterior inference are for the intercept and the treatment group beta coefficient. For example, our treatment estimate is now \(\hat{\beta}_{treat}=-2.81\) versus the original \(\hat{\beta}_{treat}=-0.75\) when we used a “non-informative” prior. This suggests our posterior is more reflective of our prior belief than the observed data.

This shift in the treatment group estimate in turn led to an intercept estimate of \(\hat{\beta}_{0}=2.40\) versus the estimate of \(\hat{\beta}_{0}=1.24\) with the “non-informative” prior. We also see some changes to the gender and age coefficients, but these are not as large.

Our sample size is 233 for this data set. In an analysis with a larger sample size, we would expect this “informative” prior to exert less influence. Here, however, the prior contributes more to the posterior estimate of the treatment effect than the observed likelihood does.
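We can approximate the size of this pull toward the prior with the standard normal–normal precision-weighting result, treating the frequentist estimate from 1a as the likelihood contribution and the \(N(-3, 0.0625)\) prior from the brm call as the prior. The sketch below uses Python purely as a calculator with the values reported above; it ignores that \(\sigma\) is also re-estimated under the prior–data conflict, so it is only a rough approximation:

```python
# Normal-normal conjugate approximation for the treatment coefficient:
# the posterior mean is roughly a precision-weighted average of the
# prior mean and the frequentist (maximum likelihood) estimate.
prior_mean, prior_sd = -3.0, 0.0625   # informative prior from 1d
mle, se = -0.745, 0.156               # frequentist estimate and SE from 1a

w_prior = 1 / prior_sd**2             # prior precision (= 256)
w_data = 1 / se**2                    # data precision (about 41)

post_mean = (w_prior * prior_mean + w_data * mle) / (w_prior + w_data)
print(round(post_mean, 2))            # -2.69, near the fitted posterior mean of -2.81
```

The prior precision outweighs the data precision by roughly a factor of six, which is why the prior dominates at this sample size; the remaining gap to the fitted \(-2.81\) comes largely from the inflated residual \(\sigma\) (1.59 vs. 1.19), which further down-weights the data.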

1e: Posterior Probability Calculations

Calculate the posterior probabilities that \(P(\hat{\beta}_{treat} \leq 0)\), \(P(\hat{\beta}_{treat} > 0)\), and \(P(\hat{\beta}_{treat} \leq -1)\) for both 1b and 1d. Summarize your results in a table and briefly compare model results.

Solution:

Code
# extract posterior samples for calculations
post1 <- as_draws_df(bayes1)
post2 <- as_draws_df(bayes2)

# P(betahat_treat <= 0)
pp1_lte0 <- mean( post1[,'b_treat'] <= 0 )
pp2_lte0 <- mean( post2[,'b_treat'] <= 0 )

# P(betahat_treat > 0)
pp1_gt0 <- mean( post1[,'b_treat'] > 0 )
pp2_gt0 <- mean( post2[,'b_treat'] > 0 )

# P(betahat_treat <= -1)
pp1_lten1 <- mean( post1[,'b_treat'] <= -1 )
pp2_lten1 <- mean( post2[,'b_treat'] <= -1 )

# make matrix for table of results, round to 4 places
res_mat <- matrix( round( c(pp1_lte0, pp1_gt0, pp1_lten1, 
                     pp2_lte0, pp2_gt0, pp2_lten1), 4), 
                   nrow=2, byrow=T,
                   dimnames = list( c('Non-Informative Prior (1b)','Informative Prior (1d)') ,
                                    c('<=0','>0','<=-1')) )

# print table
res_mat
                           <=0 >0   <=-1
Non-Informative Prior (1b)   1  0 0.0508
Informative Prior (1d)       1  0 1.0000

In this example, we see that both the non-informative and informative priors agree with respect to the posterior probabilities relative to 0 (i.e., essentially 100% for \(P(\hat{\beta}_{treat} \leq 0)\) and 0% for \(P(\hat{\beta}_{treat} > 0)\)).

However, we see very different results for \(P(\hat{\beta}_{treat} \leq -1)\), where the non-informative prior’s posterior probability is only 5.08% (i.e., very unlikely), whereas the informative prior’s posterior probability is 100% (i.e., essentially certain to be less than -1). Given that we know the informative prior is very strong and disagrees with our likelihood, the non-informative prior more appropriately summarizes what the observed data support.
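As a sanity check on these Monte Carlo estimates, note that the marginal posterior for \(\beta_{treat}\) under the non-informative prior is approximately normal with mean -0.75 and SD 0.16 (from the 1b summary), so the tail probabilities can also be approximated analytically. The sketch below uses Python purely as a calculator with the 1b posterior summaries:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sd):
    """CDF of a Normal(mu, sd) distribution evaluated at x."""
    return 0.5 * (1 + erf((x - mu) / (sd * sqrt(2))))

# Posterior mean and SD for b_treat under the non-informative prior (1b)
mu, sd = -0.7456, 0.1563

p_lte0 = norm_cdf(0, mu, sd)    # P(beta_treat <= 0): essentially 1
p_lten1 = norm_cdf(-1, mu, sd)  # P(beta_treat <= -1): about 0.05
```

The normal approximation lands very close to the simulation-based 5.08%; any small difference reflects Monte Carlo error and mild non-normality in the posterior tails.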

In practice, we want to consider multiple priors and present all results while being transparent about their limitations. For instance, we could present our non-informative prior as a potentially more statistically and scientifically conservative choice, but then also present the informative prior and make clear that it is very strong due to its larger-than-observed mean difference and small standard deviation. Then, a reader can use this information to decide among the results themselves.