In this HTML we will implement the multiple linear regression for the Rosner data set examining systolic blood pressure based on weight (oz) and age (days) for newborns:
SBP | Weight | Age |
---|---|---|
89 | 135 | 3 |
90 | 120 | 4 |
83 | 100 | 3 |
77 | 105 | 2 |
92 | 130 | 4 |
98 | 125 | 5 |
82 | 125 | 2 |
85 | 105 | 3 |
96 | 120 | 5 |
95 | 90 | 4 |
80 | 120 | 2 |
79 | 95 | 3 |
86 | 120 | 3 |
97 | 150 | 4 |
92 | 160 | 3 |
88 | 125 | 3 |
We will first implement the multiple linear regression model using matrices, then check our result using lm
.
Y <- matrix( c(89,90,83,77,92,98,82,85,96,95,80,79,86,97,92,88), ncol=1 )
X <- matrix( c( rep(1, length(Y)),
c(135,120,100,105,130,125,125,105,120,90,120,95,120,150,160,125),
c(3,4,3,2,4,5,2,3,5,4,2,3,3,4,3,3)),
ncol=3, byrow=F)
cbind(Y,X)
## [,1] [,2] [,3] [,4]
## [1,] 89 1 135 3
## [2,] 90 1 120 4
## [3,] 83 1 100 3
## [4,] 77 1 105 2
## [5,] 92 1 130 4
## [6,] 98 1 125 5
## [7,] 82 1 125 2
## [8,] 85 1 105 3
## [9,] 96 1 120 5
## [10,] 95 1 90 4
## [11,] 80 1 120 2
## [12,] 79 1 95 3
## [13,] 86 1 120 3
## [14,] 97 1 150 4
## [15,] 92 1 160 3
## [16,] 88 1 125 3
XtX <- t(X) %*% X
XtX
## [,1] [,2] [,3]
## [1,] 16 1925 53
## [2,] 1925 236875 6405
## [3,] 53 6405 189
XtY <- t(X) %*% Y
XtY
## [,1]
## [1,] 1409
## [2,] 170350
## [3,] 4750
YtY <- t(Y) %*% Y
YtY
## [,1]
## [1,] 124751
XtX_inv <- solve(XtX)
XtX_inv
## [,1] [,2] [,3]
## [1,] 3.34152652 -0.0217335058 -0.2005174644
## [2,] -0.02173351 0.0001918187 -0.0004059419
## [3,] -0.20051746 -0.0004059419 0.0752776910
beta_vec <- XtX_inv %*% t(X) %*% Y
beta_vec
## [,1]
## [1,] 53.4501940
## [2,] 0.1255833
## [3,] 5.8877191
mse <- ( YtY - t(beta_vec) %*% t(X) %*% Y ) / ( length(Y) - 2 - 1 ) #denominator is n-p-1, where n=16, p=2 (2 different predictors)
mse
## [,1]
## [1,] 6.146297
Sigma_mat <- XtX_inv * as.numeric( mse ) #note: here we had to convert our sigma^2(Y|X) into a numeric from a 1x1 matrix object, otherwise R gives a "Error...non-conformable arrays"
Sigma_mat
## [,1] [,2] [,3]
## [1,] 20.5380142 -0.133580580 -1.23243988
## [2,] -0.1335806 0.001178975 -0.00249504
## [3,] -1.2324399 -0.002495040 0.46267904
c_vec <- matrix(c(0,1,0), nrow=1)
beta_1 <- c_vec %*% beta_vec
beta_1
## [,1]
## [1,] 0.1255833
var_beta_1 <- c_vec %*% XtX_inv %*% t(c_vec) * as.numeric(mse)
var_beta_1
## [,1]
## [1,] 0.001178975
t_stat <- beta_1 / sqrt( var_beta_1 )
t_stat
## [,1]
## [1,] 3.657459
pval <- 2 * (1-pt( t_stat, df = length(Y) - 2 - 1) ) # df = n-p-1
pval
## [,1]
## [1,] 0.002895789
We can also calculate a 95% confidence interval based on our estimated variability and tie it all together in our brief, but complete, framework:
lci <- beta_1 - qt(0.975,13)*sqrt(var_beta_1)
uci <- beta_1 + qt(0.975,13)*sqrt(var_beta_1)
c(lci, uci)
## [1] 0.05140441 0.19976212
There is a significant increase in SBP (mmHg) for a one unit increase in weight (oz) (p=0.0029). On average, for a one ounce increase in weight, SBP increases by 0.1255833 mmHg (95% CI: 0.0514044, 0.1997621), when adjusting for age.
birth <- data.frame(
sbp = c(89,90,83,77,92,98,82,85,96,95,80,79,86,97,92,88),
wgt = c(135,120,100,105,130,125,125,105,120,90,120,95,120,150,160,125),
age = c(3,4,3,2,4,5,2,3,5,4,2,3,3,4,3,3)
)
mod1 <- lm(sbp ~ wgt + age, data=birth)
summary(mod1)
##
## Call:
## lm(formula = sbp ~ wgt + age, data = birth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0438 -1.3481 -0.2395 0.9688 6.6964
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.45019 4.53189 11.794 2.57e-08 ***
## wgt 0.12558 0.03434 3.657 0.0029 **
## age 5.88772 0.68021 8.656 9.34e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.479 on 13 degrees of freedom
## Multiple R-squared: 0.8809, Adjusted R-squared: 0.8626
## F-statistic: 48.08 on 2 and 13 DF, p-value: 9.844e-07
confint(mod1)
## 2.5 % 97.5 %
## (Intercept) 43.65964398 63.2407441
## wgt 0.05140441 0.1997621
## age 4.41822526 7.3572130