---
title: "Generalized Linear Models and Their Connection to Ordinary Least Squares Linear Regression"
author:
  - name: Alex Kaizer
    roles: "Instructor"
    affiliation: University of Colorado-Anschutz Medical Campus
toc: true
toc_float: true
toc-location: left
format:
  html:
    code-fold: show
    code-overflow: wrap
    code-tools: true
---
```{r, echo=F, message=F, warning=F}
library(kableExtra)
library(dplyr)
```
This page is part of the University of Colorado-Anschutz Medical Campus' [BIOS 6618 Recitation](/recitation/index.qmd) collection. To view other questions, you can visit the [BIOS 6618 Recitation](/recitation/index.qmd) collection page or use the search bar to look for keywords.
# Generalized Linear Models
It turns out that much of what we have done this semester with linear regression can be directly related to a larger family of models known as *generalized linear models*. These models are specified by a **family** and a **link function**.
Families refer to the type of outcome:
- Gaussian (i.e., normal) for continuous outcomes
- Gamma for continuous outcomes restricted to positive values (also often skewed)
- Binomial for binary outcomes
- Poisson for count outcomes
- Quasi-models (binomial, Poisson) that relax the dispersion parameter of our exponential families (i.e., are a little more flexible)
The link function specifies how we connect the mean of our outcome to our predictors and can vary by family. For Gaussian models there are three common options (a brief sketch of the corresponding `glm()` syntax follows the list):
- Identity: $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
- Inverse: $\frac{1}{E[Y]} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \implies E[Y] = \frac{1}{\beta_0 + \beta_1 X_1 + \beta_2 X_2}$
- Log: $\log(E[Y]) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \implies E[Y] = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2) = \exp(\beta_0) \times \exp(\beta_1 X_1) \times \exp(\beta_2 X_2)$ (i.e., our multiplicative interpretation, since the relationship between $E[Y]$ and $\mathbf{X}$ is no longer additive)
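For reference, the families and links above map onto R's `glm()` calls roughly as follows. This is a minimal sketch added for illustration (it is not part of the original examples); `y` and `x` are placeholders rather than the variables simulated below, and the links shown are defaults or common choices for each family.

```{r, eval=F}
# NOT RUN: illustrative glm() calls for each family above; y and x are placeholders
glm(y ~ x, family = gaussian(link = "identity"))  # continuous outcome
glm(y ~ x, family = Gamma(link = "log"))          # positive, often skewed continuous outcome
glm(y ~ x, family = binomial(link = "logit"))     # binary outcome
glm(y ~ x, family = poisson(link = "log"))        # count outcome
glm(y ~ x, family = quasipoisson(link = "log"))   # count outcome with a flexible dispersion parameter
```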
Let's look at some examples of these and how they sometimes connect to other topics from the semester.
## Gaussian, Identity Link
Let's see some examples of how our models change with different approaches. We'll simulate a simple case of $n=50$ with a few predictors:
```{r}
set.seed(1207) # set seed for reproducibility
n <- 50
x1 <- rnorm(n=n, mean=10, sd=2)       # continuous predictor
x2 <- rexp(n=n, rate=0.5)             # skewed, positive continuous predictor
x3 <- rbinom(n=n, size=1, prob=0.6)   # binary predictor
error <- rnorm(n=n, mean=0, sd=3)     # random error term
y <- 150 + -2*x1 + 4*x2 - 5*x3 + error # outcome generated from an additive linear model
```
```{r}
# Gaussian, Identity link
m_identity <- glm(y ~ x1 + x2 + x3, family=gaussian(link = "identity"))
summary(m_identity)
```
This is our tried and true model, our linear regression with an additive interpretation. In this case, our coefficient estimates are pretty darn close to the true values we set in the simulation!
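As a quick check of the connection to ordinary least squares (a small sketch added here for illustration; `m_lm` is simply a new name for the `lm()` fit), the Gaussian family with the identity link should reproduce the OLS estimates from `lm()`:

```{r}
# The Gaussian family with identity link is ordinary least squares, so lm()
# should return essentially identical coefficient estimates
m_lm <- lm(y ~ x1 + x2 + x3)
cbind(glm = coef(m_identity), lm = coef(m_lm))
```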
## Gaussian, Inverse Link
```{r}
# Gaussian, Inverse link
m_inverse <- glm(y ~ x1 + x2 + x3, family=gaussian(link = "inverse"))
summary(m_inverse)
```
These estimates don't have as nice an interpretation, and it is hard to tell if they seem "right" given our simulation setting.
However, if we instead define a new outcome whose values are the inverse of the linear predictor (plus error), the estimates will line up with what we set in our simulation:
```{r}
y2 <- 1 / (150 + -2*x1 + 4*x2 - 5*x3 + error) # outcome defined as the inverse of the linear predictor (plus error)
m_inverse2 <- glm(y2 ~ x1 + x2 + x3, family=gaussian(link = "inverse"))
summary(m_inverse2)
```
This suggests that the relationship between $Y$ and $\mathbf{X}$ follows an inverse link (although note these are not exactly the same $\hat{\beta}$'s we obtained from the identity link). However, it still doesn't give us a nice interpretation of the $\hat{\beta}$'s.
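One way to make the inverse-link fit a bit more concrete (a short sketch added for illustration) is to look at the fitted values on both scales: `type = "link"` returns the linear predictor $\mathbf{X}\hat{\boldsymbol{\beta}}$, while `type = "response"` returns its inverse, the estimated mean of `y2`.

```{r}
# Compare observed y2 to fitted values on the link scale (linear predictor)
# and the response scale (1 / linear predictor) for the first few observations
head(cbind(observed = y2,
           fitted_link = predict(m_inverse2, type = "link"),
           fitted_response = predict(m_inverse2, type = "response")))
```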
## Gaussian, Log Link
The idea behind this model is perhaps most similar to what we saw earlier this semester but with some different assumptions:
```{r}
# Gaussian, Log link
m_log <- glm(y ~ x1 + x2 + x3, family=gaussian(link = "log"))
summary(m_log)
```
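Because the log link implies the multiplicative relationship noted in the link list above, one way to interpret these coefficients (a brief sketch added for illustration) is to exponentiate them:

```{r}
# Exponentiated log-link coefficients: each is the estimated multiplicative
# change in the mean of y for a one-unit increase in that predictor,
# holding the other predictors constant
exp(coef(m_log))
```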
Let's compare it to a Gaussian family with the identity link but with $\log(Y)$:
```{r}
# Gaussian, Identity Link with log(Y)
m_identity_lnY <- glm(log(y) ~ x1 + x2 + x3, family=gaussian(link = "identity"))
summary(m_identity_lnY)
```
In this case, our coefficient estimates are not identical (although, in fact, they happen to be quite similar). BUT, it turns out these two models are making very different assumptions:
- The log link transforms the model itself (i.e., we have a log-likelihood of $l(\mu,\sigma^2;y) = \sum_{i=1}^{n} \left\{\frac{y_i \exp(\mathbf{x}_i'\boldsymbol{\beta}) - (\exp(\mathbf{x}_i'\boldsymbol{\beta}))^2 / 2}{\sigma^2} - \frac{y_i^2}{2 \sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\}$). The Gaussian errors are still on the natural scale, so the error variance is constant for all mean values of $Y$.
- The $\log(Y)$ with identity link transforms the data itself. Here, if we retransform $\log(Y)$ back to $Y$ the variance will change with the mean.
Why might we care? It turns out the log link avoids the retransformation problem: the $\log(Y)$ model estimates $E[\log(Y)|X] = \mathbf{X}'\boldsymbol{\beta}$, but $E(\log(Y)|X) \neq \log(E(Y|X))$, whereas the log link models $\log(E(Y|X)) = \mathbf{X}'\boldsymbol{\beta}$ directly.
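As a rough illustration of why this matters (a sketch added here, using the model objects fit above), we can compare each model's implied predictions for $Y$: the log-link model estimates $E[Y|X]$ directly, while exponentiating fitted values from the $\log(Y)$ model targets $\exp(E[\log(Y)|X])$, which by Jensen's inequality is generally smaller than $E[Y|X]$.

```{r}
# Naively exponentiating fitted values from the log(Y) model does not target
# the same quantity as the log-link model's fitted means E[Y|X]
head(cbind(observed_y = y,
           log_link_fit = predict(m_log, type = "response"),
           exp_logY_fit = exp(predict(m_identity_lnY))))
```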