My favourite modeling method is by far hierarchical modeling, which allows for both population-level and group-level effects. My personal intuition is that these models more accurately reflect reality on many levels, quite apart from their desirable mathematical properties (e.g. shrinkage).
So in the hierarchical world we are modeling different potential effects at different levels:
\[y_j \sim N(\theta_j, \sigma_j)\] Where:
\[\theta_j = \mu + \tau \eta_j\]
\[\eta_j \sim N(0,1)\]
This accounts for the error at the school level, \(\sigma_j\); the true population mean, \(\mu\); the population standard deviation, \(\tau\); and the school-level error \(\eta_j\), which is given a standard normal prior.
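To make the generative story concrete, we can simulate fake data from this model in base R. The values of \(\mu\), \(\tau\), and the per-school standard errors below are arbitrary choices for illustration, not taken from any real data:

```r
set.seed(42)

J     <- 8               # number of groups (schools)
mu    <- 5               # true population mean (arbitrary)
tau   <- 10              # true population sd (arbitrary)
sigma <- runif(J, 9, 18) # per-school standard errors (arbitrary range)

eta   <- rnorm(J, 0, 1)         # school-level error, standard normal
theta <- mu + tau * eta         # true school effects
y     <- rnorm(J, theta, sigma) # observed (noisy) treatment effects

round(data.frame(theta, y), 1)
```

Each school's observed effect `y` is a noisy measurement of its true effect `theta`, which in turn is drawn from the population distribution.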
The classic example of a hierarchical linear model is of course the eight schools problem. We have eight different schools, each with an estimated treatment effect and an associated standard error for the treatment in that school.
library(dplyr)

(schools <- tribble(
  ~school, ~estimate, ~sd,
  "A", 28, 15,
  "B",  8, 10,
  "C", -3, 16,
  "D",  7, 11,
  "E", -1,  9,
  "F",  1, 11,
  "G", 18, 10,
  "H", 12, 18
))
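Before fitting anything, two non-hierarchical baselines are worth noting: no pooling (take each school's estimate at face value) and complete pooling (a single precision-weighted mean for all schools). The hierarchical model will land between these extremes. A quick sketch of the complete-pooling estimate in base R, with the data entered as plain vectors:

```r
estimate <- c(28, 8, -3, 7, -1, 1, 18, 12)
sd       <- c(15, 10, 16, 11, 9, 11, 10, 18)

# complete pooling: precision-weighted mean of all eight estimates
w      <- 1 / sd^2
pooled <- sum(w * estimate) / sum(w)
round(pooled, 1)
```

This comes out to roughly 7.7, so under complete pooling every school would be assigned the same effect of about 7.7 points.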
Now, stepping into the Stan code, we can write:
// hierarchical linear model
data {
  int<lower=0> J;         // number of schools
  real y[J];              // treatment effect for each school
  real<lower=0> sigma[J]; // s.e. of effects
}
parameters {
  real mu;                // population mean
  real<lower=0> tau;      // population sd
  vector[J] eta;          // school-level error
}
transformed parameters {
  vector[J] theta;        // school effect
  theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
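Note that this model uses the non-centered parameterization: we sample the standardized errors `eta` and derive `theta` from them, which tends to sample much better for hierarchical models with weakly informative data. For comparison, the mathematically equivalent centered form would replace the `parameters` through `model` blocks with something like:

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;          // sample the school effects directly
}
model {
  theta ~ normal(mu, tau);  // hierarchical prior on theta
  y ~ normal(theta, sigma);
}

The centered version often produces many more divergent transitions on this data set, which is why the non-centered form is the standard recommendation for the eight schools problem.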
We can now load our friend rstan and compile the model:
library(rstan)
hlm_model <- stan_model("stan_hlm.stan")
We prep our data as a list in the format Stan expects:
data <- list(J = nrow(schools),
y = schools$estimate,
sigma = schools$sd)
fit_hlm <- sampling(hlm_model, data, chains = 2, iter = 2000, refresh = 0)
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
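As the warning suggests, a small number of divergent transitions can often be eliminated by raising `adapt_delta` (the target acceptance rate) above its default, at the cost of smaller step sizes and slower sampling. A sketch, reusing the compiled model and data list from above:

```r
# requires hlm_model and data from the steps above
fit_hlm <- sampling(hlm_model, data,
                    chains = 2, iter = 2000, refresh = 0,
                    control = list(adapt_delta = 0.95))
```

If divergences persist even at high `adapt_delta`, that usually signals a geometry problem that reparameterization, not tuning, should fix.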
print(fit_hlm, pars = "theta", probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: stan_hlm.
## 2 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
## theta[1] 11.3 0.22 8.0 -1.3 10.3 31 1398 1
## theta[2] 7.9 0.14 6.4 -5.1 7.7 21 2128 1
## theta[3] 6.3 0.19 7.7 -11.2 6.8 21 1639 1
## theta[4] 7.8 0.15 6.8 -6.0 7.7 22 1997 1
## theta[5] 5.3 0.15 6.5 -9.9 5.8 17 1889 1
## theta[6] 6.1 0.16 6.8 -9.0 6.6 19 1795 1
## theta[7] 10.4 0.17 6.8 -1.6 9.8 26 1594 1
## theta[8] 8.3 0.18 7.6 -7.2 8.0 24 1783 1
##
## Samples were drawn using NUTS(diag_e) at Fri Jun 21 13:08:15 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
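The shrinkage mentioned at the top is visible in this output: school A's raw estimate of 28 is pulled down to about 11, while school C's -3 is pulled up toward the group mean. The classical partial-pooling formula makes the mechanism explicit. In the sketch below, `mu` and `tau` are set to illustrative values (assumptions chosen for demonstration, not extracted from the fit above):

```r
estimate <- c(28, 8, -3, 7, -1, 1, 18, 12)
sd       <- c(15, 10, 16, 11, 9, 11, 10, 18)

mu  <- 8  # illustrative population mean (assumption)
tau <- 6  # illustrative population sd (assumption)

# shrinkage factor: the fraction of the distance each raw
# estimate is pulled back toward the population mean mu
B         <- sd^2 / (sd^2 + tau^2)
theta_hat <- estimate - B * (estimate - mu)
round(data.frame(estimate, B = round(B, 2), theta_hat), 1)
```

Noisier schools (larger `sd`) get a larger `B` and are shrunk harder toward `mu`, which is exactly the pattern in the posterior table above.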
Copyright © 2018 Michael DeWitt. All rights reserved.