My favourite modeling method is by far hierarchical modeling, which allows for both population-level and group-level effects. My personal intuition is that these models more accurately reflect reality on many levels, quite apart from their desirable mathematical properties (e.g. shrinkage).
So in the hierarchical world we are modeling different potential effects at different levels:
\[y_j \sim N(\theta_j, \sigma_j)\] Where:
\[\theta_j = \mu + \tau \eta_j\]
\[\eta_j \sim N(0,1)\]
This accounts for the error at the school level, \(\sigma_j\); the true population mean, \(\mu\); the population standard deviation, \(\tau\); and the school-level error \(\eta_j\), which is given a standard normal prior.
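To make the generative story concrete, we can simulate fake data from this model in base R. The values of \(\mu\), \(\tau\), and the per-school standard errors below are arbitrary choices for illustration, not taken from any real data:

```r
set.seed(42)

J     <- 8               # number of groups (schools)
mu    <- 5               # true population mean (arbitrary)
tau   <- 10              # true population sd (arbitrary)
sigma <- runif(J, 9, 18) # per-school standard errors (arbitrary range)

eta   <- rnorm(J, 0, 1)         # school-level error, standard normal
theta <- mu + tau * eta         # true school effects
y     <- rnorm(J, theta, sigma) # observed (noisy) treatment effects

round(data.frame(theta, y), 1)
```

Each school's observed effect `y` is a noisy measurement of its true effect `theta`, which in turn is drawn from the population distribution.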
The classic example of a hierarchical linear model is of course the eight schools problem. We have eight different schools, each with an estimated treatment effect and an associated standard error for the treatment in that school.
library(dplyr)

(schools <- tribble(
  ~school, ~estimate, ~sd,
  "A", 28, 15,
  "B",  8, 10,
  "C", -3, 16,
  "D",  7, 11,
  "E", -1,  9,
  "F",  1, 11,
  "G", 18, 10,
  "H", 12, 18
))
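Before fitting anything, two non-hierarchical baselines are worth noting: no pooling (take each school's estimate at face value) and complete pooling (a single precision-weighted mean for all schools). The hierarchical model will land between these extremes. A quick sketch of the complete-pooling estimate in base R, with the data entered as plain vectors:

```r
estimate <- c(28, 8, -3, 7, -1, 1, 18, 12)
sd       <- c(15, 10, 16, 11, 9, 11, 10, 18)

# complete pooling: precision-weighted mean of all eight estimates
w      <- 1 / sd^2
pooled <- sum(w * estimate) / sum(w)
round(pooled, 1)
```

This comes out to roughly 7.7, so under complete pooling every school would be assigned the same effect of about 7.7 points.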
Now, stepping into the Stan code, we can write:
// hierarchical linear model
data {
  int<lower=0> J;         // number of schools
  real y[J];              // treatment effect for each school
  real<lower=0> sigma[J]; // s.e. of effects
}
parameters {
  real mu;                // population mean
  real<lower=0> tau;      // population sd
  vector[J] eta;          // school-level error
}
transformed parameters {
  vector[J] theta;        // school effect
  theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
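Note that this model uses the non-centered parameterization: we sample the standardized errors `eta` and derive `theta` from them, which tends to sample much better for hierarchical models with weakly informative data. For comparison, the mathematically equivalent centered form would replace the `parameters` through `model` blocks with something like:

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;          // sample the school effects directly
}
model {
  theta ~ normal(mu, tau);  // hierarchical prior on theta
  y ~ normal(theta, sigma);
}

The centered version often produces many more divergent transitions on this data set, which is why the non-centered form is the standard recommendation for the eight schools problem.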
We can now load our friend rstan and compile the model:
library(rstan)
hlm_model <- stan_model("stan_hlm.stan")
We prep our data as a list in the format Stan expects:
data <- list(J = nrow(schools),
y = schools$estimate,
sigma = schools$sd)
fit_hlm <- sampling(hlm_model, data, chains = 2, iter = 2000, refresh = 0)
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
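As the warning suggests, a small number of divergent transitions can often be eliminated by raising `adapt_delta` (the target acceptance rate) above its default, at the cost of smaller step sizes and slower sampling. A sketch, reusing the compiled model and data list from above:

```r
# requires hlm_model and data from the steps above
fit_hlm <- sampling(hlm_model, data,
                    chains = 2, iter = 2000, refresh = 0,
                    control = list(adapt_delta = 0.95))
```

If divergences persist even at high `adapt_delta`, that usually signals a geometry problem that reparameterization, not tuning, should fix.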
print(fit_hlm, pars = "theta", probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: stan_hlm.
## 2 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
## theta[1] 11.3 0.22 8.0 -1.3 10.3 31 1398 1
## theta[2] 7.9 0.14 6.4 -5.1 7.7 21 2128 1
## theta[3] 6.3 0.19 7.7 -11.2 6.8 21 1639 1
## theta[4] 7.8 0.15 6.8 -6.0 7.7 22 1997 1
## theta[5] 5.3 0.15 6.5 -9.9 5.8 17 1889 1
## theta[6] 6.1 0.16 6.8 -9.0 6.6 19 1795 1
## theta[7] 10.4 0.17 6.8 -1.6 9.8 26 1594 1
## theta[8] 8.3 0.18 7.6 -7.2 8.0 24 1783 1
##
## Samples were drawn using NUTS(diag_e) at Fri Jun 21 13:08:15 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
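The shrinkage mentioned at the top is visible in this output: school A's raw estimate of 28 is pulled down to about 11, while school C's -3 is pulled up toward the group mean. The classical partial-pooling formula makes the mechanism explicit. In the sketch below, `mu` and `tau` are set to illustrative values (assumptions chosen for demonstration, not extracted from the fit above):

```r
estimate <- c(28, 8, -3, 7, -1, 1, 18, 12)
sd       <- c(15, 10, 16, 11, 9, 11, 10, 18)

mu  <- 8  # illustrative population mean (assumption)
tau <- 6  # illustrative population sd (assumption)

# shrinkage factor: the fraction of the distance each raw
# estimate is pulled back toward the population mean mu
B         <- sd^2 / (sd^2 + tau^2)
theta_hat <- estimate - B * (estimate - mu)
round(data.frame(estimate, B = round(B, 2), theta_hat), 1)
```

Noisier schools (larger `sd`) get a larger `B` and are shrunk harder toward `mu`, which is exactly the pattern in the posterior table above.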
Copyright © 2018 Michael DeWitt. All rights reserved.