In this example I am going to practice multiple linear regression. Now I will add a second predictor to the model.
I’m going to go ahead and load rstan
for use in this example
library(rstan)
rstan_options(auto_write = TRUE)
library(dplyr)
Again, It is a good practice to generate some fake data to ensure that the model is behaving as expected.
x <- rnorm(40, 10, 5)
z <- rnorm(40, 0, 1)
pred <- data.frame(x = x, y = y)
noise <- rnorm(40,0,1)
y <- x*.5 + z * 0.25 + noise
data <- list( x= as.matrix(pred), y = y, N = length(x), K = 2L)
The below Stan code models the familiar \(y = \beta_1*x + \beta_2*x + \alpha + \epsilon\)
Or more formally:
\[y_n \sim N(\alpha + \beta X_n,\sigma)\]
//multiple linear regression code
data {
int<lower=0> N; // number of data items
int<lower=0> K; // number of predictors
matrix[N, K] x; // predictor matrix
vector[N] y; // outcome vector
}
// this step does some transformations to the data
transformed data {
matrix[N, K] Q_ast;
matrix[K, K] R_ast;
matrix[K, K] R_ast_inverse;
// thin and scale the QR decomposition
Q_ast = qr_Q(x)[, 1:K] * sqrt(N - 1);
R_ast = qr_R(x)[1:K, ] / sqrt(N - 1);
R_ast_inverse = inverse(R_ast);
}
parameters {
real alpha; // intercept
vector[K] theta; // coefficients on Q_ast
real<lower=0> sigma; // error scale
}
model {
y ~ normal(Q_ast * theta + alpha, sigma); // likelihood
}
// converts the quantities back to the original scale
generated quantities {
vector[K] beta;
beta = R_ast_inverse * theta; // coefficients on x
}
So now we need to compile the Stan code. This takes a little while….
mult_linear_regression <- stan_model("stan_mult_linear_regression.stan")
One that code has been compiled then we can actually fit the model. This is a simple model and it converges quickly (which it should).
fit1 <- sampling(mult_linear_regression, data = data, chains = 2, iter = 2000, refresh = 0)
Now we can look at the model outputs with the summary
which prints a lot of information or just look at the parameters one by one. I’m interested in if it could detect the true slope, in this case 0.5:
print(fit1, pars = "beta", probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: stan_mult_linear_regression.
## 2 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 50% 98% n_eff Rhat
## beta[1] 0.46 0 0.03 0.39 0.46 0.53 1057 1
## beta[2] -0.13 0 0.07 -0.28 -0.13 0.00 798 1
##
## Samples were drawn using NUTS(diag_e) at Thu Jun 20 10:49:03 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
As with any good Bayesian analysis it is important to perform some posterior checks to ensure that the model sufficiently converged. One check is the \(\hat{R}\) which is a measure of the mixing on the chaines with a target of one (which is achieved here).
Additionally, we can look at the trace plots to make sure everything converged and thise look good too.
traceplot(fit1)
And finally a pairs plot which shows that sigma, alpha and beta all look reasonable centered with no strange patterns.
pairs(fit1)
For the sake of illustration I am going to turn to our friend, mtcars
to test this model. So first things first is to put our data into the correct format:
data <- list(y = mtcars$mpg, x = as.matrix(dplyr::select(mtcars, wt, disp)),
N = nrow(mtcars), K = 2L)
Now we can fit the model with our already compiled Stan code and let it run.
fit_real <- sampling(mult_linear_regression, data = data, chains = 2, iter = 2000, refresh = 0)
It fit the model without a problem. Now to view the outputs:
print(fit_real, pars = "beta", probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: stan_mult_linear_regression.
## 2 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 50% 98% n_eff Rhat
## beta[1] -3.41 0.05 1.24 -5.73 -3.44 -0.99 597 1
## beta[2] -0.02 0.00 0.01 -0.04 -0.02 0.00 798 1
##
## Samples were drawn using NUTS(diag_e) at Fri Jun 21 13:09:10 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The \(\hat{R}\) are at one so it doesn’t look too bad. Of course a better practice would be to run this model a little longer and supply strong priors because the effect number of samples is somewhat small. But another model nonetheless.
We can also visualise the outputs using the bayesplot
package
posterior <- as.array(fit_real)
bayesplot::mcmc_intervals(posterior, pars = c("beta[1]","beta[2]", "sigma"))
Research and Methods Resources
me.dewitt.jr@gmail.com
Winston- Salem, NC
Copyright © 2018 Michael DeWitt. All rights reserved.