Bayesian statistics is on the rise. There is no doubt about that. But the learning curve is quite steep for many researchers. In this post, I give you that extra push as you climb the mountain on your route to learning Bayesian statistics. A short demo of how to use the super package brms to estimate some standard multilevel models will hopefully be inspiring and reassuring. While you’re on the road you may also pick up some neat plotting skills!
A great number of disciplines are embracing Bayesian statistics as a tool for the applied researcher. Although some influential scholars like Richard McElreath make a plea to teach and learn Bayesian statistics from an insider perspective rather than an outsider perspective, it is sometimes hard to imagine what Bayesian statistics is about without a link to analyses and models we are familiar with. This post aims to introduce Bayesian statistics to educational researchers by applying it to the domain of educational effectiveness studies. It is not my ambition here to give a thorough statistical and mathematical underpinning of Bayesian statistics. Instead, the approach is to introduce brms and the workflow of running Bayesian analyses while taking you along in the kind of inferences that can be drawn.
I will have succeeded in my aim if, after reading the post, you are convinced that brms is a great tool to make the transition towards Bayesian analyses.

One of the most widely used programming languages for running Bayesian models is Stan
(Stan Development Team 2020). Given that Stan
is actually a programming language, it has a steep learning curve, certainly for people who are less confident in learning new programming languages. To overcome the need to learn the Stan
language some great packages have been developed in R
to allow regular R-users to use the programming language they are most familiar with: R-code. One package stands out in its ‘simplicity’ and ‘versatility’: brms
(Bürkner 2018). The following diagram (from Bürkner 2017) demonstrates the workflow of brms:
knitr::include_graphics("Workflow_brms.jpeg")
The most convenient part of this workflow is that the analyst familiar with R-formula notation as used in lm()
or lmer()
can use this same notation! But, in brms
this formula syntax is extended even further to accommodate a broader range of models (e.g. finite mixture models; multivariate models) that we will not cover here. Time to demonstrate the workflow and show how ‘easy’ this is.
For this post we make use of a publicly available dataset: TIMSS 2019 (see https://timss2019.org/international-database/). More specifically, we will use the data for the Dutch-speaking part of Belgium (aka Flanders).
Before we dig into the analyses, let’s load the packages needed and set a default theme for the plots we will make.
library(tidyverse)
library(tidybayes)
library(brms)
library(ggplot2)
library(bayesplot)
theme_set(theme_linedraw() +
theme(text = element_text(family = "Times"),
panel.grid = element_blank()))
Time to import the datasets. We use two datasets from the international release (note that bfl stands for the specific country code):

- asgbflm7.sav: student achievement data and some background data;
- acgbflm7.sav: school-level data.

As these data are stored in SPSS format (.sav) we have to import them. One of the many dedicated packages for this job is haven (Wickham and Miller 2020), from which we can use the read_sav() function.
library(haven)
Student_FL19 <- read_sav("asgbflm7.sav") %>%
select(IDSCHOOL, IDSTUD, ITSEX, ASMMAT01) %>%
mutate(math = ASMMAT01,
Girl = if_else(ITSEX == 1, 1, 0)) %>%
zap_labels()
School_contextFL19 <- read_sav("acgbflm7.sav") %>%
select(IDSCHOOL, ACDGSBC) %>%
mutate(SES_SCHOOL = factor(ACDGSBC)) %>%
zap_labels()
Once we have imported both datasets we make a selection of the variables used in this post. From the student achievement dataset we only retain the variables that identify the students and the schools they belong to (IDSTUD & IDSCHOOL), the gender of the student (ITSEX) and the first plausible value for overall mathematics skill (ASMMAT01). Making use of the mutate() function we rename this last variable to math and we create a dummy variable named Girl that has value 1 if the student is a girl and value 0 if not. Both variables are created for convenience further on in the analyses.
From the school-level dataset we keep the IDSCHOOL variable as well as the variable ACDGSBC, an indicator at school level of School Composition by Socioeconomic Background (category 1 indicating that the composition is ‘More Affluent’, category 2 that it is ‘Neither more Affluent nor more Disadvantaged’ and category 3 that it is ‘More Disadvantaged’). To make our coding somewhat easier further on we rename this variable to SES_SCHOOL and transform it to a factor.
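If you want a quick sanity check on what the import produced, a glimpse() of both objects does the job (output not shown here):

glimpse(Student_FL19)        # IDSCHOOL, IDSTUD, ITSEX, ASMMAT01, math, Girl
glimpse(School_contextFL19)  # IDSCHOOL, ACDGSBC, SES_SCHOOL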
Although a good Bayesian doesn’t look at the data before building a model and formulating hypotheses, we can still take a quick peek at the data to check whether any irregularities are present. In the following graph we (just to demonstrate the power of ggplot…) present a pairs plot making use of the GGally package (Schloerke et al. 2021). The data for the plot are aggregated to the school level, to give us a first rough idea of what to expect when we model the impact of School Composition by Socioeconomic Background on mathematics scores.
library(GGally)
Student_FL19 %>%
left_join(School_contextFL19) %>%
group_by(IDSCHOOL) %>%
summarize(mean_math = mean(ASMMAT01, na.rm = T),
ACDGSBC = first(ACDGSBC)) %>%
mutate(SES_SCHOOL = as.factor(ACDGSBC)) %>%
select(mean_math,
SES_SCHOOL) %>%
filter(SES_SCHOOL != 9) %>%
ggpairs(aes(color = SES_SCHOOL), upper = list(continuous = wrap("cor", size = 3))) +
theme_linedraw(base_size = 7)
Before we kick into our Bayesian model estimation, let’s create a dataset that merges both datasets and only keeps students for whom we have complete data (making use of drop_na()) so we can do a proper model comparison later on:
Student_FL19 <- Student_FL19 %>%
left_join(School_contextFL19) %>%
drop_na()
We’re ready to model!
In this post we will estimate 3 models:

- a naive model with only an intercept, ignoring the clustering of students in schools;
- a multilevel null model with an intercept that varies between schools;
- a multilevel model with predictors at the student and the school level.
Let’s start with a very naive model, stating that we do not have any information on individual students nor on the school or class context to predict mathematics scores. Of course we can assume that there is a multitude of factors that determine the individual scores of students (e.g. prior schooling, parental support, peer support, motivation, teacher’s practices, …) so we can see an individual student’s mathematics score as an average of all these factors. Therefore, we can apply the central limit theorem and assume that the math score of an individual student i comes from a normal distribution (see Lambert 2018). One way to write this formally is as follows:
\[\begin{equation} \begin{aligned} \text{math}_{i} \sim & N(\mu,\sigma_{e})\\ \end{aligned} \end{equation}\]
with \(\text{math}_{i}\) indicating the score of an individual student \(i\), \(\mu\) indicating an overall average and \(\sigma_{e}\) indicating the residual standard deviation that captures the variation between students.
Essentially this boils down to a simple regression model with no explanatory variables and only an intercept. In brms we can use the following code to estimate this basic model, making use of the brm() function, with math ~ 1 indicating that we only include an intercept in our model (1 is a placeholder in R formula notation for an intercept) and family = "gaussian" indicating that we assume a normal distribution.
Model_math_naive <- brm(
math ~ 1, # typical R formula style
data = Student_FL19,
family = "gaussian")
Now, before running this model it is always advisable to think about our priors. brms will give us some default priors that you can inspect making use of the get_prior() function.
get_prior(
math ~ 1 ,
data = Student_FL19
)
                     prior     class coef group resp dpar nlpar bound  source
 student_t(3, 533.1, 69.7) Intercept                                  default
     student_t(3, 0, 69.7)     sigma                                  default
For instance, for the Intercept a Student-t distribution with 3 degrees of freedom, a mean of 533.1 and a sigma of 69.7 is suggested by brms. Alternatively we can construct our own prior, based on what we already know about average mathematics scores for Flanders from previous TIMSS studies: in 2003 the average was 551, in 2011 it was 549 and in 2015 it was 546. We could use this prior knowledge to construct a prior with a normal distribution with a mean of 548.9 (roughly the mean of the 3 previous results for Flanders) and a sigma of 25 expressing our uncertainty about the average for 2019. To illustrate the difference between our custom prior and the default brms prior we can plot both probability density functions:
library(metRology) # provides dt.scaled(), the scaled Student-t density
x <- seq(350, 650, by = .1)
custom <- dnorm(x, 548.9, 25)
brms_default <- dt.scaled(x, 3, 533.1, 69.7)
tibble(x, custom, brms_default) %>% # tibble() instead of the deprecated data_frame()
pivot_longer(c(custom, brms_default), names_to = "Prior") %>%
ggplot(aes(x = x, y = value, lty = Prior)) +
geom_line() +
scale_x_continuous("math",limits = c(350,650)) +
scale_y_continuous("probability density")
Let’s use the custom prior for the Intercept and stick to the default prior for the sigma parameter in our model. In the following code we set our prior for the parameter of class Intercept and store it in the object priorMath. Next we call the brm() function with some additional arguments to define our prior (prior = priorMath), to make use of parallel computing power by using 4 cores for the estimation (cores = 4) and to make our analysis reproducible by setting a seed (seed = 2021).
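Putting that together, the code should look like this (a reconstruction from the arguments just described; the set_prior() call mirrors the one used for the predictor model later in this post):

priorMath <- set_prior("normal(548.9, 25)", class = "Intercept")

Model_math_naive <- brm(
  math ~ 1,           # intercept-only model
  data = Student_FL19,
  family = "gaussian",
  prior = priorMath,  # our custom prior for the Intercept
  cores = 4,          # parallel computation
  seed = 2021         # reproducibility
)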
Once our model has run we are ready to inspect our outcomes! Let’s start with probably the most used function in R: summary().
Family: gaussian
Links: mu = identity; sigma = identity
Formula: math ~ 1
Data: Student_FL19 (Number of observations: 4257)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 532.13 1.03 530.11 534.17 1.00 3601 2775
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 67.69 0.73 66.22 69.13 1.00 3327 2763
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
From the summary we can learn that our model estimation converged well: \(\widehat R\) < 1.015 and the effective sample size is greater than 100 times the number of chains (= 4) for each parameter (Vehtari et al. 2021).
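For readers who want to check these diagnostics directly, brms offers rhat() and mcmc_plot(); a minimal sketch:

rhat(Model_math_naive)                      # should all be close to 1
mcmc_plot(Model_math_naive, type = "trace") # visual check of the chains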
Interpretation of the output can be done in a similar way as interpreting regular R output. For instance, we can see that 532.13 and 67.69 are the point estimates for the intercept and sigma respectively. Nevertheless, as true Bayesians we want to express our uncertainty rather than sticking to a single point estimate. The output already gives us some first information: 95% credible intervals. Concerning the intercept we learn that the 95% most probable values for the true population average for mathematics in Flanders lie between 530.11 and 534.17.
Rather than sticking to the output generated by the summary() command, in a publication we would like to visualise our uncertainty. The package brms has a great command to extract our posterior samples: posterior_samples(). This function will form the basis for the different plots we will create. Let us first have a quick look at what we get out of this command, so the code used further on in this post will make more sense. We wrap the command in a head() function in order to print only the first 5 rows.
head(
posterior_samples(Model_math_naive), 5
)
b_Intercept sigma lp__
1 529.9010 66.40590 -23991.92
2 531.7248 67.90770 -23988.07
3 532.4621 68.11656 -23988.16
4 531.7378 67.66156 -23988.02
5 530.9714 67.92879 -23988.63
We see that we get a column called b_Intercept storing possible parameter values for the Intercept parameter in our model and a column called sigma storing possible parameter values for the sigma parameter. Actually, when posterior_samples() is used we get 4000 intercept and 4000 sigma values:
dim(posterior_samples(Model_math_naive))
[1] 4000 3
These samples are draws of parameter values from our posterior distribution and can be used to plot the probability densities. The fact that we have 4000 samples is due to the default settings of brms: the estimation algorithm runs 4 chains, each consisting of 2000 iterations of which 1000 are warmup (burn-in) iterations.
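These defaults can be made explicit (or changed) via the chains, iter and warmup arguments of brm(); a sketch, not run here:

Model_math_naive <- brm(
  math ~ 1,
  data = Student_FL19,
  family = "gaussian",
  prior = priorMath,
  chains = 4,    # number of Markov chains
  iter = 2000,   # total iterations per chain
  warmup = 1000, # warmup iterations discarded per chain
  cores = 4,
  seed = 2021
)
# 4 chains x (2000 - 1000) post-warmup iterations = 4000 draws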
With that knowledge, we can use these samples as input to plot our uncertainty, starting with a first visualisation using geom_interval(). We plot a 50%, 89% and 95% credible interval.
posterior_samples(Model_math_naive) %>% # Get draws of the posterior
select(b_Intercept) %>% # Select our Intercept estimate column
median_qi( .width = c(.95, .89, .5)) %>% # Choose the intervals we want to plot
ggplot(aes(x = b_Intercept)) + # Let's plot...
geom_interval(aes(xmin = .lower, # using geom_interval
xmax = .upper)) +
xlab("marginal posterior") + # Set our title for x-axis
scale_y_continuous("b_Intercept", breaks = NULL) +
scale_color_brewer() # Choose a color scale
We can also combine the information on both parameters in a single plot.
posterior_samples(Model_math_naive) %>% # Get draws
select(b_Intercept,sigma) %>% # Select both parameters
pivot_longer(everything()) %>% # Create a dataset of long format
ggplot(aes(x = value, y = 0)) + # Now we plot using ...
stat_interval(.width = c(.50,.89,.95)) + # stat_interval()
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free") +
scale_color_brewer()
To create these kinds of graphs it is always wise to start from examples like these and apply them to your own results.
Our previous model was somewhat naive. We know that these students cannot be seen as independently sampled from a population of 10-year-olds. The sampling design in TIMSS is two-staged: first a sample of schools is drawn and then a sample of students within each school. Students from the same school share a common context and therefore cannot be seen as independent observations. Time to take the school into account and introduce our first genuine multilevel model: a null model. This model only includes an intercept, a variance between schools and a variance between pupils within schools. A typical formal way to write down this model in multilevel publications is as follows:
\[\begin{equation} \begin{aligned} & \text{math}_{ij} = \beta_0 + u_j + e_{ij}\\ & \text{with} \quad u_j \sim N(0,\sigma_{u})\\ & \text{and} \quad e_{ij} \sim N(0,\sigma_{e}) \end{aligned} \end{equation}\]
Note that we added the subscript j, indicating that we model the mathematics score of pupil i in school j. And we have two residuals in the model: \(u_j\), a school-specific residual, and \(e_{ij}\), the individual student’s deviation from the school average. For both residuals we assume that they are normally distributed with an expected value (or average, if you wish) of 0. This same model can also be written in an alternative way, more in line with the general way of writing models in a Bayesian fashion:
\[\begin{equation} \begin{aligned} & \text{math}_{ij} \sim N(\mu_{j},\sigma_{e})\\ & \mu_{j} = \beta_0 + u_j \\ & u_j \sim N(0,\sigma_{u})\\ \end{aligned} \end{equation}\]
The code to analyze this model in brms
, making use of the same priors as in our naive model, looks like this:
Model_math_null <- brm(
math ~ 1 + (1|IDSCHOOL), # typical lme4 style notation
data = Student_FL19,
family = "gaussian",
prior = priorMath,
cores = 4,
seed = 2021
)
Adding (1|IDSCHOOL) to the formula means that we expect the Intercept to vary from school to school. The model output is again easily accessed making use of the summary() command.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: math ~ 1 + (1 | IDSCHOOL)
Data: Student_FL19 (Number of observations: 4257)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~IDSCHOOL (Number of levels: 133)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 27.80 2.13 23.86 32.21 1.00 1235
Tail_ESS
sd(Intercept) 1905
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 532.22 2.54 527.33 537.32 1.00 1311 1852
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 62.47 0.70 61.12 63.82 1.00 7802 3026
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
From the output we learn that the uncertainty concerning the intercept (the population average mathematics score) has increased compared to our naive model. Now, the 95% most credible population average scores are between 527 and 537. This loss in precision is exactly what is to be expected from this type of model, given the extra variance that we modelled: the variance between schools. For that variance parameter (actually a standard deviation) our model indicates that it is somewhere between 24 and 32, with a point estimate of 28. In a frequentist analysis we would get only that point estimate, which would typically be used to calculate the Intra-class correlation (ICC). Based on the point estimates of our model we would get an ICC of 0.165 (\(\text{ICC}=\frac{27.80^2}{27.80^2 + 62.47^2}\)), indicating that 16.5% of the variance in mathematics scores can be attributed to differences between schools. The nice thing about a Bayesian approach is that we can use our posterior samples to express our uncertainty about this measure! Here we use our posterior_samples()
function again and with the mutate()
step we calculate an ICC for each of the 4000 samples resulting in 4000 ICC values (only 5 are shown here).
head(
posterior_samples(Model_math_null) %>%
mutate(ICC = (sd_IDSCHOOL__Intercept^2/(sd_IDSCHOOL__Intercept^2 + sigma^2))) %>%
select(b_Intercept, sigma, sd_IDSCHOOL__Intercept, ICC),
5
)
b_Intercept sigma sd_IDSCHOOL__Intercept ICC
1 531.5805 62.78881 28.33029 0.1691461
2 529.5665 62.00816 26.59448 0.1553654
3 532.8093 62.23712 28.62321 0.1745861
4 533.1545 62.40529 28.78412 0.1754256
5 531.5764 62.63484 25.88342 0.1458612
An even nicer thing is that we can also plot our uncertainty about the ICC!
posterior_samples(Model_math_null) %>% # Get draws of the posterior
mutate(ICC = (sd_IDSCHOOL__Intercept^2/(sd_IDSCHOOL__Intercept^2 + sigma^2))) %>%
select(ICC) %>% # Select ICC
median_qi( .width = c(.95, .89, .5)) %>% # Choose the intervals we want to plot
ggplot(aes(x = ICC)) + # Let's plot...
geom_interval(aes(xmin = .lower, # using geom_interval
xmax = .upper)) +
scale_x_continuous("marginal posterior",
breaks = seq(.13,.21,by =.01)) + # define our x-scale
scale_y_continuous("ICC", breaks = NULL) +
scale_color_brewer() # Choose a color scale
This plot shows that we are 89% confident that the ICC is somewhere between .135 and .205.
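The boundaries quoted here can also be extracted numerically; a quick sketch using median_qi() from tidybayes:

posterior_samples(Model_math_null) %>%
  mutate(ICC = (sd_IDSCHOOL__Intercept^2/(sd_IDSCHOOL__Intercept^2 + sigma^2))) %>%
  median_qi(ICC, .width = .89) # 89% credible interval for the ICC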
We can now bring it all together and create a plot incorporating the marginal posterior distributions of all relevant parameters from the model. The following code does this for us. Note that I used another way of visualizing the uncertainty, making use of stat_dotsinterval(). Research has shown that this kind of visualisation leads to more correct interpretations than making use of intervals or density curves (see https://mucollective.northwestern.edu/).
names <- c("math average","sd pupillevel","sd schoollevel", "ICC")
posterior_samples(Model_math_null) %>%
mutate(ICC = (sd_IDSCHOOL__Intercept^2/(sd_IDSCHOOL__Intercept^2 + sigma^2))) %>%
select(b_Intercept,sigma,sd_IDSCHOOL__Intercept, ICC) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name, levels = c("math average",
"sd schoollevel",
"sd pupillevel",
"ICC"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(.width = c(.50,.89,.95),
normalize = "panels",
quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")
Another plot type you may encounter in multilevel publications is a ‘caterpillar’ plot. Such a plot shows the school-level deviations and an interval for each school’s deviation from the overall mean. We can easily make a similar plot, again making use of our posterior_samples() function. Before we do that, it is nice to know that we also have 4000 samples of parameter estimates for each \(u_j\) that can be used for plotting. A quick peek at the samples shows what I mean. After pulling the 4000 samples of parameter estimates out of the object containing our fitted model, we select all parameters related to IDSCHOOL making use of the convenient starts_with() function. Then, for the sake of demonstration, we only keep the deviation parameters for the first 3 schools in our dataset:
head(
posterior_samples(Model_math_null) %>%
select(starts_with("r_IDSCHOOL")) %>%
select(1:3), 5
)
r_IDSCHOOL[5002,Intercept] r_IDSCHOOL[5003,Intercept]
1 28.73855 17.71205
2 24.20709 42.17878
3 25.47716 37.23581
4 31.70367 22.24863
5 13.27713 49.58657
r_IDSCHOOL[5004,Intercept]
1 -30.94638
2 -60.69503
3 -28.98984
4 -49.08165
5 -45.03160
So, what we get is for schools with ID codes 5002, 5003 and 5004, 4000 samples of their deviation towards the overall mean. Of course, a plot can make a great summary of this. Let’s plot a caterpillar plot for a random selection of 20 schools in our sample.
set.seed(2021)
posterior_samples(Model_math_null) %>%
select(starts_with("r_IDSCHOOL")) %>%
select(sample(ncol(.), 20)) %>% # randomly pick 20 of the schools
pivot_longer(starts_with("r_IDSCHOOL")) %>%
mutate(sigma_i = value) %>%
ggplot(aes(x = sigma_i, y = reorder(name, sigma_i))) +
stat_pointinterval(point_interval = mean_qi, .width = .89,
size = 1/6) +
scale_y_discrete(expression(italic(j)), breaks = NULL) +
labs(x = expression(italic(u)[italic(j)])) +
coord_flip()
Most applied researchers will introduce some predictor variables. To show how this is done in brms, let us introduce two variables that might be related to differences in mathematics scores: Girl and SES_SCHOOL.
Before we build our model it is wise to think about what we already know concerning the effects of both variables.
For the effect of Girl we can expect girls to score lower than boys, given the gender differences observed for Flanders in previous TIMSS waves. Based on that knowledge we can set a normal distribution with an average of -7 and a standard deviation of 5 as a prior (see the figure below for a visual representation of our prior belief).
For the effect of SES_SCHOOL we have less information available to define our prior. But, based on our domain knowledge, we can at least expect that schools with a more disadvantaged socioeconomic composition will on average score lower. Therefore we will use priors that express this expectation but also express a larger amount of uncertainty than our uncertainty about the effect of Girl. We will use a normal distribution again, but this time with an average of -10 and a standard deviation of 10 for the parameter that models the difference between the categories ‘More Affluent’ and ‘Neither more Affluent nor more Disadvantaged’ (this parameter will be called SES_SCHOOL2) and an average of -20 and a standard deviation of 10 for the parameter that captures the difference between the categories ‘More Affluent’ and ‘More Disadvantaged’ (this parameter will be called SES_SCHOOL3).
The following plot visualises our priors. Notice that the prior for the effect of Girl is more peaked and narrower than the two other priors, reflecting the higher certainty we have about this gender difference than about the effect of School Composition by Socioeconomic Background.
x <- seq(-40, 10, by = .1)
Girl <- dnorm(x, -7, 5)
SES_SCHOOL2 <- dnorm(x, -10, 10)
SES_SCHOOL3 <- dnorm(x, -20, 10)
tibble(x, Girl, SES_SCHOOL2, SES_SCHOOL3) %>% # tibble() instead of the deprecated data_frame()
pivot_longer(c(Girl,SES_SCHOOL2,SES_SCHOOL3),names_to="Prior") %>%
ggplot(aes(x = x, y = value, lty = Prior)) +
geom_line() +
scale_x_continuous("math",limits = c(-40,10)) +
scale_y_continuous("probability density")
The brms
code looks as follows:
priorMath_predmodel <-
c(set_prior("normal(548.9,25)", class = "Intercept"),
set_prior("normal(-7,5)", class = "b", coef = "Girl"),
set_prior("normal(-10,10)", class = "b", coef = "SES_SCHOOL2"),
set_prior("normal(-20,10)", class = "b", coef = "SES_SCHOOL3")
)
Model_math_pred <- brm(
math ~ 1 + Girl + SES_SCHOOL + (1|IDSCHOOL),
data = Student_FL19,
family = "gaussian",
prior = priorMath_predmodel,
cores = 4,
seed = 2021
)
Let’s have a look at our output.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: math ~ 1 + Girl + SES_SCHOOL + (1 | IDSCHOOL)
Data: Student_FL19 (Number of observations: 4257)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~IDSCHOOL (Number of levels: 133)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 20.69 1.77 17.43 24.33 1.00 1620
Tail_ESS
sd(Intercept) 2500
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 548.89 2.59 543.77 553.81 1.00 2085
Girl -12.74 1.83 -16.31 -9.13 1.00 10673
SES_SCHOOL2 -16.43 4.83 -25.76 -6.99 1.00 2241
SES_SCHOOL3 -40.40 4.89 -49.81 -30.55 1.00 2238
Tail_ESS
Intercept 2319
Girl 2770
SES_SCHOOL2 2734
SES_SCHOOL3 2323
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 62.17 0.70 60.84 63.56 1.00 8487 2809
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
From the output we learn that the previously observed decline in the difference between boys and girls for mathematics in Flanders (see the results of TIMSS 2011 and TIMSS 2015) has stopped and the difference between boys and girls has increased again. While the difference in the previous TIMSS wave was around 6 points, our model shows that we are 95% confident that in 2019 the difference is somewhere between 9 and 16 points. To interpret the output for the effect of SES_SCHOOL
we will plot our marginal posterior distribution of average scores for each of the three categories of schools, making use of the posterior_samples()
function and mutate()
to calculate the average scores implied by each of the 4000 sampled sets of parameters.
names <- c("More Affluent",
"Neither more Affluent nor more Disadvantaged",
"More Disadvantaged")
posterior_samples(Model_math_pred) %>%
mutate(Affluent = b_Intercept,
Neutral = (b_Intercept + b_SES_SCHOOL2) ,
Disadvantaged = (b_Intercept + b_SES_SCHOOL3)) %>%
select(Affluent, Neutral, Disadvantaged) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("More Disadvantaged",
"Neither more Affluent nor more Disadvantaged",
"More Affluent"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(.width = .89,
normalize = "panels",
quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")
Notice that the three 89% credible intervals do not overlap, indicating that we may be quite sure that there are differences in students’ mathematics scores according to the School Composition by Socioeconomic Background of the school they attend.
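If we want a more direct quantification than eyeballing the intervals, the hypothesis() function in brms can test such a comparison; a sketch for the contrast between the two school-composition effects:

hypothesis(Model_math_pred, "SES_SCHOOL3 - SES_SCHOOL2 < 0")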
A final thing that we can do is compare models. In a frequentist framework researchers will rely on a likelihood ratio test and/or compare information criteria like the AIC and the BIC. For Bayesian analyses we can rely on leave-one-out cross-validation methods. Readers interested in how these indices are calculated, and in the philosophy behind these measures, are advised to read Lambert (2018).
To do this we can make use of the loo()
and loo_compare()
functions. For each model we apply the loo()
function and store the results in a dedicated object. Then we use the loo_compare()
function and print the result. The code looks like this:
loo_1 <- loo(Model_math_naive)
loo_2 <- loo(Model_math_null)
loo_3 <- loo(Model_math_pred)
loo_comparison <- loo_compare(loo_1,loo_2,loo_3)
To save some time in compiling this blogpost I first saved the results and load them here to print them.
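Something along the following lines does the trick (the file name is just an illustration):

saveRDS(loo_comparison, "loo_comparison.RDS")   # run once after the comparison
loo_comparison <- readRDS("loo_comparison.RDS") # reload when compiling the post
print(loo_comparison, simplify = FALSE)         # simplify = FALSE shows all columns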
elpd_diff se_diff elpd_loo se_elpd_loo p_loo
Model_math_pred 0.0 0.0 -23673.7 44.5 103.3
Model_math_null -27.4 7.4 -23701.1 44.4 113.4
Model_math_naive -311.0 23.9 -23984.6 44.6 1.9
se_p_loo looic se_looic
Model_math_pred 2.4 47347.3 89.1
Model_math_null 2.6 47402.2 88.8
Model_math_naive 0.1 47969.3 89.2
We can see that the model including the two predictors has the highest expected log pointwise predictive density (\(elpd\) = -23673.7), which corresponds to the lowest looic, indicating that this model fits best. The difference in \(elpd\) with the next best model is 27.4. This difference is clearly bigger than the accompanying standard error (3.7 times bigger), so we can conclude that this model is - I’m sorry guys - significantly better than the null model.
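For the record, that ratio can be computed straight from the comparison table:

loo_comparison[2, "elpd_diff"] / loo_comparison[2, "se_diff"] # approx. -3.7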
That’s all folks…
I hope that this blogpost gives you some guidance and stimulation to make the transition towards a Bayesian workflow!
Bürkner, Paul-Christian. 2017. “Brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software, Articles 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
Lambert, Ben. 2018. A Student’s Guide to Bayesian Statistics. Sage, Thousand Oaks.
Rutkowski, Leslie, Eugenio Gonzalez, Marc Joncas, and Matthias von Davier. 2010. “International Large-Scale Assessment Data.” Educational Researcher 39 (2): 142–51. https://doi.org/10.3102/0013189x10363170.
Schloerke, Barret, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Jason Crowley. 2021. GGally: Extension to ‘ggplot2’. https://CRAN.R-project.org/package=GGally.
Stan Development Team. 2020. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. “Rank-Normalization, Folding, and Localization: An Improved \(\widehat{R}\) for Assessing Convergence of MCMC.” Bayesian Analysis. https://doi.org/10.1214/20-BA1221.
Wickham, Hadley, and Evan Miller. 2020. haven: Import and Export ‘SPSS’, ‘Stata’ and ‘SAS’ Files. https://CRAN.R-project.org/package=haven.
For attribution, please cite this work as
Maeyer (2021, Feb. 22). Reproducible Stats in Education Sciences: Bayesian multilevel models with brms: a demo with TIMSS2019 data. Retrieved from https://svendemaeyer.netlify.app/posts/2021-01-29-modelling-multilevel-models-using-brms/
BibTeX citation
@misc{maeyer2021bayesian,
  author = {Maeyer, Sven De},
  title = {Reproducible Stats in Education Sciences: Bayesian multilevel models with brms: a demo with TIMSS2019 data},
  url = {https://svendemaeyer.netlify.app/posts/2021-01-29-modelling-multilevel-models-using-brms/},
  year = {2021}
}