---
title: "Mixed-effect Bayesian Network Model"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mixed-effect Bayesian Network Model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(abn)
library(lme4)
library(Rgraphviz)

# Set seed for reproducibility
set.seed(123)
```

This vignette demonstrates how to fit a mixed-effect Bayesian network model using the `abn` package. 

# Introduction

Multi-level models, also known as hierarchical models are particularly useful when dealing with data that is structured at different levels - for instance, students nested within schools, or repeated measures nested within individuals. 
Multi-level models allow for the estimation of both within-group and between-group effects, and can help to account for the non-independence of observations within groups.

There are various types of multi-level models, including random-intercept models, random-slope models, and models that include both random intercepts and slopes. 
The estimation of multi-level models can be complex, as it involves estimating parameters at multiple levels of organization and accounting for correlations within each level. 
For instance, mixed-effect models with varying intercepts and slopes allow the effects of predictor variables to vary across groups. 
This involves the estimation of numerous parameters, including the variances and covariances of the random slopes and intercepts.

Among the different multi-level models, random-intercept models are often the simplest to understand and implement. 
They allow for variation between groups (e.g., schools or individuals), but assume that the effect of predictor variables is constant across these groups.
This assumption is useful when there is outcome variability attributable to group-level characteristics, but the effects of predictor variables are assumed to be consistent across groups. 
Consequently, random-intercept models are less complex than those that also include random slopes.

Bayesian network models can be formulated based on these multi-level models. 
This approach was formalised by Azzimonti (2021) for discrete data and Scutari (2022) for continuous data. 
These authors demonstrated how to apply these models, including models with random coefficients, in various studies.

This vignette focuses on mixed data, which includes both discrete and continuous variables. 
Unlike in other R packages for mixed-effect Bayesian network modelling, additive Bayesian networks in the R package `abn` do not restrict the possible parent-child combinations. 
However, `abn` is limited to random-intercept models without random coefficients.
The inclusion of random coefficients would render the model estimation process computationally practically unfeasible in this less restricted data distribution setting.

In the following sections, we will demonstrate how to use this package to fit a random-intercept model to mixed data.

# Ground truth data

We will generate first a data set with continuous (Gaussian) and discrete (Binomial) variables with a random-intercept structure.

```{r}
n_groups <- 5

# Number of observations per group
n_obs_per_group <- 1000

# Total number of observations
n_obs <- n_groups * n_obs_per_group

# Simulate group effects
group <- factor(rep(1:n_groups, each = n_obs_per_group))
group_effects <- rnorm(n_groups)

# Simulate variables
G1 <- rnorm(n_obs) + group_effects[group]
B1 <- rbinom(n_obs, 1, plogis(group_effects[group]))
G2 <- 1.5 * B1 + 0.7 * G1 + rnorm(n_obs) + group_effects[group]
B2 <- rbinom(n_obs, 1, plogis(2 * G2 + group_effects[group]))

# Normalize the continuous variables
G1 <- (G1 - mean(G1)) / sd(G1)
G2 <- (G2 - mean(G2)) / sd(G2)

# Create data frame
data <- data.frame(group = group, G1 = G1, G2 = G2, B1 = factor(B1), B2 = factor(B2))

# Look at data
str(data)
summary(data)
```

```{r echo=FALSE}
# Define the nodes and edges of the graph
nodes <- c("G1", "B1", "G2", "B2", "group_effects")
edges <- c("group_effects", "G1",
           "group_effects", "G2",
           "group_effects", "B1",
           "group_effects", "B2",
           "G1", "G2",
           "B1", "G2",
           "G2", "B2")

# Create a graphNEL object with specified nodes and edges
graph <- new("graphNEL", nodes=nodes, edgemode="directed")
for (i in seq(1, length(edges), by=2)) {
  graph <- addEdge(edges[i], edges[i+1], graph)
}

# Layout the graph
g <- layoutGraph(graph)

# Set the node attributes
nodeRenderInfo(g) <- list(
  shape = c(G1="ellipse", B1="box", G2="ellipse", B2="box", group_effects="box"),
  fill = c(G1="lightblue", B1="lightgreen", G2="lightblue", B2="lightgreen", group_effects="lightgrey")
)

# set edge attributes
edgeRenderInfo(g) <- list(
  col = c("G1~G2"="black", "B1~G2"="black", "G2~B2"="black", 
          "group_effects~G1"="lightgrey", "group_effects~G2"="lightgrey", 
          "group_effects~B1"="lightgrey", "group_effects~B2"="lightgrey"))

renderGraph(g)
```


# Additive Bayesian Network Model fitting

We will fit a mixed-effect Bayesian network model to the data using the `abn` package to estimate the relationships between the variables G1, G2, B1, and B2 qualitatively.
The model will include a random intercept for the group variable which is specified using the `group` argument in the `buildScoreCache()` function.

```{r}
# Build the score cache
score_cache <- buildScoreCache(data.df = data,
                               data.dists = list(G1 = "gaussian", 
                                                 G2 = "gaussian", 
                                                 B1 = "binomial", 
                                                 B2 = "binomial"),
                               group.var = "group",
                               max.parents = 2,
                               method = "mle")

# Structure learning
mp_dag <- mostProbable(score.cache = score_cache)

# Plot the DAG
plot(mp_dag)
```

We see that the most probable DAG equals the true DAG.
Note that the `abn` package does not plot the grouping variable in the DAG, but it is included in the model.


```{r}
# Parameter estimation
abn_fit <- fitAbn(object = mp_dag,
                  method = "mle")

# Print the fitted model
print(abn_fit)
```


# Comparison with the results of the `lme4` package

```{r}
# Fit a lmer model for G2
model_g2 <- lmer(G2 ~ G1 + B1 + (1 | group), data = data)

# Print summary
summary(model_g2)
```

```{r}
# Fit a glmer model for B2
model_b2 <- glmer(B2 ~ G1 + G2 + B1 + (1 | group), data = data, family = binomial)

# Print summary
summary(model_b2)
```

The quantitative results of the `abn` package are consistent with the results of the `lme4` package.