---
title: "Get started"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Get started}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 7
)
```
```{r setup}
library(tidygam)
library(mgcv)
library(dplyr)
library(ggplot2)
theme_set(theme_light())
```
# Overview
The tidymv package offers two main user-oriented functions:
- `predict_gam()`: returns predictions of the outcome variable based on the predictors in the GAM model. The user can specify specific values for any predictor, and exclude model terms.
- `get_difference()`: returns the difference between two smooths and those intervals along the smooth that do not include 0.
The output of these function can then be plotted with `plot()`, through the methods `plot.tidygam()` and `plot.tidygam.diff()`.
# Basic model prediction
Let's start with a simple model and get model-based predictions.
We will use the `gest` data table, available in tidygam.
The table consists of counts of gestures performed by infants of three cultural backgrounds who participating in a longitudinal study (see `?gest` for details and references).
```{r gest}
data("gest")
gest
```
The following GAM models the overall trend in number of gestures from 10 to 12 months of age.
```{r gs}
gs <- gam(
count ~ s(months, k = 3),
data = gest,
family = poisson
)
summary(gs)
```
Now we can obtain the predicted counts with `predict_gam()`.
```{r gs-pred}
gs_pred <- predict_gam(gs)
gs_pred
```
# Plot predicted values
`predict_gam()` returns an object of class `tidygam`, which can be plotted with `plot()`.
The user has to specify the "series" used as the *x*-axis.
The outcome variable is automatically selected for the *y*-axis.
```{r gs-pred-plot}
gs_pred %>%
plot(series = "months")
```
Since the `gs` model used a log-link function, the output of `predict_gam()` is in log-odds, rather than in counts.
We can convert the log-odds to counts by exponentiating them.
The `tran_fun` argument allows the user to specify a function to transform the predicted outcome values with.
```{r gs-pred-exp}
predict_gam(gs, tran_fun = exp) %>%
plot(series = "months")
```
# Models with `by`-variables
Smooths can be fitted to different levels of a factor using so-called `by`-variables, specified within the smooth function `s()` with the `by` argument. Note that smooths are automatically centred so you need to include the `by`-variable as a parametric term too.
In this model, we fit a smooth along `months` for each background in the data.
```{r gs-by}
gs_by <- gam(
count ~ background + s(months, by = background, k = 3),
data = gest,
family = poisson
)
summary(gs_by)
```
The predictor for comparison is selected with the `comparison` argument in `plot()`.
```{r gs-by-comp}
gs_by %>%
predict_gam(length_out = 20, series = "months", tran_fun = exp) %>%
plot(comparison = "background")
```
Note that the output of `plot()` is a ggplot2 object, which can be modified using ggplot2 functions.
```{r gs-by-comp-2}
gs_by %>%
predict_gam(length_out = 20, series = "months", tran_fun = exp) %>%
plot(comparison = "background") +
scale_color_brewer(type = "qual") + scale_fill_brewer(type = "qual")
```
Let's try now a model with both `gesture` and `background` as `by`-variables.
```{r gs-by-2}
gs_by_2 <- gam(
count ~ gesture + background +
s(months, by = background, k = 3) +
s(months, by = gesture, k = 3),
data = gest,
family = poisson
)
summary(gs_by_2)
```
Note that models like this one are conceptually equivalent to linear models without interactions between the `by`-variables.
This is clear when plotting the predictions: notice how the shapes of the smooths are very similar within each background, and they only differ in slope (this is the effect of including separate `by`-variables).
```{r gs-by-2-plot}
gs_by_2 %>%
predict_gam(length_out = 20, series = "months", tran_fun = exp) %>%
plot(comparison = "gesture") +
scale_color_brewer(type = "qual") + scale_fill_brewer(type = "qual") +
facet_grid(~ background)
```
If you wish to plot the effect of specific `by`-variables, you can exclude terms like in the following code chunk. Note that the name of terms has to match precisely the name in the model summary and that you should exclude both parametric and smooth terms with the same `by`-variable. You also need to pick any level of the excluded variable (otherwise the predictions will be repeated for each level in the excluded variable, but since the variable is excluded, the predictions will be the same).
```{r gs-by-2-plot-2}
to_exclude <- c("s(months):gestureho_gv", "s(months):gesturepoint", "s(months):gesturereach",
"gesturepoint", "gesturereach")
gs_by_2 %>%
predict_gam(length_out = 20, series = "months", tran_fun = exp,
exclude_terms = to_exclude,
# pick any value of the excluded variables.
values = list(gesture = "point")) %>%
plot(comparison = "background") +
scale_color_brewer(type = "qual") + scale_fill_brewer(type = "qual")
```
The following section illustrates how to specify and plot models with the GAM equivalent of classical interactions (e.g. `background * gesture`).
# Models with factor interactions
Classical interactions between factors as usually obtained in linear models with the `:` syntax (e.g. `background:gesture`) are not possible in GAMs.
An alternative way to specify what are called interactions in generalised linear models is by creating a new factor which is the interaction of the two or more factors using the `interaction()` function, and include this "factor interaction" predictors as a `by`-variable.
```{r gs-i}
gest <- gest %>%
mutate(back_gest = interaction(background, gesture))
gs_i <- gam(
count ~ back_gest + s(months, by = back_gest, k = 3),
data = gest,
family = poisson
)
summary(gs_i)
```
When predicting values, the user can use the `separate` argument to specify factor-interaction variables in the model that can be split back into their individual components.
This gives greater flexibility when plotting.
```{r gs-i-plot}
predict_gam(
gs_i, tran_fun = exp,
separate = list(back_gest = c("background", "gesture"))
) %>%
plot(series = "months", comparison = "gesture") +
facet_grid(~ background)
```
# Models with factor smooth interactions (`bs = "fs"`)
Factor smooth interactions are the GAM equivalent of random/group-level effects (intercepts and slopes).
Let's work with the `struct` data, which contains event-related potentials measures of subjects listening to music and speech.
For each type (music vs language), the stimuli were either "grammatical" or "ungrammatical" (i.e. the stimuli either respected structural rules or they did not).
This is a subset of the original data, including voltage values only for electrode 62.
```{r struct}
data("struct")
struct
```
Let's fit the model with factor smooth interactions (`bs = "fs"`).
```{r st}
struct <- struct %>%
mutate(stim_gram = interaction(stimulus.condition, grammar.condition))
st <- bam(
voltage ~ stim_gram +
s(t, by = stim_gram, k = 5) +
s(t, subject, bs = "fs", m = 1),
data = struct
)
summary(st)
```
When predicting values we want to exclude the factor smooth interaction, as we would with random/group-level effects in linear models.
Note that GAM terms to be excluded must be specified as they are named in the output of `summary()`.
```{r st-plot}
predict_gam(
st,
length_out = 50,
series = "t",
exclude_terms = "s(t,subject)",
# Pick any subject: since we are removing the random effect, it does not
# matter which one you pick, the predictions will be the same
values = c(subject = "03"),
separate = list(stim_gram = c("stimulus", "grammar"))
) %>%
plot(comparison = "grammar") +
geom_hline(yintercept = 0) +
facet_grid(~ stimulus)
```
If the fs interaction is not removed, the predicted smooth for each individual level in the fs interaction is returned.
```{r st-plot-2}
predict_gam(
st,
length_out = 50,
series = "t",
separate = list(stim_gram = c("stimulus", "grammar"))
) %>%
plot(comparison = "grammar") +
geom_hline(yintercept = 0) +
facet_grid(~ stimulus)
```