gfam {mgcv} | R Documentation |

## Grouped families

### Description

Family for use with `gam`

or `bam`

allowing a univariate response vector to be made up of variables from several different distributions. The response variable is supplied as a 2 column matrix, where the first column contains the response observations and the second column indexes the distribution (family) from which it comes. `gfam`

takes a list of families as its single argument.

Useful for modelling data from different sources that are linked by a model sharing some components. Smooth model components that are not shared are usually handled with `by`

variables (see `gam.models`

).

### Usage

```
gfam(fl)
```

### Arguments

`fl` |
A list of families. These can be any families inheriting from |

### Details

Each component function of `gfam`

uses the families supplied in the list `fl`

to obtain the required quantities for that family's subset of data, and combines the results appropriately. For example it provides the total deviance (twice negative log-likelihood) of the model, along with its derivatives, by computing the family specific deviance and derivatives from each family applied to its subset of data, and summing them. Other quantities are computed in the same way.

Regular exponential families do not compute the same quantities as extended families, so `gfam`

converts what these families produce to `extended.family`

form internally.

Scale parameters obviously have to be handled separately for each family, and treated as parameters to be estimated, just like other `extended.family`

non-location distribution parameters. Again this is handled internally. This requirement is part of the reason that an `extended.family`

is always produced, even if all elements of `fl`

are standard exponential families. In consequence smoothing parameter estimation is always by REML or NCV.

Note that the null deviance is currently computed by assuming a single parameter model for each family, rather than just one parameter, which may slightly lower explained deviances. Note also that residual checking should probably be done by disaggregating the residuals by family. For this reason functions are not provided to facilitate residual checking with `qq.gam`

.

Prediction on the response scale requires that a family index vector is supplied, with the name of the response, as part of the new prediction data. However, families such as `ocat`

which usually produce matrix predictions for prediction type `"response"`

, will not be able to do so when part of `gfam`

.

`gfam`

relies on the methods in Wood, Pya and Saefken (2016).

### Value

An object of class `extended.family`

.

### Author(s)

Simon N. Wood simon.wood@r-project.org

### References

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986

### Examples

```
library(mgcv)
## a mixed family simulator function to play with...
sim.gfam <- function(dist,n=400) {
## dist can be norm, pois, gamma, binom, nbinom, tw, ocat (R assumed 4)
## links used are identitiy, log or logit.
dat <- gamSim(1,n=n,verbose=FALSE)
nf <- length(dist) ## how many families
fin <- c(1:nf,sample(1:nf,n-nf,replace=TRUE)) ## family index
dat[,6:10] <- dat[,6:10]/5 ## a scale that works for all links used
y <- dat$y;
for (i in 1:nf) {
ii <- which(fin==i) ## index of current family
ni <- length(ii);fi <- dat$f[ii]
if (dist[i]=="norm") {
y[ii] <- fi + rnorm(ni)*.5
} else if (dist[i]=="pois") {
y[ii] <- rpois(ni,exp(fi))
} else if (dist[i]=="gamma") {
scale <- .5
y[ii] <- rgamma(ni,shape=1/scale,scale=exp(fi)*scale)
} else if (dist[i]=="binom") {
y[ii] <- rbinom(ni,1,binomial()$linkinv(fi))
} else if (dist[i]=="nbinom") {
y[ii] <- rnbinom(ni,size=3,mu=exp(fi))
} else if (dist[i]=="tw") {
y[ii] <- rTweedie(exp(fi),p=1.5,phi=1.5)
} else if (dist[i]=="ocat") {
alpha <- c(-Inf,1,2,2.5,Inf)
R <- length(alpha)-1
yi <- fi
u <- runif(ni)
u <- yi + log(u/(1-u))
for (j in 1:R) {
yi[u > alpha[j]&u <= alpha[j+1]] <- j
}
y[ii] <- yi
}
}
dat$y <- cbind(y,fin)
dat
} ## sim.gfam
## some examples
dat <- sim.gfam(c("binom","tw","norm"))
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),
family=gfam(list(binomial,tw,gaussian)),data=dat)
predict(b,data.frame(y=1:3,x0=c(.5,.5,.5),x1=c(.3,.2,.3),
x2=c(.2,.5,.8),x3=c(.1,.5,.9)),type="response",se=TRUE)
summary(b)
plot(b,pages=1)
## set up model so that only the binomial observations depend
## on x0...
dat$id1 <- as.numeric(dat$y[,2]==1)
b1 <- gam(y~s(x0,by=id1)+s(x1)+s(x2)+s(x3),
family=gfam(list(binomial,tw,gaussian)),data=dat)
plot(b1,pages=1) ## note the CI width increase
```

*mgcv*version 1.9-0 Index]