gfam {mgcv} | R Documentation |
Grouped families
Description
Family for use with gam
or bam
allowing a univariate response vector to be made up of variables from several different distributions. The response variable is supplied as a 2 column matrix, where the first column contains the response observations and the second column indexes the distribution (family) from which it comes. gfam
takes a list of families as its single argument.
Useful for modelling data from different sources that are linked by a model sharing some components. Smooth model components that are not shared are usually handled with by
variables (see gam.models
).
Usage
gfam(fl)
Arguments
fl |
A list of families. These can be any families inheriting from |
Details
Each component function of gfam
uses the families supplied in the list fl
to obtain the required quantities for that family's subset of data, and combines the results appropriately. For example it provides the total deviance (twice negative log-likelihood) of the model, along with its derivatives, by computing the family specific deviance and derivatives from each family applied to its subset of data, and summing them. Other quantities are computed in the same way.
Regular exponential families do not compute the same quantities as extended families, so gfam
converts what these families produce to extended.family
form internally.
Scale parameters obviously have to be handled separately for each family, and treated as parameters to be estimated, just like other extended.family
non-location distribution parameters. Again this is handled internally. This requirement is part of the reason that an extended.family
is always produced, even if all elements of fl
are standard exponential families. In consequence smoothing parameter estimation is always by REML or NCV.
Note that the null deviance is currently computed by assuming a single parameter model for each family, rather than just one parameter, which may slightly lower explained deviances. Note also that residual checking should probably be done by disaggregating the residuals by family. For this reason functions are not provided to facilitate residual checking with qq.gam
.
Prediction on the response scale requires that a family index vector is supplied, with the name of the response, as part of the new prediction data. However, families such as ocat
which usually produce matrix predictions for prediction type "response"
, will not be able to do so when part of gfam
.
gfam
relies on the methods in Wood, Pya and Saefken (2016).
Value
An object of class extended.family
.
Author(s)
Simon N. Wood simon.wood@r-project.org
References
Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986
Examples
library(mgcv)
## a mixed family simulator function to play with...
sim.gfam <- function(dist,n=400) {
## dist can be norm, pois, gamma, binom, nbinom, tw, ocat (R assumed 4)
## links used are identitiy, log or logit.
dat <- gamSim(1,n=n,verbose=FALSE)
nf <- length(dist) ## how many families
fin <- c(1:nf,sample(1:nf,n-nf,replace=TRUE)) ## family index
dat[,6:10] <- dat[,6:10]/5 ## a scale that works for all links used
y <- dat$y;
for (i in 1:nf) {
ii <- which(fin==i) ## index of current family
ni <- length(ii);fi <- dat$f[ii]
if (dist[i]=="norm") {
y[ii] <- fi + rnorm(ni)*.5
} else if (dist[i]=="pois") {
y[ii] <- rpois(ni,exp(fi))
} else if (dist[i]=="gamma") {
scale <- .5
y[ii] <- rgamma(ni,shape=1/scale,scale=exp(fi)*scale)
} else if (dist[i]=="binom") {
y[ii] <- rbinom(ni,1,binomial()$linkinv(fi))
} else if (dist[i]=="nbinom") {
y[ii] <- rnbinom(ni,size=3,mu=exp(fi))
} else if (dist[i]=="tw") {
y[ii] <- rTweedie(exp(fi),p=1.5,phi=1.5)
} else if (dist[i]=="ocat") {
alpha <- c(-Inf,1,2,2.5,Inf)
R <- length(alpha)-1
yi <- fi
u <- runif(ni)
u <- yi + log(u/(1-u))
for (j in 1:R) {
yi[u > alpha[j]&u <= alpha[j+1]] <- j
}
y[ii] <- yi
}
}
dat$y <- cbind(y,fin)
dat
} ## sim.gfam
## some examples
dat <- sim.gfam(c("binom","tw","norm"))
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),
family=gfam(list(binomial,tw,gaussian)),data=dat)
predict(b,data.frame(y=1:3,x0=c(.5,.5,.5),x1=c(.3,.2,.3),
x2=c(.2,.5,.8),x3=c(.1,.5,.9)),type="response",se=TRUE)
summary(b)
plot(b,pages=1)
## set up model so that only the binomial observations depend
## on x0...
dat$id1 <- as.numeric(dat$y[,2]==1)
b1 <- gam(y~s(x0,by=id1)+s(x1)+s(x2)+s(x3),
family=gfam(list(binomial,tw,gaussian)),data=dat)
plot(b1,pages=1) ## note the CI width increase