[R] mgcv: relative risk from GAM with distributed lag
j@de@shod@@ m@iii@g oii googiem@ii@com
j@de@shod@@ m@iii@g oii googiem@ii@com
Tue Jul 26 18:59:14 CEST 2022
Dear all,
I am now looking into bootstrapping to get confidence/ credible
intervals on the relative risk estimates from my GAM, as I really
can't work out how to use the lpmatrix in mgcv to achieve this.
(Unfortunately I have zero experience with bootstrapping - after
having learned my way around GAMs, time series and distributed lag,
and having such promising results, I fear that getting a 'simple'
confidence interval is the thing that's actually going to stop me from
succesful/ on-time completion of this study/ thesis which is due very
soon. Perhaps the person that sent me an off-list response to an
earlier GAM question telling me I was utterly unqualified to undertake
this study, and that help 'from the internet' was unlikely to be
sufficient, is going to be right after all :(
Anyway, there is no alternative but to struggle on - how diffcult can
it be to get a CI? It's just a bit of shading around a curve ;-)
My understanding is that I need to use block bootstrapping, because I
have time series data, and also because the GAM has a distributed lag
model. (Deaths may occur over several days following exposure to high
heat). I've come across tsbootstrap from the tseries package which
seems straighforward to use. As a reminder, my GAM in pseudocode is as
follows (and my query is below the code):
log(deaths) = s(time) + te(temperature, lag) + te(rain, lag)
where lagged variables are created as 7-column matrices as follows:
lagard <- function(x,n.lag=7) {
n <- length(x); X <- matrix(NA,n,n.lag)
for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
X
}
dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE),
deaths=simdat$deaths, time = simdat$time)
dat$temp <- lagard(simdat$temp)
dat$rain <- lagard(simdat$rain)
dat$humidity <- lagard(simdat$humidity)
I am planning to bootstrap the time variable and store, say, 500
replicates and use these replicates to subset the original data frame
that contains all weather variables (as well as time). I will then run
the model on each of these replicates, calculate the relative risk for
a range of temperatures relative to a reference temperature, calculate
the mean RR and its CI for each temperature.
There's a potential complication though because unlike in a GLM, in
order to obtain RR from a GAM, you need to make predictions from the
model, typically over the min-max range of the predictor variable of
interest. When I'm running the GAM 500 times, (I think) the
temperature values for which to predict should be the same for each
run of the model. However, the range of temperatures in each replicate
will differ, so I'll need to find that range of the predictor variable
that all replicates have in common. That I can do, but am worried that
the temperature range for which to predict for each run of the model
may get really narrow, due to the possibility that some replicates may
have a much narrower range for temperature than other replicates.
Am I right in thinking that I should use the same temperatures to
predict from for each run of the model, and that this may lead to a
much narrower range of temperatures to predict from than the
temperature range for which I developed the model?
If so, can anyone think of a solution?
(Alternatively, if anyone could provide an example on how to use the
lpmatrix in mgcv to calculate CIs for the relative risk, I'd be
immensely grateful too!!!)
Best wishes,
Jade
On Mon, 25 Jul 2022 at 20:17, jade.shodan using googlemail.com
<jade.shodan using googlemail.com> wrote:
>
> Hi Simon,
>
> Thanks for that example. It's been very useful! It turned out that I
> created the prediction data for lag incorrectly. I've now managed to
> get perspective plots on the relative risk scale.
>
> I've now moved on to trying to get a plot of overall RR (y-axis) for a
> range of temperature values (x-axis), where RR for each temperature
> value is summed over lag periods 0-6 (see bottom plot page 2230 in
> Gasparrini's paper: https://doi.org/10.1002/sim.3940). This now seems
> straighforward with predict.gam() and type = "terms" in the way you
> suggested, but as acknowledged, that doesn't give me credible
> intervals. I've just been told that I MUST present results with
> confidence/ credible intervals (which is fair enough).
>
> For the last two days I've tried making the plot of overall RR with
> CIs, using the lpmatrix, as you suggested (looked it up in your book,
> and also an earlier post of yours with an example:
> https://stat.ethz.ch/pipermail/r-help/2012-May/314387.html), but I
> don't even understand how the book example can be applied to my
> problem, and I can't get the example in the online post to work. I
> don't really understand what an lpmatrix is, (other than that it seems
> to give me a value for each smooth term for each basis function?) or
> what the code in the book example does, so I have no idea what I am
> doing wroing, or what good results are supposed to look like. (I'm
> sorry, my background is in public health rather than statistics).
>
> I am hoping that I can ask for your (or anyone's) help one final
> time. (Once I have the CI's, my modelling will be finished, and I will
> be out of everyone's hair with questions about GAMs... at least for a
> while!).
>
> Here's my code with reproducible data to get a plot of overall RR
> (y-axis) vs temperature (x-axis) where overall RR for each temperature
> value is obtained by summing RR over lag periods 0-6. But I have
> absolutely no idea how to use predict.gam with type = lpmatrix to get
> from this curve to one with CIs.
>
> library(mgcv)
>
> # simulate data
> set.seed(3) # make reproducible example
> simdat <- gamSim(1,400)
> g <- exp(simdat$f/5)
> simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var
> simdat$time <- 1:400 # create time series
> names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f",
> "f0", "f1", "f2", "f3", "time")
>
> # lag function based on Simon Wood's book (2017, p.349)
> lagard <- function(x,n.lag=7) {
> n <- length(x); X <- matrix(NA,n,n.lag)
> for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
> X
> }
>
> # create lagged variables as 7-column matrices
> dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE),
> deaths=simdat$deaths, time = simdat$time)
> dat$temp <- lagard(simdat$temp)
> dat$rain <- lagard(simdat$rain)
> dat$humidity <- lagard(simdat$humidity)
>
> mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) +
> te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat,
> family = nb, method = 'REML', select = TRUE)
>
> # create prediction data
> m1 <- 40 # number of points to predict across the range of each weather variable
> m2 <- 7 # number of lag periods
> n <- m1*m2 # total number of prediction points
>
> pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T),
> length = m1)
> pred_lag <- 0:6
> pred_rain <- seq(min(dat$rain, na.rm = T), max(dat$rain, na.rm = T),
> length = m1)
> pred_humidity <- seq(min(dat$humidity, na.rm = T), max(dat$humidity,
> na.rm = T), length = m1)
> pred_time <- seq(min(dat$time, na.rm = T), max(dat$time, na.rm = T),
> length = m1)
>
> pd <- data.frame(temp=rep(pred_temp,m2),lag=rep(pred_lag,each=m1),rain=rep(pred_rain,m2),humidity=rep(pred_humidity,m2),time=rep(pred_time,m2))
>
> # prediction data with reference temperature set to median
> temperature, all else equal to pd
> pred_temp0 <- median(dat$temp, na.rm=T)
> pd0 <- data.frame(temp=rep(pred_temp0,n),lag=rep(pred_lag,each=m1),rain=rep(pred_rain,m2),humidity=rep(pred_humidity,m2),time=rep(pred_time,m2))
>
> # make predictions
> predictions <- predict.gam(mod, newdata=pd, type = "terms")
> predictions0 <- predict.gam(mod, newdata=pd0, type = "terms")
>
> # calculate RR (relative to reference temperature set above)
> diff <- predictions[,2] - predictions0[,2]
> rr <- as.vector(exp(diff))
>
> # convert rr to matrix required for z argument for persp() plot
> rr_mat <- matrix(rr, m1, m2)
> persp(x=pred_temp,y=pred_lag,z=rr_mat,theta=30,phi=30,ticktype =
> "detailed", col="blue",zlab="RR")
>
> # create plots of overall RR (y-axis, summed over lags 0-6) vs
> temperature (x-axis)
> # convert predictions to matrix to allow summing over lags
> pmat <- matrix(predictions[,2],m1,m2)
> pmat0 <- matrix(predictions0[,2],m1,m2)
>
> # sum predictions over lags 0-6 for each temperature
> psummed <- rowSums(pmat)
> psummed0 <-rowSums(pmat0)
>
> # subtract the predictions for the reference temperature from the
> predictions for each temperature
> pref <- psummed-psummed0
>
> # exponentiating a difference on log-scale gives RR
> rr_overall <- exp(pref)
> plot(pred_temp, rr_overall, type = "l")
>
> # now create CIs for overall RR plot
> Xp <- predict(a1a_high_heat,newdata=pd,type="lpmatrix")
>
> # and how to procede now?????? So close to having results, and yet....
>
>
>
> On Sat, 23 Jul 2022 at 21:26, Simon Wood <simon.wood using bath.edu> wrote:
> >
> > I doubt you want a 7 by 1000 grid for your persp plot. Here's an example
> > of producing a persp plot using predict.gam and a custom grid. Since
> > only the effects of x1 and x2 are being plotted it doesn't matter what
> > x0 and x3 are set to (the model is additive after all). In your case
> > only one smooth term is involved of course.
> >
> > library(mgcv)
> > n <- 200
> > sig <- 2
> > dat <- gamSim(1,n=n,scale=sig)
> >
> > b <- gam(y~s(x0)+s(I(x1^2))+s(x2)+offset(x3),data=dat)
> >
> > m1 <- 20;m2 <- 30; n <- m1*m2
> > x1 <- seq(.2,.8,length=m1);x2 <- seq(.2,.8,length=m2) ## marginal values
> > for evaluation grid
> > df <- data.frame(x0=rep(.5,n),x1=rep(x1,m2),x2=rep(x2,each=m1),x3=rep(0,n))
> > pf <- predict(b,newdata=df,type="terms")
> >
> > persp(x1,x2,matrix(pf[,2]+pf[,3],m1,m2),theta=-130,col="blue",zlab="")
> >
> > On 23/07/2022 14:54, jade.shodan using googlemail.com wrote:
> > > Hi Simon and all,
> > >
> > > I've corrected some mistakes in setting up the prediction data frame
> > > (sorry, very stressed and sleep deprived due to closing in deadlines),
> > > but am having problems getting the perspective plot using persp().
> > >
> > > I've calculated relative risk (RR) but am having trouble getting the
> > > perspective (or contour) plot. persp() takes x and y vectors in
> > > ascending order, and z as a matrix. I've seen the outer() function
> > > being used to facilitate perspective plots, but every example I've
> > > seen treats z as a function, to be evaluated at combinations for x and
> > > y. In my case, I already have observations for z. Not sure if I need
> > > to use outer(), and if so, how?
> > >
> > > My code for making predictions, getting RR and trying to get the plot
> > > is as follows (full reproducible example including model at the
> > > bottom). Can someone tell me what I'm doing wrong?
> > >
> > > ########### CODE ########################
> > >
> > > # create prediction data
> > > N <- 1000 # number of points for which to predict the smooths
> > > pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T),
> > > length = N) # prediction data for temperature
> > > pred_lag <- rep(c(0, 1,2,3,4,5,6),each=N)
> > > pred_rain <- seq(min(dat$rain, na.rm = T), max(dat$rain, na.rm = T), length = N)
> > > pred_humidity <- seq(min(dat$humidity, na.rm = T), max(dat$humidity,
> > > na.rm = T), length = N)
> > > pred_time <- seq(min(dat$time, na.rm = T), max(dat$time, na.rm = T), length = N)
> > >
> > > pd <- data.frame(temp = pred_temp, lag = pred_lag, humidity =
> > > pred_humidity, rain = pred_rain, time = pred_time)
> > >
> > > # create prediction data with reference temperature set to median temperature
> > > # identical to pd but now with all temperatures set to median value of
> > > temperature
> > > pred_temp0 <- rep(median(dat$temp, na.rm=T), N)
> > > pd0 <- data.frame(temp = pred_temp0, lag = pred_lag, humidity =
> > > pred_humidity, rain = pred_rain, time = pred_time)
> > >
> > > # make predictions
> > > predictions <- predict.gam(mod, pd, type = "terms")
> > > predictions0 <- predict.gam(mod, pd0, type = "terms")
> > >
> > > # calculate RR
> > > diff <- predictions[,2] - predictions0[,2]
> > > rr <- as.vector(exp(diff))
> > >
> > > # convert rr to matrix required for z argument for persp()
> > > rr_mat <- matrix(rr, nrow = N)
> > > persp(pred_temp, pred_lag, rr_mat)
> > >
> > > ######### ERROR MESSAGE ##################################
> > >
> > > The persp call results in the follow error:
> > >
> > > Error in persp.default(pred_temp, pred_lag, rr_mat) :
> > > increasing 'x' and 'y' values expected
> > >
> > > I don't understand this because pred_temp and pred_lag ARE in ascending order.
> > >
> > > ############################################################
> > > FULL REPRODUCIBLE EXAMPLE
> > > ############################################################
> > >
> > > library(mgcv)
> > > set.seed(3) # make reproducible example
> > > simdat <- gamSim(1,400)
> > > g <- exp(simdat$f/5)
> > >
> > > simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var
> > > simdat$time <- 1:400 # create time series
> > >
> > > names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f",
> > > "f0", "f1", "f2", "f3", "time")
> > >
> > > # lag function based on Wood (book 2017, p.349 and gamair package
> > > documentation p.54
> > > # https://cran.rstudio.com/web/packages/gamair/gamair.pdf)
> > > lagard <- function(x,n.lag=7) {
> > > n <- length(x); X <- matrix(NA,n,n.lag)
> > > for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
> > > X
> > > }
> > >
> > > # set up lag, temp, rain and humidity as 7-column matrices
> > > # to create lagged variables
> > > dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE),
> > > deaths=simdat$deaths, time = simdat$time)
> > > dat$temp <- lagard(simdat$temp)
> > > dat$rain <- lagard(simdat$rain)
> > > dat$humidity <- lagard(simdat$humidity)
> > >
> > > mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) +
> > > te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat,
> > > family = nb, method = 'REML', select = TRUE)
> > >
> > > summary(mod)
> > > plot(mod, scheme = 1) # perspective
> > > plot(mod, scheme = 2) # contour
> > >
> > > # create prediction data
> > > N <- 1000 # number of points for which to predict the smooths
> > > pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T), length = N)
> > > pred_lag <- rep(c(0, 1,2,3,4,5,6),each=N)
> > > pred_rain <- seq(min(dat$rain, na.rm = T), max(dat$rain, na.rm = T), length = N)
> > > pred_humidity <- seq(min(dat$humidity, na.rm = T), max(dat$humidity,
> > > na.rm = T), length = N)
> > > pred_time <- seq(min(dat$time, na.rm = T), max(dat$time, na.rm = T), length = N)
> > >
> > > pd <- data.frame(temp = pred_temp, lag = pred_lag, humidity =
> > > pred_humidity, rain = pred_rain, time = pred_time)
> > >
> > > # create prediction data with reference temperature set to median temperature
> > > # identical to pd but now with all temperatures set to median value of
> > > temperature
> > > pred_temp0 <- rep(median(dat$temp, na.rm=T), N)
> > > pd0 <- data.frame(temp = pred_temp0, lag = pred_lag, humidity =
> > > pred_humidity, rain = pred_rain, time = pred_time)
> > >
> > > # make predictions
> > > predictions <- predict.gam(mod, pd, type = "terms")
> > > predictions0 <- predict.gam(mod, pd0, type = "terms")
> > >
> > > # calculate RR
> > > diff <- predictions[,2] - predictions0[,2]
> > > rr <- as.vector(exp(diff))
> > >
> > > # convert rr to matrix required for z argument for persp()
> > > rr_mat <- matrix(rr, nrow = N)
> > > persp(pred_temp, pred_lag, rr_mat)
> > >
> > >
> > >
> > >
> > > On Fri, 22 Jul 2022 at 13:29, jade.shodan using googlemail.com
> > > <jade.shodan using googlemail.com> wrote:
> > >> I made a small error in the code below (not checking for NAs which are
> > >> introduced by the lag function). However, this doesn't solve the issue
> > >> I raised).
> > >>
> > >> So here's the problem (with corrected code) again:
> > >>
> > >> I'm not sure what to do with the 'time' variable. (I don't want to
> > >> make predictions for specific points in time). I coded as follows
> > >> (full reproducible example at bottom of email), but get a warning and
> > >> error:
> > >>
> > >>
> > >> N <- 1000 # number of points for smooth to be predicted
> > >> # new temperatures and lags for prediction
> > >> pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T), length = N)
> > >> pred_lag <- rep(c(0, 1,2,3,4,5,6),each=N) ## IS IT CORRECT TO SET
> > >> UP LAG LIKE THIS?
> > >>
> > >> # not sure if these covariates are required with type = "terms"
> > >> pred_humidity <- rep(median(dat$humidity, na.rm = T), N)
> > >> pred_rain <- rep(median(dat$rain, na.rm = T), N)
> > >> pd <- data.frame(temp = pred_temp, lag = pred_lag, humidity =
> > >> pred_humidity, rain = pred_rain)
> > >>
> > >> predictions <- predict(mod, pd, type = "terms")
> > >>
> > >>
> > >> The predict line creates the following warning and error:
> > >>
> > >> Warning in predict.gam(mod, pd, type = "terms") :
> > >> not all required variables have been supplied in newdata!
> > >>
> > >> Error in model.frame.default(ff, data = newdata, na.action = na.act) :
> > >> object is not a matrix
> > >>
> > >>
> > >> For ease of reference, I've (re)included the full reproducible example:
> > >>
> > >> library(mgcv)
> > >> set.seed(3) # make reproducible example
> > >> simdat <- gamSim(1,400)
> > >> g <- exp(simdat$f/5)
> > >>
> > >> simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var
> > >> simdat$time <- 1:400 # create time series
> > >>
> > >> names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f",
> > >> "f0", "f1", "f2", "f3", "time")
> > >>
> > >> # lag function based on Wood (book 2017, p.349 and gamair package
> > >> documentation p.54
> > >> # https://cran.rstudio.com/web/packages/gamair/gamair.pdf)
> > >> lagard <- function(x,n.lag=7) {
> > >> n <- length(x); X <- matrix(NA,n,n.lag)
> > >> for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
> > >> X
> > >> }
> > >>
> > >> # set up lag, temp, rain and humidity as 7-column matrices
> > >> # to create lagged variables
> > >> dat <- list(lag=matrix(0:6,nrow(simda
> > >> t),7,byrow=TRUE),
> > >> deaths=simdat$deaths, time = simdat$time)
> > >> dat$temp <- lagard(simdat$temp)
> > >> dat$rain <- lagard(simdat$rain)
> > >> dat$humidity <- lagard(simdat$humidity)
> > >>
> > >> mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) +
> > >> te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat,
> > >> family = nb, method = 'REML', select = TRUE)
> > >>
> > >> # create prediction data
> > >> N <- 1000 # number of points for which to predict the smooths
> > >> pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T), length = N)
> > >> pred_lag <- rep(c(0, 1,2,3,4,5,6),each=N)
> > >> pred_humidity <- rep(median(dat$humidity), N)
> > >> pred_rain <- rep(median(dat$rain), N)
> > >> pd <- data.frame(temp = pred_temp, lag = pred_lag, humidity =
> > >> pred_humidity, rain = pred_rain)
> > >>
> > >> # make predictions
> > >> predictions <- predict(mod, pd, type = "terms")
> > >>
> > >> On Fri, 22 Jul 2022 at 12:47, jade.shodan using googlemail.com
> > >> <jade.shodan using googlemail.com> wrote:
> > >>> Hi Simon,
> > >>>
> > >>> Thanks for the pointers! But I'm not sure what to do with the 'time'
> > >>> variable. (I don't want to make predictions for specific points in
> > >>> time). I coded as follows (full reproducible example at bottom of
> > >>> email), but get a warning and error:
> > >>>
> > >>>
> > >>> N <- 1000 # number of points for smooth to be predicted
> > >>> # new temperatures and lags for prediction
> > >>> pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T), length = N)
> > >>> pred_lag <- rep(c(0, 1,2,3,4,5,6),each=N) ## IS IT CORRECT TO SET
> > >>> UP LAG LIKE THIS?
> > >>>
> > >>> # not sure if these covariates are required with type = "terms"
> > >>> pred_humidity <- rep(median(dat$humidity), N)
> > >>> pred_rain <- rep(median(dat$rain), N)
> > >>> pd <- data.frame(temp = pred_temp, lag = pred_lag, humidity =
> > >>> pred_humidity, rain = pred_rain)
> > >>>
> > >>> predictions <- predict(mod, pd, type = "terms")
> > >>>
> > >>>
> > >>> The predict line creates the following warning and error:
> > >>>
> > >>> Warning in predict.gam(mod, pd, type = "terms") :
> > >>> not all required variables have been supplied in newdata!
> > >>>
> > >>> Error in model.frame.default(ff, data = newdata, na.action = na.act) :
> > >>> object is not a matrix
> > >>>
> > >>>
> > >>> For ease of reference, I've (re)included the full reproducible example:
> > >>>
> > >>> library(mgcv)
> > >>> set.seed(3) # make reproducible example
> > >>> simdat <- gamSim(1,400)
> > >>> g <- exp(simdat$f/5)
> > >>>
> > >>> simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var
> > >>> simdat$time <- 1:400 # create time series
> > >>>
> > >>> names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f",
> > >>> "f0", "f1", "f2", "f3", "time")
> > >>>
> > >>> # lag function based on Wood (book 2017, p.349 and gamair package
> > >>> documentation p.54
> > >>> # https://cran.rstudio.com/web/packages/gamair/gamair.pdf)
> > >>> lagard <- function(x,n.lag=7) {
> > >>> n <- length(x); X <- matrix(NA,n,n.lag)
> > >>> for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
> > >>> X
> > >>> }
> > >>>
> > >>> # set up lag, temp, rain and humidity as 7-column matrices
> > >>> # to create lagged variables
> > >>> dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE),
> > >>> deaths=simdat$deaths, time = simdat$time)
> > >>> dat$temp <- lagard(simdat$temp)
> > >>> dat$rain <- lagard(simdat$rain)
> > >>> dat$humidity <- lagard(simdat$humidity)
> > >>>
> > >>> mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) +
> > >>> te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat,
> > >>> family = nb, method = 'REML', select = TRUE)
> > >>>
> > >>> # create prediction data
> > >>> N <- 1000 # number of points for which to predict the smooths
> > >>> pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T), length = N)
> > >>> pred_lag <- rep(c(0, 1,2,3,4,5,6),each=N)
> > >>> pred_humidity <- rep(median(dat$humidity), N)
> > >>> pred_rain <- rep(median(dat$rain), N)
> > >>> pd <- data.frame(temp = pred_temp, lag = pred_lag, humidity =
> > >>> pred_humidity, rain = pred_rain)
> > >>>
> > >>> # make predictions
> > >>> predictions <- predict(mod, pd, type = "terms")
> > >>>
> > >>>
> > >>> On Fri, 22 Jul 2022 at 09:54, Simon Wood <simon.wood using bath.edu> wrote:
> > >>>>
> > >>>> On 21/07/2022 15:19, jade.shodan--- via R-help wrote:
> > >>>>> Hello everyone (incl. Simon Wood?),
> > >>>>>
> > >>>>> I'm not sure that my original question (see below, including
> > >>>>> reproducible example) was as clear as it could have been. To clarify,
> > >>>>> what I would to like to get is:
> > >>>>>
> > >>>>> 1) a perspective plot of temperature x lag x relative risk. I know
> > >>>>> how to use plot.gam and vis.gam but don't know how to get plots on the
> > >>>>> relative risk scale as opposed to "response" or "link".
> > >>>> - You are on the log scale so I think that all you need to do is to use
> > >>>> 'predict.gam', with 'type = "terms"' to get the predictions for the
> > >>>> te(temp, lag) term over the required grid of lags and temperatures.
> > >>>> Suppose the dataframe of prediction data is 'pd'. Now produce pd0, which
> > >>>> is identical to pd, except that the temperatures are all set to the
> > >>>> reference temperature. Use predict.gam to predict te(temp,lag) from pd0.
> > >>>> Now the exponential of the difference between the first and second
> > >>>> predictions is the required RR, which you can plot using 'persp',
> > >>>> 'contour', 'image' or whatever. If you need credible intervals see pages
> > >>>> 341-343 of my 'GAMs: An intro with R' book (2nd ed).
> > >>>>
> > >>>>> 2) a plot of relative risk (accumulated across all lags) vs
> > >>>>> temperature, given a reference temperature. An example of such a plot
> > >>>>> can be found in figure 2 (bottom) of this paper by Gasparrini et al:
> > >>>>> https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3940
> > >>>> - I guess this only makes sense if you have the same temperature at all
> > >>>> lags. So this time produce a data.frame with each desired temperature
> > >>>> repeated for each lag: 'pd1'. Again use predict.gam(...,type="terms").
> > >>>> Then sum the predictions over lags for each temperature, subtract the
> > >>>> minimum, and take the exponential. Same as above for CIs.
> > >>>>
> > >>>> best,
> > >>>>
> > >>>> Simon
> > >>>>
> > >>>>> I've seen Simon Wood's response to a related issue here:
> > >>>>> https://stat.ethz.ch/pipermail/r-help/2012-May/314387.html
> > >>>>> However, I'm not sure how to apply this to time series data with
> > >>>>> distributed lag, to get the above mentioned figures.
> > >>>>>
> > >>>>> Would be really grateful for suggestions!
> > >>>>>
> > >>>>> Jade
> > >>>>>
> > >>>>> On Tue, 19 Jul 2022 at 16:07, jade.shodan using googlemail.com
> > >>>>> <jade.shodan using googlemail.com> wrote:
> > >>>>>> Dear list members,
> > >>>>>>
> > >>>>>> Does anyone know how to obtain a relative risk/ risk ratio from a GAM
> > >>>>>> with a distributed lag model implemented in mgcv? I have a GAM
> > >>>>>> predicting daily deaths from time series data consisting of daily
> > >>>>>> temperature, humidity and rainfall. The GAM includes a distributed lag
> > >>>>>> model because deaths may occur over several days following a high heat
> > >>>>>> day.
> > >>>>>>
> > >>>>>> What I'd like to do is compute (and plot) the relative risk
> > >>>>>> (accumulated across all lags) for a given temperature vs the
> > >>>>>> temperature at which the risk is lowest, with corresponding confidence
> > >>>>>> intervals. I am aware of the predict.gam function but am not sure if
> > >>>>>> and how it should be used in this case. (Additionally, I'd also like
> > >>>>>> to plot the relative risk for different lags separately).
> > >>>>>>
> > >>>>>> I apologise if this seems trivial to some. (Actually, I hope it is,
> > >>>>>> because that might mean I get a solution!) I've been looking for
> > >>>>>> examples on how to do this, but found nothing so far. Suggestions
> > >>>>>> would be very much appreciated!
> > >>>>>>
> > >>>>>> Below is a reproducible example with the GAM:
> > >>>>>>
> > >>>>>> library(mgcv)
> > >>>>>> set.seed(3) # make reproducible example
> > >>>>>> simdat <- gamSim(1,400) # simulate data
> > >>>>>> g <- exp(simdat$f/5)
> > >>>>>> simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var
> > >>>>>> simdat$time <- 1:400 # create time series
> > >>>>>> names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f",
> > >>>>>> "f0", "f1", "f2", "f3", "time")
> > >>>>>>
> > >>>>>> # lag function based on Simon Wood (book 2017, p.349 and gamair
> > >>>>>> package documentation p.54
> > >>>>>> # https://cran.rstudio.com/web/packages/gamair/gamair.pdf)
> > >>>>>> lagard <- function(x,n.lag=7) {
> > >>>>>> n <- length(x); X <- matrix(NA,n,n.lag)
> > >>>>>> for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
> > >>>>>> X
> > >>>>>> }
> > >>>>>>
> > >>>>>> # set up lag, temp, rain and humidity as 7-column matrices
> > >>>>>> # to create lagged variables - based on Simon Wood's example
> > >>>>>> dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE),
> > >>>>>> deaths=simdat$deaths, time = simdat$time)
> > >>>>>> dat$temp <- lagard(simdat$temp)
> > >>>>>> dat$rain <- lagard(simdat$rain)
> > >>>>>> dat$humidity <- lagard(simdat$humidity)
> > >>>>>>
> > >>>>>> mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) +
> > >>>>>> te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat,
> > >>>>>> family = nb, method = 'REML', select = TRUE)
> > >>>>>>
> > >>>>>> summary(mod)
> > >>>>>> plot(mod, scheme = 1)
> > >>>>>> plot(mod, scheme = 2)
> > >>>>>>
> > >>>>>> Thanks for any suggestions you may have,
> > >>>>>>
> > >>>>>> Jade
> > >>>>> ______________________________________________
> > >>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>> --
> > >>>> Simon Wood, School of Mathematics, University of Edinburgh,
> > >>>> https://www.maths.ed.ac.uk/~swood34/
> > >>>>
> > --
> > Simon Wood, School of Mathematics, University of Edinburgh,
> > https://www.maths.ed.ac.uk/~swood34/
> >
More information about the R-help
mailing list