[R] A question on modeling brain growth using GAM

Sat Apr 8 02:41:58 CEST 2017

Simon

I wonder whether I can take advantage of this thread and ask you another
related question. Now, I want to get the 95%CI of the fit and their
derivatives as well. For the original fitted curves, It is straightforward
as the option "type=terms" can be used to get the CI for the fixed effect.
Now, I want to get the CI for the fixed effect in the first derivative as
well. I tried to follow your example in the predict.gam() and got stuck
here:

%------------- predict.gam example-------------------------
    dat <- gamSim(1,n=300,scale=sig)
    b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
     plot(b,pages=1)

     ## now evaluate derivatives of smooths with associated standard
     ## errors, by finite differencing...
     x.mesh <- seq(0,1,length=200) ## where to evaluate derivatives
     newd <- data.frame(x0 = x.mesh,x1 = x.mesh, x2=x.mesh,x3=x.mesh)
     X0 <- predict(b,newd,type="lpmatrix")

     eps <- 1e-7 ## finite difference interval
     x.mesh <- x.mesh + eps ## shift the evaluation mesh
     newd <- data.frame(x0 = x.mesh,x1 = x.mesh, x2=x.mesh,x3=x.mesh)
     X1 <- predict(b,newd,type="lpmatrix")

     Xp <- (X1-X0)/eps ## maps coefficients to (fd approx.) derivatives
     colnames(Xp)      ## can check which cols relate to which smooth

     par(mfrow=c(2,2))
     Xi <- Xp*0
     Xi[,1:9+1] <- Xp[,1:9+1] ## Xi%*%coef(b) = smooth deriv i
     df <- Xi%*%coef(b)              ## ith smooth derivative
     df.sd <- rowSums(Xi%*%b$Vp*Xi)^.5 ## cheap diag(Xi%*%b$Vp%*%t(Xi))^.5
%-----------------predict.gam example-----------------------

Am I right that df.sd is the standard error for the derivatives and I can
get the 95% CI by 1.96*df.sd? If so, is this the CI for the fixed effect or
fixed + random effects for the predictions that have random effects
modeled? as I mentioned earlier, my model includes both fixed and random
effects:
gamm1=gamm(BrainVolume~ s(correctedAge) + s(correctedAge, subjIndexF,
bs="fs", k=5), data=mydata)

For the newd in the predict(), I constructed a data frame with all the time
points for all subjects and fed it into the predict.gam() function. If I
only want to get the CI for the fixed effect only for the derivatives, what
should I change here?

Many thanks in advance!

L

On Thu, Apr 6, 2017 at 2:16 PM, Leon Lee <bhamlion78 at gmail.com> wrote:

> Hi, Simon
>
> Thank you for your explanation! I followed the instructions and
> successfully get the predicted values with both fixed and random effects
> incorporated: pred.new=predict.gam(gamm1$gam,newdata,type="response").
>
> Also, what I meant to say was "plot(gamm1$gam, pages=1)" for left and
> right figures. I didn't attach any figures.
>
> Thank you very much for the help!
> L
>
> On Thu, Apr 6, 2017 at 10:44 AM, Simon Wood <simon.wood at bath.edu> wrote:
>
>>
>> gamObj=gam(brainVolume~ s(correctedAge) +  s(subjIndexF, bs="re") +
>> s(subjIndexF, correctedAge, bs="re"), method="REML", data=mydata), where
>> subjIndexF is a factor for each subject. I was thrown an error saying "more
>> coefficients than data".
>>
>> --- I'm not sure exactly  how many scans and subjects you have. The above
>> model will have 10 + 2*(number of subjects ) coeffs. If that is more than
>> the number of scans then gam will not handle it. Depending on the numbers
>> involved you could reduce the k parameter to s(correctedAge), to fix the
>> problem. (e.g. with 31 subjects and 70 scans s(correctedAge=8) should
>> work).
>>
>> However, when I tried to model similar (please correct me if they are not
>> similar) things using GAMM based on description in
>> ?factor.smooth.interaction:
>>  gamm1=gamm(BrainVolume~ s(correctedAge) + s(correctedAge, subjIndexF,
>> bs="fs", k=5), data=mydata)
>>
>> --- It's not the same model. You now have a random smooth curve per
>> subject. You can add random effects in gamm using the list form of the
>> syntax for specifying random effects in lme. see ?gamm. Random intercepts
>> and slopes can be added that way.
>>
>> The model ran.  When I plotted the data using plot(gamm1), I got two
>> figures: the left one is the group mean and 95%CI, which I assume is the
>> results by gamm1$gam model. The right one shows 30 lines (the number of
>> subjects in my data) fluctuating around 0, which I assume is the random
>> effects (gamm1$lme) modeled within each subject that can be added onto the
>> group mean for individual curves. Is my understanding correct? If so, how
>> can I extract these curves from gamm1$lme?
>>
>> --- I would extract the fitted curves using predict(gamm1$gam,...,
>> type="terms") supplying the factor levels and correctedAges at which you
>> want to evaluate the curves.
>>
>> Many thanks!
>> L
>>
>>
>>
>> On Thu, Apr 6, 2017 at 8:22 AM, Simon Wood <simon.wood at bath.edu> wrote:
>>
>>> If 'subjIndexF' is a factor for subject, then s(subjIndexF, bs="re")
>>> will produce a random effect for subject. i.e. each subject will be given
>>> its own random intercept term, which is a way that repeated measures data
>>> like this are often handled.
>>>
>>> The reason for the s(subjIndexF, bs="re") syntax is that smooths can be
>>> viewed as Gaussian random effects, so simple Gaussian random effects can
>>> also be viewed as (0-dimensional) smooths. In general s(x,z,w,bs="re") just
>>> appends the columns of model.matrix(~x:z:w-1) to the gam model matrix, and
>>> treats the associated coefficients as i.i.d. Gaussian random effects with a
>>> common variance (to be estimated). In principle this works with any number
>>> of arguments to s(...,bs="re").
>>>
>>> See ?random.effects (and its linked help files) in mgcv for more.
>>>
>>> There are mechanisms for allowing random smooth curves for each subject,
>>> (e.g. ?factor.smooth.interaction), but I would only use these if simpler
>>> approaches really aren't adequate here.
>>>
>>> best,
>>> Simon
>>>
>>>
>>>
>>>
>>> On 30/03/17 17:06, David Winsemius wrote:
>>>
>>>> On Mar 30, 2017, at 6:56 AM, Leon Lee <bhamlion78 at gmail.com> wrote:
>>>>>
>>>>> David
>>>>>
>>>>> Thank you for your reply. I apologize if I posted in the wrong forum,
>>>>> as I really couldn't decide which forum is the best place for my question
>>>>> and I saw similar questions asked before in this forum.
>>>>>
>>>>> I agree that a sample of ~30 subjects (70 scans in total), the model
>>>>> can be too complicated. Based on that, I did the following:
>>>>> (1) ignored the gender effect, as we have less females than males.
>>>>> (2) corrected chronological age based on their gestational age, that
>>>>> is, we subtracted an infant's chronological age by 2 weeks, if the infant's
>>>>> gestational age is 38 weeks instead of 40weeks.
>>>>>
>>>>> When I ran the model with corrected age, gestational age and their
>>>>> interactions modeled, I found the main effect of gestational age and the
>>>>> interaction between the two are gone.
>>>>>
>>>>> So, my final model will look something like this:
>>>>> gamObj=gam(brainVolume~ s(correctedAge) +  s(subjIndexF, bs="re"),
>>>>> method="REML", data=mydata)
>>>>>
>>>>> Does this look more reasonable?
>>>>>
>>>> I'm still having difficulty understand how a "smoothing" function would
>>>> be used to handle repeated measures without some sort of "group-within"
>>>> indicator.
>>>>
>>>> I would have imagined (and this is because I have no experience with
>>>> using this package for repeated measures) something along the lines of:
>>>>
>>>>   ...+s(correctedAg|subjIndexF)
>>>>
>>>> I see this statement in the docs:
>>>>
>>>>
>>>> smooth.construct.re.smooth.spec {mgcv}
>>>>
>>>> "gam can deal with simple independent random effects, by exploiting the
>>>> link between smooths and random effects to treat random effects as smooths.
>>>> s(x,bs="re") implements this."
>>>>
>>>> But I don't see that as applying to the dependency between individuals
>>>> measured repeatedly. I find no examples of repeated measures problems being
>>>> solve by gam(). There is also a note on the same page:
>>>>
>>>> "Note that smooth ids are not supported for random effect terms. Unlike
>>>> most smooth terms, side conditions are never applied to random effect terms
>>>> in the event of nesting (since they are identifiable without side
>>>> conditions)."
>>>>
>>>> When I do a search on "using gam mgcv formula mixed effects" I am
>>>> referred to packages 'gamm' and 'gamm4' produced by the same author (Simon
>>>> Wood) as pkg 'mgcv', or to package `nlme`.
>>>>
>>>>
>>>> Yes, I am relatively new to the mixed model. We originally applied
>>>>> functional data analysis (PACE) on the data, but want to see the results
>>>>> using a different approach. Also, I couldn't find the Mixed Models list, do
>>>>> you mind sending me a link?
>>>>>
>>>> This is a link to the main mailing lists page:
>>>>
>>>> https://www.r-project.org/mail.html
>>>>
>>>> Found with a search on Google with "R mailing lists"
>>>>
>>>>
>>>> Thank you!
>>>>> Longchuan
>>>>>
>>>>>
>>>>> On Tue, Mar 28, 2017 at 4:28 PM, David Winsemius <
>>>>> dwinsemius at comcast.net> wrote:
>>>>>
>>>>> On Mar 28, 2017, at 9:32 AM, Leon Lee <bhamlion78 at gmail.com> wrote:
>>>>>>
>>>>>> Hi, R experts
>>>>>>
>>>>>> I am new to R & GAM toolbox and would like to get inputs from you all
>>>>>> on my
>>>>>> models. The question I have is as follows:
>>>>>> I have 30 subjects with each subject being scanned from one to three
>>>>>> times
>>>>>> in the first year of life. The brain volume from each scan was
>>>>>> measured.
>>>>>> The scan time was randomly distributed from birth to 1 year.
>>>>>> Each subject has different gestational age ranging from 38 to 41 weeks
>>>>>> Each subject has chronological age from birth to 1 year old
>>>>>> Each subject has gender category.
>>>>>> Now, I want to look at how predictors, such as subject's
>>>>>> chronological age,
>>>>>> gestational age and gender will explain the changes in brain volume.
>>>>>> I also
>>>>>> want to include interactions between gender and age, gestational and
>>>>>> chronological age. Random effects are also included in the model to
>>>>>> account
>>>>>> for subject variability. My model looks like the follows:
>>>>>>
>>>>>> gam=gam(brainVolume~ s(age) + ti(age, gestationalAge) +
>>>>>> gestationalAge +
>>>>>> sex + s(age, by=sex) +  s(subjIndexF, bs="re"), method="REML",
>>>>>> data=mydata)
>>>>>>
>>>>>> Are there any obvious mistakes in the model? Any suggestions will be
>>>>>> greatly appreciated!
>>>>>>
>>>>> I'm not seeing mistakes in the syntax but I would question whether 30
>>>>> subjects is sufficient to adequately support estimates in a a model of this
>>>>> complexity. I would also think that the 's(age)' and 'sex' terms would get
>>>>> aliased out in a model with "+ s(age, by=sex)". Most R regression functions
>>>>> handle removal of over-parametrization automatically.
>>>>>
>>>>> You also have a variable number of measurements per subject. I am
>>>>> unable to comment on the effort to account for the implicit and variably
>>>>> measured correlation and auto-correlation of values within subjects using a
>>>>> "smooth" on subjIndexF, since that is not an approach I was familiar with.
>>>>> But I am getting concerned whether you are also new to statistical modeling
>>>>> in addition to your use of R and GAM being "new to you"?
>>>>>
>>>>> (Perhaps Simon or one of the mixed-effects experts can correct the
>>>>> gaps in my understanding of how to model repeated measures in the context
>>>>> of small numbers of subjects and irregular emasurements.)
>>>>>
>>>>> Please read the Posting Guide and the pages of candidate mailing
>>>>> lists. Rhelp is not really the place to go when you need statistical
>>>>> advice. I'm not sure if this is really in the center of concerns that get
>>>>> discussed on the Mixed Models list, but to my eyes it would be a better fit
>>>>> there.
>>>>>
>>>>> --
>>>>> David.
>>>>>
>>>>>> L
>>>>>>
>>>>>>        [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>>>>> ng-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>>
>>>>>
>>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>>> ng-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>> --
>>> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
>>> +44 (0)117 33 18273 <%2B44%20%280%29117%2033%2018273>
>>> http://www.maths.bris.ac.uk/~sw15190
>>>
>>>
>>
>>
>> --
>> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
>> +44 (0)117 33 18273     http://www.maths.bris.ac.uk/~sw15190
>>
>>
>

	[[alternative HTML version deleted]]