[R-sig-ME] MIXED MODEL WITH REPEATED MEASURES

Joerg Luedicke joerg.luedicke at gmail.com
Sat Dec 10 20:56:10 CET 2011


So then let's take this screening study as an example. Would it
contain cross-sectional information on individuals of all
ages/cohorts, gathered across 20 years (a)? Or would it consist of
individual trajectories/histories spanning 20 years (b)? In the
latter case (b) the dependent variable (i.e. having cancer) would be
time-dependent and could be modeled as such. But I assume you mean
the first set-up (a), where you just look at cross-sections of all
sorts of individuals, and I presume that you do not have longitudinal
information for the study subjects in such a set-up. So both possible
designs seem to be very different from yours.

However, to stay with the example, what you propose would be
comparable to a design in which you observe single individuals for,
say, 20 years, where some of your measures vary over time and some
don't, and now you want to predict what does not change with
something that does change. This just does not make much sense.
Suppose you have the information whether your subjects ever had
cancer or not, so throughout the entire period of 20 years they
either have a yes or a no. Now you want to predict the individual's
chances of getting cancer, and one predictor would be the number of
cigarettes a person smokes in a year, measured every year across the
20 measurement points. Now consider an individual who did not smoke
at the beginning of that period, smoked in the middle, and did not
smoke at the end of the observation window. How would you relate this
information to somebody having cancer or not when the individual
essentially has cancer all the time, or does not have cancer all the
time, i.e. throughout the entire observation period? In this case,
the longitudinal information about smoking history just does not
contribute anything that would help to say something about cancer
risk.

If you want to predict cancer risk in such a set-up you would need to
reduce the longitudinal smoking information to cross-sectional
information, for example by building an indicator of whether one ever
smoked or not, or something like that. Then you would be back to
set-up (a) and would look at cross-sectional correlations. This is of
course not very desirable, as somebody could get cancer at 30 but
only have started smoking at 40, but these are the natural problems
with cross-sectional data. In any case, if your dependent variable is
of such a cross-sectional nature, there is not much you can do about
it other than stepping back to a more correlational point of view.
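
A minimal sketch of such a reduction in base R, just to make the idea
concrete. Everything here is an assumption for illustration: the data
are simulated, and the names long, cross, subject, year, cigs, cancer
and ever_smoked are hypothetical and do not come from your data set.

```r
# Hypothetical long-format data: one row per subject per year.
set.seed(1)
n_subj  <- 200
n_years <- 20
long <- data.frame(
  subject = rep(seq_len(n_subj), each = n_years),
  year    = rep(seq_len(n_years), times = n_subj)
)

# Simulated yearly cigarette counts; roughly half the subjects never smoke,
# and smokers have some smoke-free years.
smoker    <- rbinom(n_subj, 1, 0.5)
long$cigs <- rpois(nrow(long), lambda = 50) *
  smoker[long$subject] * rbinom(nrow(long), 1, 0.6)

# Subject-level outcome, constant across all rows of a subject.
cancer      <- rbinom(n_subj, 1, plogis(-2 + 1.5 * smoker))
long$cancer <- cancer[long$subject]

# Collapse to one row per subject: "ever smoked" indicator plus the outcome.
cross <- data.frame(
  subject     = seq_len(n_subj),
  ever_smoked = as.integer(tapply(long$cigs, long$subject,
                                  function(x) any(x > 0))),
  cancer      = as.integer(tapply(long$cancer, long$subject,
                                  function(x) x[1]))
)

# Plain logistic regression on the cross-sectional data.
fit <- glm(cancer ~ ever_smoked, family = binomial, data = cross)
summary(fit)
```

The model only sees one row per subject, so the within-subject
ordering of the smoking history is deliberately thrown away; that is
exactly the price of the cross-sectional view.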

Joerg


On Sat, Dec 10, 2011 at 1:49 PM, Erin Ryan <erin at the-ryans.com> wrote:
> Good insights, Joerg - thanks.
>
> Unfortunately, I wish to predict the value of the dependent variable for
> future subjects well prior to the last measure (what I envision is an answer
> with a conf interval that steadily decreases over time). An apt analogy
> would be a cancer-screening study involving 500 patients over 20 years. In
> such a study, there would be a multitude of indep variables characterizing
> each subject, and the dep variable would simply be a nominal-level measure
> of whether or not a given subject had contracted cancer at some point in the
> 20 years. The purpose of the study would be to identify future subjects who
> are at higher risk of cancer, but the conclusions would be based on
> empirical data in which the dep variable (yes or no for having cancer) would
> be the same across the entire time-series.
>
> So, what is the correct statistical approach for a dataset like this in
> which the data is not iid, but the dep variable is constant for each
> subject?
>
> Erin
>
> -----Original Message-----
> From: Joerg Luedicke [mailto:joerg.luedicke at gmail.com]
> Sent: Saturday, December 10, 2011 10:36 AM
> To: Erin Ryan
> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>
> On Fri, Dec 9, 2011 at 9:33 PM, Erin Ryan <erin at the-ryans.com> wrote:
>> Good suggestions; however, there is inherent value in the temporal
>> progression of the repeated measures, so I need to capture that in
>> some way.
>
> If your dependent variable is a constant within the units for which you
> observe "temporal progression", then this "progression" does not matter
> whatsoever. Imagine you fit a conventional regression and your dependent
> variable were a constant. It would not matter at all how different the
> subjects were in whatever regard.
>
>> For similar reasons, averaging the values of the independent variables
>> is problematic, as they progress over time to a final, actual value,
>> which presumably should be weighted more heavily. In other words,
>> truth is known on the final repeated measure, but I wish to make
>> accurate predictions much earlier than the final repeated measure.
>
> I don't know what your field of research is, but if you believe that later
> measures are better measures of your object of interest, you could just take
> the last one instead of the average. Or, you could take a weighted average
> of some sort.
>
>
> HTH,
>
> Joerg
>
>>
>> -----Original Message-----
>> From: r-sig-mixed-models-bounces at r-project.org
>> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben
>> Bolker
>> Sent: Thursday, December 08, 2011 5:01 PM
>> To: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>>
>> Erin Ryan <erin at ...> writes:
>>
>>>
>>> I am trying to specify a mixed model for my research, but I can't
>>> quite get it to work. I've spent several weeks looking thru various
>>> online sources to no avail. I can't find an example of someone trying
>>> to do precisely what I'm trying to do. I'm hoping some smart member
>>> of this mailing list may be able to help.
>>>
>>> First off, full disclosure: (1) I'm an engineer by trade, so the
>>> problem may be related to my ignorance of statistics, and/or (2) I'm
>>> fairly new to R, so the problem may be related to my ignorance of R
>>> syntax. Here is the basic structure of my data (in longitudinal form):
>>
>>  [snip]
>>
>>> The rows below each subject are repeated measures (in years), with
>>> the specific pattern of repeated measurements unique to each subject.
>>> The data contains fixed effects and random effects, and there is
>>> clearly correlation in the random effects within each subject. The
>>> DepVar column represents the dependent variable which is a constant
>>> for each subject. All the data is empirical, but I wish to create a
>>> predictive model. Specifically, I wish to predict the value for
>>> DepVar for new subjects.
>>>
>>> So I understand enough about statistics to know that I must employ a
>>> mixed model. I further understand that I must specify a covariance
>>> matrix structure. Given the relatively high degree of correlation in
>>> consecutive years, an AR(1) structure seems like a good starting
>>> point. I have been trying to build the model in SPSS, but without
>>> success, so I've recently turned to R. My first attempt was as
>>> follows--
>>>
>>> ModelFit <- lme(fixed = DepVar ~FixedVar1+FixedVar2, random =
>>> ~RandomVar1+RandomVar2 | Subject, na.action = na.omit, data =
>>> dataset, corr = corAR1())
>>>
>>> I assume this can't be the right specification since it neglects the
>>> repeated measure aspect of the data, so I instead decided to employ
>>> the corCAR1 structure, i.e.--
>>>
>>> ModelFit <- lme(fixed = DepVar ~FixedVar1+FixedVar2, random =
>>> ~RandomVar1+RandomVar2 | Subject, na.action = na.omit, data =
>>> dataset, corr = corCAR1(0.5, form = ~ Years | Subject))
>>>
>>> Now perhaps neither correlation structure is the right one (probably
>>> a different discussion for another day), but the problem I'm
>>> experiencing seems to occur regardless of the structure I specify. In
>>> both cases, I get the following error--
>>>
>>> Error in solve.default(estimates[dimE[1] - (p:1), dimE[2] - (p:1), drop = FALSE]) :
>>>   system is computationally singular: reciprocal condition number = 5.42597e-022
>>>
>>> Anybody know what is going wrong here? This error appears to be
>>> related to the fact that the DepVar is constant for each subject,
>>> because when I select a different dependent variable that is
>>> different for each repeated measure w/in the subject, I do not get
>>> this error.
>>>
>>
>>  I think you're right that the error arises because DepVar is fixed
>> per individual.
>> Technical details aside, I'm having trouble seeing how you're going to
>> estimate the effects of predictor variables that vary within subject
>> when you've only got one response per subject.
>> Furthermore, I think what you're terming "RandomVar1" and "RandomVar2"
>> are probably *not* random variables, but rather are variables that
>> vary within subject.   For this response variable, I would suggest
>> averaging the values of RandomVar1 and RandomVar2 per subject and
>> collapsing the data set to a simple linear model on subjects -- and
>> get rid of the correlation model at the same time.  For response
>> variables that do vary within subject, I would suggest
>>
>> library(nlme)  # provides lme() and corAR1()
>> ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2 +
>>   RandomVar1 + RandomVar2, random = ~ 1 | Subject,
>>   na.action = na.omit, data = dataset, corr = corAR1())
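
The per-subject collapse suggested above, for a response that is
constant within subject, could look roughly like the following in base
R. This is only a sketch under stated assumptions: the data are
simulated, the column names are taken from this thread (with only one
fixed and one time-varying predictor for brevity), and collapsed is a
hypothetical name.

```r
# Hypothetical long-format data using the column names from this thread.
set.seed(42)
n_subj <- 50
n_obs  <- 6
dataset <- data.frame(
  Subject    = factor(rep(seq_len(n_subj), each = n_obs)),
  Years      = rep(seq_len(n_obs), times = n_subj),
  FixedVar1  = rep(rnorm(n_subj), each = n_obs),  # constant within subject
  RandomVar1 = rnorm(n_subj * n_obs)              # varies within subject
)
# DepVar is constant within subject, as in the original question.
dataset$DepVar <- rep(rnorm(n_subj), each = n_obs)

# One row per subject: constants carry over unchanged, and the
# time-varying predictor is replaced by its per-subject mean.
collapsed <- aggregate(cbind(DepVar, FixedVar1, RandomVar1) ~ Subject,
                       data = dataset, FUN = mean)

# Ordinary linear model on subjects: no random effects and no
# correlation structure are needed once there is one row per subject.
fit <- lm(DepVar ~ FixedVar1 + RandomVar1, data = collapsed)
summary(fit)
```

If later measures are believed to be better, the mean could be swapped
for the last observed value or a weighted mean, provided the rows are
sorted by Years within each subject.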
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>



