[R-sig-ME] MIXED MODEL WITH REPEATED MEASURES

Sun Dec 11 03:39:36 CET 2011

Thanks for the thoughtful reply. Actually, my dataset is more like option
(b) with longitudinal data for all subjects. To dispense with the
analogy--and be more precise--here is a better explanation of my dataset:

I have 49 defense programs, each with a known life cycle cost, and a large
number of independent variables that are tracked for each program on an
approximately annual basis, each of which might arguably have an impact on
the program's eventual life cycle cost. Not until the program is complete do
I truly know its life cycle cost, which is my dependent variable. I wish to
determine whether there is a subset of these time-phased independent
variables that will help me estimate, relatively early in a new program's
life, what its eventual life cycle cost will be. This would presumably
require a model of the significant fixed and random effects that
characterize these past programs; then, based on certain characteristics of
the new system (e.g., DoD service component, type of system, etc.), the
model would make a prediction of life cycle cost.

Does that make more sense, or have I completely gone off the reservation?

Erin

-----Original Message-----
From: Joerg Luedicke [mailto:joerg.luedicke at gmail.com] 
Sent: Saturday, December 10, 2011 2:56 PM
To: Erin Ryan; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES

So then let's take this screening-study as an example. Would that contain
cross-sectional information on individuals from all ages/cohorts, gathered
across 20 years (a)? Or would that be individual trajectories/histories
spanning 20 years of time (b)? In the latter case (b) the dependent variable
(i.e. having cancer) would be time-dependent and could be modeled as such.
But I assume you mean the first set-up (a) where you just look at
cross-sections of all sorts of individuals. I presume that you do not have
longitudinal information for the study subjects in such a set-up. So both
possible designs seem to be very different from yours. However, to stay with
the example, what you propose would be comparable to a design in which you
observe single individuals for, say, 20 years, and some of your measures
vary over time, and some don't, and now you want to predict what does not
change with something that does change. This just does not make much sense.
Suppose you know whether your subjects ever had cancer or not, so
throughout the entire period of 20 years, they either have a yes or a no.
Now you want to predict the individual's chances of getting cancer, and one
predictor would be the number of cigarettes a person smokes in a year,
measured in every year across the 20 measurement points. Now consider an
individual who did not smoke at the beginning of that period, smoked in the
middle, and did not smoke at the end of the observation window. How would
you relate this information to somebody having cancer or not when the
individual essentially has cancer all the time, or does not have cancer all
the time, i.e. throughout the entire observation period? In this case, the
longitudinal information about smoking history just does not contribute
anything that would help say something about cancer risk. If you want to
predict cancer risk in such a set-up you would need to reduce the
longitudinal smoking information to cross-sectional information, for example
by building an indicator of whether one ever smoked or not, or something
like that. Then you would be back to set-up (a) and would look at
cross-sectional correlations. This is of course not very desirable, as
somebody could get cancer at 30 but only have started smoking at 40, but
these are the natural problems with cross-sectional data. In any case, if
your dependent variable is of such a cross-sectional nature, there is not
much you can do about it other than stepping back to a more correlational
point of view.

Joerg
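
[Editorial sketch, not part of the original message.] The reduction described
above, from yearly smoking records to a single "ever smoked" indicator per
subject, might look roughly like this in base R. Everything here is an
assumption made for illustration: the data are simulated, and the names
long, per_subject, cigs, ever_smoked and cancer are hypothetical.

```r
# Hypothetical toy data: 200 subjects, each observed once a year for 20 years.
set.seed(1)
n_subj <- 200
n_year <- 20
long <- data.frame(
  subject = rep(seq_len(n_subj), each = n_year),
  year    = rep(seq_len(n_year), times = n_subj)
)

# Cigarettes smoked per year; roughly half of the subjects never smoke.
smoker <- rbinom(n_subj, 1, 0.5)
long$cigs <- rpois(nrow(long), lambda = 150) * smoker[long$subject]

# Collapse the longitudinal predictor to one row per subject.
per_subject <- data.frame(
  subject     = seq_len(n_subj),
  ever_smoked = as.integer(tapply(long$cigs, long$subject,
                                  function(x) any(x > 0))),
  mean_cigs   = as.numeric(tapply(long$cigs, long$subject, mean))
)

# The outcome is constant over the whole window, so it lives at the subject
# level. It is simulated here only so that the example runs.
per_subject$cancer <- rbinom(n_subj, 1,
                             plogis(-2 + 1.5 * per_subject$ever_smoked))

# Ordinary cross-sectional logistic regression: one row per subject,
# no random effects and no correlation structure.
fit <- glm(cancer ~ ever_smoked, family = binomial, data = per_subject)
summary(fit)
```

Once the data are collapsed this way there is a single outcome per subject,
which is why nothing is left for a within-subject correlation structure to
describe.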

On Sat, Dec 10, 2011 at 1:49 PM, Erin Ryan <erin at the-ryans.com> wrote:
> Good insights, Joerg - thanks.
>
> Unfortunately, I wish to predict the value of the dependent variable 
> for future subjects well prior to the last measure (what I envision is 
> an answer with a conf interval that steadily decreases over time). An 
> apt analogy would be a cancer-screening study involving 500 patients 
> over 20 years. In such a study, there would be a multitude of indep 
> variables characterizing each subject, and the dep variable would 
> simply be a nominal-level measure of whether or not a given subject 
> had contracted cancer at some point in the
> 20 years. The purpose of the study would be to identify future 
> subjects who are at higher risk of cancer, but the conclusions would 
> be based on empirical data in which the dep variable (yes or no for 
> having cancer) would be the same across the entire time-series.
>
> So, what is the correct statistical approach for a dataset like this 
> in which the data is not iid, but the dep variable is constant for 
> each subject?
>
> Erin
>
> -----Original Message-----
> From: Joerg Luedicke [mailto:joerg.luedicke at gmail.com]
> Sent: Saturday, December 10, 2011 10:36 AM
> To: Erin Ryan
> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>
> On Fri, Dec 9, 2011 at 9:33 PM, Erin Ryan <erin at the-ryans.com> wrote:
>> Good suggestions; however, there is inherent value in the temporal 
>> progression of the repeated measures, so I need to capture that in 
>> some way.
>
> If your dependent variable is a constant within units for which you 
> observe "temporal progression", then this "progression" does not matter
> whatsoever.
> Imagine you fit a conventional regression and your dependent 
> variable were a constant. It would not matter at all how different 
> the subjects were in whatever regard.
>
>> For similar reasons, averaging the values of the independent 
>> variables is problematic, as they progress over time to a final, 
>> actual value, which presumably should be weighted more heavily. In 
>> other words, truth is known on the final repeated measure, but I wish 
>> to make accurate predictions much earlier than the final repeated
>> measure.
>
> I don't know what your field of research is, but if you believe that 
> later measures are better measures of your object of interest, you 
> could just take the last one instead of the average. Or, you could 
> take a weighted average of some sort.
>
>
> HTH,
>
> Joerg
>
>>
>> -----Original Message-----
>> From: r-sig-mixed-models-bounces at r-project.org
>> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
>> Sent: Thursday, December 08, 2011 5:01 PM
>> To: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>>
>> Erin Ryan <erin at ...> writes:
>>
>>>
>>> I am trying to specify a mixed model for my research, but I can't 
>>> quite get it to work. I've spent several weeks looking thru various 
>>> online sources to no avail. I can't find an example of someone 
>>> trying to do precisely what I'm trying to do. I'm hoping some smart 
>>> member of this mailing list may be able to help.
>>>
>>> First off, full disclosure: (1) I'm an engineer by trade, so the 
>>> problem may be related to my ignorance of statistics, and/or (2) I'm 
>>> fairly new to R, so the problem may be related to my ignorance of R 
>>> syntax. Here is the basic structure of my data (in longitudinal form):
>>
>>  [snip]
>>
>>> The rows below each subject are repeated measures (in years), with 
>>> the specific pattern of repeated measurements unique to each subject.
>>> The data contains fixed effects and random effects, and there is 
>>> clearly correlation in the random effects within each subject. The 
>>> DepVar column represents the dependent variable which is a constant 
>>> for each subject. All the data is empirical, but I wish to create a 
>>> predictive model. Specifically, I wish to predict the value for 
>>> DepVar for new subjects.
>>>
>>> So I understand enough about statistics to know that I must employ a 
>>> mixed model. I further understand that I must specify a covariance 
>>> matrix structure. Given the relatively high degree of correlation in 
>>> consecutive years, an AR(1) structure seems like a good starting 
>>> point. I have been trying to build the model in SPSS, but without 
>>> success, so I've recently turned to R. My first attempt was as
>>> follows--
>>>
>>> ModelFit <- lme(fixed = DepVar ~FixedVar1+FixedVar2, random =
>>> ~RandomVar1+RandomVar2 | Subject, na.action = na.omit, data = 
>>> dataset, corr = corAR1())
>>>
>>> I assume this can't be the right specification since it neglects the 
>>> repeated measure aspect of the data, so I instead decided to employ 
>>> the corCAR1 structure, i.e.--
>>>
>>> ModelFit <- lme(fixed = DepVar ~FixedVar1+FixedVar2, random =
>>> ~RandomVar1+RandomVar2 | Subject, na.action = na.omit, data = 
>>> dataset, corr = corCAR1(0.5, form = ~ Years | Subject))
>>>
>>> Now perhaps neither correlation structure is the right one (probably 
>>> a different discussion for another day), but the problem I'm 
>>> experiencing seems to occur regardless of the structure I specify. 
>>> In both cases, I get the following error--
>>>
>>> Error in solve.default(estimates[dimE[1] - (p:1), dimE[2] - (p:1), drop = FALSE]) :
>>>   system is computationally singular: reciprocal condition number = 5.42597e-022
>>>
>>> Anybody know what is going wrong here? This error appears to be 
>>> related to the fact that the DepVar is constant for each subject, 
>>> because when I select a different dependent variable that is 
>>> different for each repeated measure w/in the subject, I do not get 
>>> this error.
>>>
>>
>>  I think you're right that the problem is that DepVar is fixed per individual.
>> Technical details aside, I'm having trouble seeing how you're going 
>> to estimate the effects of predictor variables that vary within 
>> subject when you've only got one response per subject.
>> Furthermore, I think what you're terming "RandomVar1" and "RandomVar2"
>> are probably *not* random variables, but rather are variables that 
>> vary within subject.   For this response variable, I would suggest 
>> averaging the values of RandomVar1 and RandomVar2 per subject and 
>> collapsing the data set to a simple linear model on subjects -- and 
>> get rid of the correlation model at the same time.  For response 
>> variables that do vary within subject, I would suggest
>>
>> ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2 +
>>   RandomVar1 + RandomVar2, random = ~ 1 | Subject,
>>   na.action = na.omit, data = dataset, correlation = corAR1())
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
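
[Editorial sketch, not part of the original thread.] The suggestions quoted
above, averaging the within-subject variables per subject (or taking the last
measurement instead) and then fitting a simple linear model on subjects, could
be tried along these lines in base R. The column names follow the thread
(Subject, Years, DepVar, FixedVar1, RandomVar1), but the data are simulated,
and the helper objects collapsed, last_obs, fit_mean and fit_last are
hypothetical names chosen for illustration.

```r
# Simulated stand-in for the real data: 49 subjects, each with a different
# number of yearly measurements.
set.seed(42)
n_subj <- 49
n_obs  <- sample(3:8, n_subj, replace = TRUE)
dataset <- data.frame(
  Subject = rep(seq_len(n_subj), times = n_obs),
  Years   = unlist(lapply(n_obs, seq_len))
)

# FixedVar1 and DepVar are constant within subject; RandomVar1 varies by year.
subj_fixed <- rnorm(n_subj)
subj_dep   <- 10 + 2 * subj_fixed + rnorm(n_subj)
dataset$FixedVar1  <- subj_fixed[dataset$Subject]
dataset$DepVar     <- subj_dep[dataset$Subject]
dataset$RandomVar1 <- rnorm(nrow(dataset), mean = dataset$FixedVar1)

# Option 1: average every variable within subject (one row per subject).
collapsed <- aggregate(cbind(DepVar, FixedVar1, RandomVar1) ~ Subject,
                       data = dataset, FUN = mean)

# Option 2: keep the last (most recent) measurement of the time-varying
# predictor instead of its average.
ord      <- dataset[order(dataset$Subject, dataset$Years), ]
last_obs <- ord[!duplicated(ord$Subject, fromLast = TRUE), ]
collapsed$RandomVar1_last <-
  last_obs$RandomVar1[match(collapsed$Subject, last_obs$Subject)]

# Plain linear models on subjects: no random effects, no correlation model.
fit_mean <- lm(DepVar ~ FixedVar1 + RandomVar1, data = collapsed)
fit_last <- lm(DepVar ~ FixedVar1 + RandomVar1_last, data = collapsed)
summary(fit_mean)
summary(fit_last)
```

A weighted average that favors later years would slot in the same way: compute
one summary value per subject and put it in place of RandomVar1.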