[R-sig-ME] MIXED MODEL WITH REPEATED MEASURES

Joerg Luedicke joerg.luedicke at gmail.com
Mon Dec 12 17:17:44 CET 2011


I am sorry, but I feel I have to back off at this point. I think this
concerns a whole research plan/data analysis design, which is beyond
the scope of a forum like this for several reasons. All I will say is
that the way you tried to answer your research question (estimating a
mixed model with a higher-level constant as the dependent variable)
will not work, either technically or substantively. I would recommend
consulting a local expert/statistician to discuss potential strategies
for analyzing these data.

That being said, one idea off the top of my head would be to fit a
cross-sectional model to the latest available data and then predict
the final cost from earlier years' data, using the parameters just
estimated. If those predictions seem reasonable, you could think
about predicting the life cycle cost of a new program from its data
using the parameter estimates of this prediction model. But this is
only a rough ad hoc idea, and there may well be better ways to get
what you want.

Joerg
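The ad hoc idea above can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not a recommended analysis: the data are hypothetical toy data (49 programs observed for 8 years, one made-up predictor X, and a life cycle cost LCC known only at completion). The sketch fits an ordinary lm() to each program's latest year, then feeds each year's data into that model and compares the predictions with the known final cost. In this toy setup early-year X is systematically smaller than its final value, so early predictions are poor by construction; that is exactly the kind of check that would tell you whether the approach is usable.

```r
set.seed(1)

# Hypothetical toy data: 49 programs, each observed once a year for 8 years.
n_prog  <- 49
n_years <- 8
long <- expand.grid(Year = 1:n_years, Program = 1:n_prog)

# One made-up predictor X that drifts toward each program's final value.
final_x <- rnorm(n_prog, mean = 10, sd = 2)
long$X  <- final_x[long$Program] * (long$Year / n_years) +
  rnorm(nrow(long), sd = 0.5)

# Life cycle cost (LCC), known only at completion: one value per program.
cost <- data.frame(Program = 1:n_prog,
                   LCC = 5 + 3 * final_x + rnorm(n_prog, sd = 1))

# Step 1: cross-sectional model on the latest available year of each program.
train <- merge(long[long$Year == n_years, ], cost, by = "Program")
fit   <- lm(LCC ~ X, data = train)

# Step 2: feed each year's data into that model and measure prediction error
# against the known final cost (root mean squared error per year).
rmse_by_year <- sapply(1:n_years, function(yr) {
  d    <- merge(long[long$Year == yr, ], cost, by = "Program")
  pred <- predict(fit, newdata = d)
  sqrt(mean((pred - d$LCC)^2))
})
rmse_by_year
```

Comparing rmse_by_year across years indicates how early in a program's life this cross-sectional model starts to give usable predictions in the made-up example; with real data, the same comparison would be the "do the predictions seem reasonable" check.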

On Sat, Dec 10, 2011 at 9:39 PM, Erin Ryan <erin at the-ryans.com> wrote:
> Thanks for the thoughtful reply. Actually, my dataset is more like option
> (b), with longitudinal data for all subjects. To dispense with the
> analogy and be more precise, here is a better explanation of my dataset:
>
> I have 49 defense programs, each with a known life cycle cost, and a large
> number of independent variables that are tracked for each program on an
> approximately annual basis, each of which might arguably affect the
> program's eventual life cycle cost. Not until a program is complete do I
> truly know its life cycle cost, which is my dependent variable. I wish to
> determine whether a subset of these time-phased independent variables
> will help me estimate, relatively early in a new program's life, what its
> eventual life cycle cost will be. This would presumably require a model of
> the significant fixed and random effects that characterize these past
> programs, which would then predict life cycle cost from certain
> characteristics of the new system (e.g., DoD service component, type of
> system, etc.).
>
> Does that make more sense, or have I completely gone off the reservation?
>
> Erin
>
>
> -----Original Message-----
> From: Joerg Luedicke [mailto:joerg.luedicke at gmail.com]
> Sent: Saturday, December 10, 2011 2:56 PM
> To: Erin Ryan; r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>
> So let's take this screening study as an example. Would it contain
> cross-sectional information on individuals of all ages/cohorts, gathered
> across 20 years (a)? Or would it contain individual trajectories/histories
> spanning 20 years (b)? In the latter case (b), the dependent variable
> (i.e., having cancer) would be time-dependent and could be modeled as such.
> But I assume you mean the first set-up (a), where you just look at
> cross-sections of all sorts of individuals, and I presume that you do not
> have longitudinal information on the study subjects in such a set-up. So
> both possible designs seem very different from yours. To stay with the
> example, what you propose would be comparable to a design in which you
> observe single individuals for, say, 20 years; some of your measures vary
> over time and some don't, and now you want to predict what does not change
> with something that does change. That just does not make much sense.
>
> Suppose you know whether each subject ever had cancer, so throughout the
> entire 20-year period each subject has either a yes or a no. Now you want
> to predict an individual's chance of getting cancer, and one predictor is
> the number of cigarettes the person smokes per year, measured at each of
> the 20 measurement points. Consider an individual who did not smoke at the
> beginning of that period, smoked in the middle, and did not smoke at the
> end of the observation window. How would you relate this information to
> having cancer or not, when the individual essentially has cancer all the
> time, or does not have cancer all the time, throughout the entire
> observation period? In this case, the longitudinal information about
> smoking history contributes nothing that would help you say anything
> about cancer risk. If you want to predict cancer risk in such a set-up,
> you would need to reduce the longitudinal smoking information to
> cross-sectional information, for example by building an indicator of
> whether a person ever smoked, or something similar. Then you would be back
> to set-up (a) and would look at cross-sectional correlations. This is of
> course not very desirable, as somebody could get cancer at 30 but only
> start smoking at 40, but these are the natural problems with
> cross-sectional data. In any case, if your dependent variable is of such a
> cross-sectional nature, there is not much you can do about it other than
> step back to a more correlational point of view.
>
> Joerg
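A minimal base-R sketch of the reduction described above, assuming hypothetical long-format data with columns Subject, Year and Cigarettes (a yearly count). Each subject's history is collapsed to one row: an ever-smoked indicator and, as one alternative summary, the total count. As Joerg notes, such summaries discard the timing of exposure relative to the outcome.

```r
# Hypothetical long-format data: yearly cigarette counts for four subjects.
smoking <- data.frame(
  Subject    = rep(c("A", "B", "C", "D"), each = 3),
  Year       = rep(1:3, times = 4),
  Cigarettes = c(0, 0, 0,            # A: never smoked
                 0, 3650, 0,         # B: smoked only in the middle year
                 7300, 7300, 7300,   # C: smoked throughout
                 0, 0, 365)          # D: started in the last year
)

# Collapse each subject's history to cross-sectional summaries.
ever  <- tapply(smoking$Cigarettes > 0, smoking$Subject, any)
total <- tapply(smoking$Cigarettes, smoking$Subject, sum)

per_subject <- data.frame(
  Subject    = names(ever),
  EverSmoked = as.vector(ever),
  Total      = as.vector(total)
)
per_subject
```

The resulting one-row-per-subject table is what would then be related to the per-subject outcome in an ordinary cross-sectional model.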
>
>
> On Sat, Dec 10, 2011 at 1:49 PM, Erin Ryan <erin at the-ryans.com> wrote:
>> Good insights, Joerg - thanks.
>>
>> Unfortunately, I wish to predict the value of the dependent variable
>> for future subjects well before the last measure (what I envision is
>> an answer with a confidence interval that steadily narrows over time).
>> An apt analogy would be a cancer-screening study involving 500 patients
>> over 20 years. In such a study, there would be a multitude of independent
>> variables characterizing each subject, and the dependent variable would
>> simply be a nominal-level measure of whether or not a given subject
>> had contracted cancer at some point in the 20 years. The purpose of the
>> study would be to identify future subjects who are at higher risk of
>> cancer, but the conclusions would be based on empirical data in which
>> the dependent variable (yes or no for having cancer) would be the same
>> across the entire time series.
>>
>> So, what is the correct statistical approach for a dataset like this,
>> in which the data are not iid but the dependent variable is constant
>> for each subject?
>>
>> Erin
>>
>> -----Original Message-----
>> From: Joerg Luedicke [mailto:joerg.luedicke at gmail.com]
>> Sent: Saturday, December 10, 2011 10:36 AM
>> To: Erin Ryan
>> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>>
>> On Fri, Dec 9, 2011 at 9:33 PM, Erin Ryan <erin at the-ryans.com> wrote:
>>> Good suggestions; however, there is inherent value in the temporal
>>> progression of the repeated measures, so I need to capture that in
>>> some way.
>>
>> If your dependent variable is a constant within the units for which you
>> observe "temporal progression", then this "progression" does not matter
>> whatsoever. Imagine fitting a conventional regression in which your
>> dependent variable is a constant: it would not matter at all how
>> different the subjects are in any other regard.
>>
>>> For similar reasons, averaging the values of the independent
>>> variables is problematic, as they progress over time to a final,
>>> actual value, which presumably should be weighted more heavily. In
>>> other words, truth is known on the final repeated measure, but I wish
>>> to make accurate predictions much earlier than the final repeated
>>> measure.
>>
>> I don't know what your field of research is, but if you believe that
>> later measures are better measures of your object of interest, you
>> could just take the last one instead of the average. Or, you could
>> take a weighted average of some sort.
>>
>>
>> HTH,
>>
>> Joerg
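The two options mentioned above (keeping each subject's last measurement, or a weighted average that favors later years) can be computed per subject in base R as sketched below. The toy data are hypothetical, with column names borrowed from the original post, and weights proportional to Years are an arbitrary illustrative choice; any other weighting scheme could be substituted.

```r
# Hypothetical toy data: three subjects with unequal numbers of yearly rows.
dataset <- data.frame(
  Subject    = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Years      = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
  RandomVar1 = c(2, 4, 6, 5, 7, 1, 1, 2, 4)
)

# Make sure rows are in time order within each subject.
dataset <- dataset[order(dataset$Subject, dataset$Years), ]
by_subj <- split(dataset, dataset$Subject)

# Option 1: the last available measurement per subject.
last_val <- sapply(by_subj, function(d) d$RandomVar1[nrow(d)])

# Option 2: a weighted average; weights proportional to Years are an
# arbitrary illustrative choice that gives later years more influence.
wavg <- sapply(by_subj, function(d) weighted.mean(d$RandomVar1, w = d$Years))

cbind(last_val, wavg)
```

Either vector has one value per subject and can be used as a predictor in a simple subject-level regression.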
>>
>>>
>>> -----Original Message-----
>>> From: r-sig-mixed-models-bounces at r-project.org
>>> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben
>>> Bolker
>>> Sent: Thursday, December 08, 2011 5:01 PM
>>> To: r-sig-mixed-models at r-project.org
>>> Subject: Re: [R-sig-ME] MIXED MODEL WITH REPEATED MEASURES
>>>
>>> Erin Ryan <erin at ...> writes:
>>>
>>>>
>>>> I am trying to specify a mixed model for my research, but I can't
>>>> quite get it to work. I've spent several weeks looking through various
>>>> online sources to no avail. I can't find an example of someone
>>>> trying to do precisely what I'm trying to do. I'm hoping some smart
>>>> member of this mailing list may be able to help.
>>>>
>>>> First off, full disclosure: (1) I'm an engineer by trade, so the
>>>> problem may be related to my ignorance of statistics, and/or (2) I'm
>>>> fairly new to R, so the problem may be related to my ignorance of R
>>>> syntax. Here is the basic structure of my data (in longitudinal form):
>>>
>>>  [snip]
>>>
>>>> The rows below each subject are repeated measures (in years), with
>>>> the specific pattern of repeated measurements unique to each subject.
>>>> The data contains fixed effects and random effects, and there is
>>>> clearly correlation in the random effects within each subject. The
>>>> DepVar column represents the dependent variable, which is a constant
>>>> for each subject. All the data is empirical, but I wish to create a
>>>> predictive model. Specifically, I wish to predict the value of DepVar
>>>> for new subjects.
>>>>
>>>> So I understand enough about statistics to know that I must employ a
>>>> mixed model. I further understand that I must specify a covariance
>>>> matrix structure. Given the relatively high degree of correlation in
>>>> consecutive years, an AR(1) structure seems like a good starting
>>>> point. I have been trying to build the model in SPSS, but without
>>>> success, so I've recently turned to R. My first attempt was as
>>>> follows:
>>>>
>>>> ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2,
>>>>                 random = ~ RandomVar1 + RandomVar2 | Subject,
>>>>                 na.action = na.omit, data = dataset,
>>>>                 correlation = corAR1())
>>>>
>>>> I assume this can't be the right specification since it neglects the
>>>> repeated-measures aspect of the data, so I instead decided to employ
>>>> the corCAR1 structure, i.e.:
>>>>
>>>> ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2,
>>>>                 random = ~ RandomVar1 + RandomVar2 | Subject,
>>>>                 na.action = na.omit, data = dataset,
>>>>                 correlation = corCAR1(0.5, form = ~ Years | Subject))
>>>>
>>>> Now perhaps neither correlation structure is the right one (probably
>>>> a different discussion for another day), but the problem I'm
>>>> experiencing seems to occur regardless of the structure I specify.
>>>> In both cases, I get the following error--
>>>>
>>>> Error in solve.default(estimates[dimE[1] - (p:1), dimE[2] - (p:1), drop = FALSE]) :
>>>>   system is computationally singular: reciprocal condition number = 5.42597e-022
>>>>
>>>> Does anybody know what is going wrong here? This error appears to be
>>>> related to the fact that DepVar is constant for each subject, because
>>>> when I select a different dependent variable that varies across the
>>>> repeated measures within a subject, I do not get this error.
>>>>
>>>
>>>  I think you're right that DepVar is fixed per individual.
>>> Technical details aside, I'm having trouble seeing how you're going
>>> to estimate the effects of predictor variables that vary within
>>> subject when you've only got one response per subject.
>>> Furthermore, I think what you're terming "RandomVar1" and "RandomVar2"
>>> are probably *not* random variables, but rather are variables that
>>> vary within subject.   For this response variable, I would suggest
>>> averaging the values of RandomVar1 and RandomVar2 per subject and
>>> collapsing the data set to a simple linear model on subjects, and
>>> getting rid of the correlation model at the same time.  For response
>>> variables that do vary within subject, I would suggest
>>>
>>> library(nlme)
>>> ModelFit <- lme(fixed = DepVar ~ FixedVar1 + FixedVar2 +
>>>                   RandomVar1 + RandomVar2,
>>>                 random = ~ 1 | Subject,
>>>                 na.action = na.omit, data = dataset,
>>>                 correlation = corAR1())
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
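Ben's two points can be illustrated with simulated data in base R (hypothetical toy data; column names borrowed from the original post). First, when DepVar is repeated unchanged on every row of a subject, splitting a time-varying predictor into its subject mean (Between) and the deviation from that mean (Within) shows that the within-subject part gets a coefficient of zero, up to rounding error: the deviations sum to zero within each subject, so they are orthogonal to anything constant within subject, including the response. Second, the suggested remedy collapses the data to one row per subject and fits an ordinary lm(), with no correlation structure needed.

```r
set.seed(42)

# Hypothetical unbalanced long data: 30 subjects with 2 to 6 yearly rows each.
n_rows  <- sample(2:6, 30, replace = TRUE)
dataset <- data.frame(Subject = rep(seq_along(n_rows), times = n_rows))
dataset$RandomVar1 <- rnorm(nrow(dataset), mean = dataset$Subject %% 5)

# One response value per subject, repeated on each of its rows.
subj_y <- rnorm(length(n_rows))
dataset$DepVar <- subj_y[dataset$Subject]

# Split the time-varying predictor into subject mean and within-subject deviation.
dataset$Between <- ave(dataset$RandomVar1, dataset$Subject)
dataset$Within  <- dataset$RandomVar1 - dataset$Between
long_fit <- lm(DepVar ~ Between + Within, data = dataset)
coef(long_fit)["Within"]  # zero up to rounding error

# Ben's suggested remedy: average per subject, then an ordinary linear model.
collapsed <- aggregate(cbind(DepVar, RandomVar1) ~ Subject,
                       data = dataset, FUN = mean)
subj_fit <- lm(DepVar ~ RandomVar1, data = collapsed)
summary(subj_fit)
```

Note that the long-format fit and the collapsed fit weight subjects differently when the number of rows per subject is unequal; the collapsed version treats each subject as one observation, which matches having one response per subject.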
>>>