[R-sig-ME] R-sig-mixed-models Digest, Vol 18, Issue 4

Mon Jun 9 16:46:34 CEST 2008

That may be possible. I will contact you off list.

-Dave

On Jun 9, 2008, at 9:43 AM, Douglas Bates wrote:

> It would be much easier to explain how such models could be fit in the
> lme4 package if you could provide some sample data or direct us to
> some published data. Is that possible?
>
> On Mon, Jun 9, 2008 at 9:35 AM, David Airey <david.airey at vanderbilt.edu 
> > wrote:
>> Thank you Dr.s Bates and Gorjanc.
>>
>> I hope to bother you for additional detail later, once we finish  
>> analysis
>> with a first pass using SAS Proc Mixed to give a reference point.  
>> My purpose
>> was to see what was possible in R, that seems a little problematic  
>> in Stata
>> (at least with my hands).
>>
>> My collaborator (Lily Wang, Biostatistician, SAS user) has a paper  
>> coming
>> out in PLoS Genetics, describing the improvements that use of mixed  
>> models
>> for parametric gene set enrichment analysis has over the usual  
>> nonparametric
>> GSEA approach by the Broad Institute. Perhaps, these approaches  
>> will wind
>> their way into SAS JMP Genomics 4.
>>
>> The kind of structure involved generally includes a fixed effect  
>> treatment
>> of interest in two or more groups, and random effects for gene set,  
>> genes in
>> the set, and gene hybridization probes. Gene expression is measured  
>> at the
>> probe level; gene sets are based on a priori knowledge of function.
>>
>> In Stata, for example, there would be two ways of doing this model  
>> (thanks
>> to Yulia at Stata Corp), by brute force or multilevel:
>>
>> xi: xtmixed depvar i.Group || _all: R.Gene || _all: R.GeXPr ||  
>> _all: R.GrXGe
>> || _all: R.GrXGeXPr, variance
>>
>> which, if we have 50 genes, 5 probes per gene, and 2 groups,  
>> requires 900
>> columns for the design matrix.
>>
>> xi: xtmixed depvar i.Group || Gene: R.Probes, cov(exchangeable) ||  
>> Group: ||
>> Probes:, variance
>>
>> which requires just 7 columns for the same model.
>>
>> But, as stated in the earlier thread, Stata just stops when there  
>> is a zero
>> variance component, because "a ridge is formed by an interval of  
>> values near
>> zero, which produce the same likelihood and look equally good to the
>> optimizer [manual entry]." The default optimizer is Newton-Raphson.  
>> There
>> may yet be ways of dealing with this problem within Stata; I have  
>> yet to
>> query their support line.
>>
>> The reason the above is interesting to me is that Lily works with a  
>> related
>> approach where the comparison is not by treatment within a given  
>> gene set,
>> but by enrichment within a gene set compared to the rest of the  
>> genes on the
>> microarray. This is the better approach to take, but then I think  
>> the brute
>> force approach could do with some "slimming" offered by the  
>> multilevel
>> approach.
>>
>> I know far too little statistics to be asking these questions, but  
>> I'd like
>> to know about possibilities in R and Stata (and it is  
>> interesting!). I use
>> Stata until I run into problems, and then I suffer to use R. Dang,  
>> that
>> sounds damning of both, but it should only sound damning of the user.
>>
>> Anyway, it does sound like R lme4 will gracefully handle variance  
>> components
>> that estimate to zero, and therefore allow an uncomplicated loop to  
>> be
>> performed across gene sets.
>>
>> Thank you again.
>>
>> -Dave
>>
>> On Jun 9, 2008, at 5:00 AM, r-sig-mixed-models-request at r- 
>> project.org wrote:
>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Sun, 8 Jun 2008 11:07:43 +0000 (UTC)
>>> From: Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si>
>>> Subject: Re: [R-sig-ME] variance components models with zero  
>>> estimates
>>> To: r-sig-mixed-models at r-project.org
>>> Message-ID: <loom.20080608T110615-268 at post.gmane.org>
>>> Content-Type: text/plain; charset=us-ascii
>>>
>>> David Airey <david.airey at ...> writes:
>>>
>>>> When a variance components mixed model is run in Stata, if some  
>>>> of the
>>>> variance components are zero, the model may not converge, for  
>>>> rational
>>>> reasons according to the manual entry. However, when the same  
>>>> model is
>>>> run in SAS, the models with variance components that estimate to  
>>>> zero
>>>
>>> ...
>>>
>>> I remember that with lmer() you get a warning/message that says that
>>> variance component for a particular effect is effectively zero in  
>>> such
>>> cases and
>>> the model converges!
>>>
>>> gg
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Sun, 8 Jun 2008 11:33:31 -0500
>>> From: "Douglas Bates" <bates at stat.wisc.edu>
>>> Subject: Re: [R-sig-ME] variance components models with zero  
>>> estimates
>>> To: "David Airey" <david.airey at vanderbilt.edu>
>>> Cc: r-sig-mixed-models at r-project.org
>>> Message-ID:
>>>       <40e66e0b0806080933u2a779a9csa6e90c969441d6b0 at mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> On 6/7/08, David Airey <david.airey at vanderbilt.edu> wrote:
>>>>
>>>> When a variance components mixed model is run in Stata, if some  
>>>> of the
>>>> variance components are zero, the model may not converge, for  
>>>> rational
>>>> reasons according to the manual entry. However, when the same  
>>>> model is
>>>> run
>>>> in SAS, the models with variance components that estimate to zero
>>>> nonetheless converge. According to some SAS user friends, this is  
>>>> normal
>>>> SAS
>>>> behavior (I'm new to SAS as of yesterday). If I'm interested in  
>>>> looping
>>>> through a set of such models, the SAS behavior is preferred.  
>>>> However, in
>>>> Stata such models can be formulated as multilevel models that can
>>>> dramatically reduce the dimension of the design matrix. The  
>>>> context where
>>>> both behaviors is important is mixed models for gene set enrichment
>>>> analysis, where there is a possibility of hundreds of models.
>>>
>>> That's a very interesting topic and something that I am tempted to
>>> answer in far too much detail.
>>>
>>> Let me provide the brief answer.  Currently there are two versions  
>>> of
>>> the lme4 package available: a released version and a development
>>> version.  To install the development version use
>>>
>>> install.packages("lme4", repos = "http://r-forge.r-project.org")
>>>
>>> The major difference between these versions is how they handle
>>> evaluation of the log-likelihood when variance components are zero.
>>> That is, does the log-likelihood get evaluated in a way that behaves
>>> smoothly as a variance component goes to zero (and in the more  
>>> general
>>> situation of a variance-covariance matrix approaching singularity)?
>>> The released version doesn't behave smoothly at zero and it is
>>> necessary to restrict variances to being slightly greater than zero.
>>> The development version does go to zero smoothly.
>>>
>>> Gregor Gorjanc's response relates to the released version.  Neither
>>> version throws an error condition when the estimates are zero or
>>> near-zero.  However, if you are concerned about estimates that may  
>>> be
>>> zero I recommend using the development version of the lme4 package.
>>> It is more robust and reliable at determining the parameter  
>>> estimates.
>>>
>>> (The natural question to ask is then why not release the development
>>> version?  I have not done so because there are some support  
>>> routines,
>>> notably the Markov chain Monte Carlo sampler, whose versions in the
>>> new formulation are incomplete, and some software supporting books
>>> depends on those.)
>>>
>>>> Does R lme4 handle variance components mixed models that have  
>>>> estimates
>>>> of
>>>> zero for some of the variance components like SAS or Stata?
>>>
>>> Neither.  The lmer function handles these situations correctly :-)
>>>
>>>> Is it possible
>>>> to loop through variance components models when some of the  
>>>> variance
>>>> components are zero?
>>>
>>> Yes.  Do bear in mind that much of the effort in fitting such models
>>> is spent establishing the structures representing the model and  
>>> data -
>>> often more time will be spent formulating the numerical  
>>> representation
>>> of the model than is spent in optimizing the parameter estimates.   
>>> If
>>> you find that your loops are taking a very long time to run you may
>>> want to dig deeper into the code and see if you can by-pass some of
>>> the steps and go directly to the optimization phase.  There is now a
>>> function called "refit" in lme4 that can be used to go directly to  
>>> the
>>> optimization phase if the model and design are fixed and only  
>>> response
>>> has been changed.
>>>
>>>> What is a suggested procedure for doing so in R? In
>>>> Stata, I would probably use an EM only guess at which variance  
>>>> components
>>>> were substantive, and then fit one of several models. What about  
>>>> R lme?
>>>
>>> i think we would need more detail to be able to answer such a  
>>> question.
>>>
>>>
>>>
>>> ------------------------------
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>