[R-sig-ME] R-sig-mixed-models Digest, Vol 18, Issue 4

Douglas Bates bates at stat.wisc.edu
Mon Jun 9 16:43:30 CEST 2008


It would be much easier to explain how such models could be fit in the
lme4 package if you could provide some sample data or direct us to
some published data. Is that possible?

On Mon, Jun 9, 2008 at 9:35 AM, David Airey <david.airey at vanderbilt.edu> wrote:
> Thank you Dr.s Bates and Gorjanc.
>
> I hope to bother you for additional detail later, once we finish analysis
> with a first pass using SAS Proc Mixed to give a reference point. My purpose
> was to see what was possible in R, that seems a little problematic in Stata
> (at least with my hands).
>
> My collaborator (Lily Wang, Biostatistician, SAS user) has a paper coming
> out in PLoS Genetics, describing the improvements that use of mixed models
> for parametric gene set enrichment analysis has over the usual nonparametric
> GSEA approach by the Broad Institute. Perhaps, these approaches will wind
> their way into SAS JMP Genomics 4.
>
> The kind of structure involved generally includes a fixed effect treatment
> of interest in two or more groups, and random effects for gene set, genes in
> the set, and gene hybridization probes. Gene expression is measured at the
> probe level; gene sets are based on a priori knowledge of function.
>
> In Stata, for example, there would be two ways of doing this model (thanks
> to Yulia at Stata Corp), by brute force or multilevel:
>
> xi: xtmixed depvar i.Group || _all: R.Gene || _all: R.GeXPr || _all: R.GrXGe
> || _all: R.GrXGeXPr, variance
>
> which, if we have 50 genes, 5 probes per gene, and 2 groups, requires 900
> columns for the design matrix.
>
> xi: xtmixed depvar i.Group || Gene: R.Probes, cov(exchangeable) || Group: ||
> Probes:, variance
>
> which requires just 7 columns for the same model.
>
> But, as stated in the earlier thread, Stata just stops when there is a zero
> variance component, because "a ridge is formed by an interval of values near
> zero, which produce the same likelihood and look equally good to the
> optimizer [manual entry]." The default optimizer is Newton-Raphson. There
> may yet be ways of dealing with this problem within Stata; I have yet to
> query their support line.
>
> The reason the above is interesting to me is that Lily works with a related
> approach where the comparison is not by treatment within a given gene set,
> but by enrichment within a gene set compared to the rest of the genes on the
> microarray. This is the better approach to take, but then I think the brute
> force approach could do with some "slimming" offered by the multilevel
> approach.
>
> I know far too little statistics to be asking these questions, but I'd like
> to know about possibilities in R and Stata (and it is interesting!). I use
> Stata until I run into problems, and then I suffer to use R. Dang, that
> sounds damning of both, but it should only sound damning of the user.
>
> Anyway, it does sound like R lme4 will gracefully handle variance components
> that estimate to zero, and therefore allow an uncomplicated loop to be
> performed across gene sets.
>
> Thank you again.
>
> -Dave
>
> On Jun 9, 2008, at 5:00 AM, r-sig-mixed-models-request at r-project.org wrote:
>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 8 Jun 2008 11:07:43 +0000 (UTC)
>> From: Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si>
>> Subject: Re: [R-sig-ME] variance components models with zero estimates
>> To: r-sig-mixed-models at r-project.org
>> Message-ID: <loom.20080608T110615-268 at post.gmane.org>
>> Content-Type: text/plain; charset=us-ascii
>>
>> David Airey <david.airey at ...> writes:
>>
>>> When a variance components mixed model is run in Stata, if some of the
>>> variance components are zero, the model may not converge, for rational
>>> reasons according to the manual entry. However, when the same model is
>>> run in SAS, the models with variance components that estimate to zero
>>
>> ...
>>
>> I remember that with lmer() you get a warning/message that says that
>> variance component for a particular effect is effectively zero in such
>> cases and
>> the model converges!
>>
>> gg
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 8 Jun 2008 11:33:31 -0500
>> From: "Douglas Bates" <bates at stat.wisc.edu>
>> Subject: Re: [R-sig-ME] variance components models with zero estimates
>> To: "David Airey" <david.airey at vanderbilt.edu>
>> Cc: r-sig-mixed-models at r-project.org
>> Message-ID:
>>        <40e66e0b0806080933u2a779a9csa6e90c969441d6b0 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> On 6/7/08, David Airey <david.airey at vanderbilt.edu> wrote:
>>>
>>> When a variance components mixed model is run in Stata, if some of the
>>> variance components are zero, the model may not converge, for rational
>>> reasons according to the manual entry. However, when the same model is
>>> run
>>> in SAS, the models with variance components that estimate to zero
>>> nonetheless converge. According to some SAS user friends, this is normal
>>> SAS
>>> behavior (I'm new to SAS as of yesterday). If I'm interested in looping
>>> through a set of such models, the SAS behavior is preferred. However, in
>>> Stata such models can be formulated as multilevel models that can
>>> dramatically reduce the dimension of the design matrix. The context where
>>> both behaviors is important is mixed models for gene set enrichment
>>> analysis, where there is a possibility of hundreds of models.
>>
>> That's a very interesting topic and something that I am tempted to
>> answer in far too much detail.
>>
>> Let me provide the brief answer.  Currently there are two versions of
>> the lme4 package available: a released version and a development
>> version.  To install the development version use
>>
>> install.packages("lme4", repos = "http://r-forge.r-project.org")
>>
>> The major difference between these versions is how they handle
>> evaluation of the log-likelihood when variance components are zero.
>> That is, does the log-likelihood get evaluated in a way that behaves
>> smoothly as a variance component goes to zero (and in the more general
>> situation of a variance-covariance matrix approaching singularity)?
>> The released version doesn't behave smoothly at zero and it is
>> necessary to restrict variances to being slightly greater than zero.
>> The development version does go to zero smoothly.
>>
>> Gregor Gorjanc's response relates to the released version.  Neither
>> version throws an error condition when the estimates are zero or
>> near-zero.  However, if you are concerned about estimates that may be
>> zero I recommend using the development version of the lme4 package.
>> It is more robust and reliable at determining the parameter estimates.
>>
>> (The natural question to ask is then why not release the development
>> version?  I have not done so because there are some support routines,
>> notably the Markov chain Monte Carlo sampler, whose versions in the
>> new formulation are incomplete, and some software supporting books
>> depends on those.)
>>
>>> Does R lme4 handle variance components mixed models that have estimates
>>> of
>>> zero for some of the variance components like SAS or Stata?
>>
>> Neither.  The lmer function handles these situations correctly :-)
>>
>>> Is it possible
>>> to loop through variance components models when some of the variance
>>> components are zero?
>>
>> Yes.  Do bear in mind that much of the effort in fitting such models
>> is spent establishing the structures representing the model and data -
>> often more time will be spent formulating the numerical representation
>> of the model than is spent in optimizing the parameter estimates.  If
>> you find that your loops are taking a very long time to run you may
>> want to dig deeper into the code and see if you can by-pass some of
>> the steps and go directly to the optimization phase.  There is now a
>> function called "refit" in lme4 that can be used to go directly to the
>> optimization phase if the model and design are fixed and only response
>> has been changed.
>>
>>> What is a suggested procedure for doing so in R? In
>>> Stata, I would probably use an EM only guess at which variance components
>>> were substantive, and then fit one of several models. What about R lme?
>>
>> i think we would need more detail to be able to answer such a question.
>>
>>
>>
>> ------------------------------
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>




More information about the R-sig-mixed-models mailing list