[R-sig-ME] Introductory texts

Tue Apr 8 07:46:46 CEST 2008

Dear all,
there is a very large silent majority that struggles to follow your sophisticated thoughts and discussions. We need a definite answer to an old question. Those of us who are very comfortable with multiple linear regression but know very little about mixed-effects models need introductory texts with EXAMPLES, and not much maths, to help us RUN mixed-effects models, INTERPRET their results and TEST their fit.

Is it possible that any of you who plans to write (or is currently writing) a book on running mixed models in R might use some of us as pilot testers of the text, so that we gain some early information until we manage to buy it? In any case, we could help anyone with such aspirations by providing our own examples, datasets etc. from several different disciplines, so that the book attracts readers from a wider range of fields.

Just a few ideas

Dr. Iasonas Lamprianou
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485
iasonas.lamprianou at manchester.ac.uk

----- Original Message ----
From: "r-sig-mixed-models-request at r-project.org" <r-sig-mixed-models-request at r-project.org>
To: r-sig-mixed-models at r-project.org
Sent: Tuesday, 8 April, 2008 3:25:37 AM
Subject: R-sig-mixed-models Digest, Vol 16, Issue 21

Today's Topics:

  1. Simulating linear mixed models - the Venables approach
      (Douglas Bates)
  2. lmer syntax (Michael Kubovy)
  3. Re: lmer syntax (Douglas Bates)
  4. Re: Fwd: same old question - lme4 and p-values (Simon Blomberg)
  5. Re: Fwd: same old question - lme4 and p-values (Simon Blomberg)
  6. Re: Fwd: same old question - lme4 and p-values (John Maindonald)

----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Apr 2008 11:29:50 -0500
From: "Douglas Bates" <bates at stat.wisc.edu>
Subject: [R-sig-ME] Simulating linear mixed models - the Venables
    approach
To: "R Mixed Models" <r-sig-mixed-models at r-project.org>
Message-ID:
    <40e66e0b0804070929q250f3805qaf128754c6b5f89d at mail.gmail.com>
Content-Type: text/plain; charset=WINDOWS-1252

In case you missed it on the R-help list, I urge readers of this list
to consider the understated elegance of the code Bill Venables posted
for simulating data from a simple random effects model.

> set.seed(7658943)
>
> fph <- 0.4
> Sigh <- sqrt(0.0002)
> Sigi <- sqrt(0.04)
>
> reH <- rnorm(90, fph, Sigh)  ## hospid effects
> dta <- within(expand.grid(hospid = 1:90, empid = 1:80),
          fpi1 <- reH[hospid] + rnorm(7200, fph, Sigi))

One is reminded of John Keats

'Beauty is truth, truth beauty,--that is all
Ye know on earth, and all ye need to know.'
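
The quoted simulation can be run as a plain script once the console prompts are removed. The sketch below does that using base R only. The names n_hosp, n_emp, within_var and between_var are illustrative additions that are not in the original post, and the moment-based check at the end is a rough sanity check, not a fitted mixed model. Note that, as posted, fph enters both reH and the employee-level draw, so the simulated responses centre on 2 * fph.

```r
set.seed(7658943)

fph  <- 0.4            # mean used in the quoted code
Sigh <- sqrt(0.0002)   # between-hospital standard deviation
Sigi <- sqrt(0.04)     # within-hospital (employee) standard deviation

n_hosp <- 90           # illustrative names for the sizes used in the post
n_emp  <- 80

# One random effect per hospital, then one response per hospital/employee pair
reH <- rnorm(n_hosp, fph, Sigh)
dta <- within(expand.grid(hospid = 1:n_hosp, empid = 1:n_emp),
              fpi1 <- reH[hospid] + rnorm(n_hosp * n_emp, fph, Sigi))

# Rough method-of-moments check of the two variance components
hosp_means  <- tapply(dta$fpi1, dta$hospid, mean)
within_var  <- mean(tapply(dta$fpi1, dta$hospid, var))
between_var <- var(hosp_means) - within_var / n_emp

c(overall_mean = mean(dta$fpi1),
  within_var   = within_var,
  between_var  = between_var)
```

With a between-hospital variance this small relative to the within-hospital variance, the moment estimate of between_var can be noisy and may even come out negative.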

------------------------------

Message: 2
Date: Mon, 7 Apr 2008 13:34:05 -0400
From: Michael Kubovy <kubovy at virginia.edu>
Subject: [R-sig-ME] lmer syntax
To: R Mixed Models <r-sig-mixed-models at r-project.org>
Message-ID: <7A6681AD-8293-40FC-BCAB-FA8CFC9B6D50 at virginia.edu>
Content-Type: text/plain

Dear lme4 folk,

The lmer help page gives two examples:
(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject),  
sleepstudy))
How is the following different, in principle, from the above? Is it  
that the above treats (Intercept) and Days as orthogonal, whereas the  
latter checks to see if they are? What would be appropriate if the  
correlation between Days and Intercept (here 0.067, apparently) were  
large?
(fm3 <- lmer(Reaction ~ Days + (1 + Days | Subject), sleepstudy))
...
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 610.8    24.72
          Days         35.1     5.92    0.067
 Residual             655.1    25.59
_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS:    P.O.Box 400400    Charlottesville, VA 22904-4400
Parcels:    Room 102        Gilmer Hall
        McCormick Road    Charlottesville, VA 22903
Office:    B011    +1-434-982-4729
Lab:        B019    +1-434-982-4751
Fax:        +1-434-982-4766
WWW:    http://www.people.virginia.edu/~mk9y/

------------------------------

Message: 3
Date: Mon, 7 Apr 2008 12:59:51 -0500
From: "Douglas Bates" <bates at stat.wisc.edu>
Subject: Re: [R-sig-ME] lmer syntax
To: "Michael Kubovy" <kubovy at virginia.edu>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>
Message-ID:
    <40e66e0b0804071059p10cc9380x18c76dc44c005215 at mail.gmail.com>
Content-Type: text/plain; charset=WINDOWS-1252

On Mon, Apr 7, 2008 at 12:34 PM, Michael Kubovy <kubovy at virginia.edu> wrote:
> Dear lme4 folk,

> The lmer help page gives two examples:
>  (fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
> (fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))
> How is the following different, in principle, from the above? Is it that the
> above treats (Intercept) and Days as orthogonal, whereas the latter checks
> to see if they are? What would be appropriate if the correlation between
> Days and Intercept (here 0.067, apparently) were large?
> (fm3 <- lmer(Reaction ~ Days + (1 + Days | Subject), sleepstudy))

Model fm3 is equivalent to model fm1.  In the linear model formula
language used in the S language, the intercept term is implicit so the
random-effects term (Days|Subject) is equivalent to (1+Days|Subject).
Some authors, notably Gelman and Hill in their 2007 book, prefer to
use the second form so that the presence of the intercept is explicit.
I can see the point of that.
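
The implicit intercept can be checked in base R without fitting anything. This is a small sketch on ordinary one-sided formulas; the toy data frame is made up for illustration, and the assumption (following the explanation above) is that the left-hand side of a random-effects term is read by the same formula rules.

```r
# Formulas are only parsed here; no model is fitted.
implicit   <- terms(~ Days)
explicit   <- terms(~ 1 + Days)
suppressed <- terms(~ 0 + Days)

attr(implicit,   "intercept")   # 1
attr(explicit,   "intercept")   # 1
attr(suppressed, "intercept")   # 0

# The design matrices for the first two forms agree column for column.
toy <- data.frame(Days = 0:3)   # hypothetical data, for illustration only
same_design <- isTRUE(all.equal(model.matrix(~ Days, toy),
                                model.matrix(~ 1 + Days, toy)))
same_design
colnames(model.matrix(~ 0 + Days, toy))
```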

Every random effect is associated with one and only one random-effects
term in the model formula and with one and only one level of the
grouping factor for that random-effects term.  The general rules for
determining the variance-covariance of the random effects (as fit in
lmer) are:

- random effects associated with different terms are independent
- random effects associated with the same term but with different
levels of the grouping factor are independent
- within a term the random effects may be partitioned according to
the levels of the grouping factor.  The variance-covariance matrix of
the vector of random effects associated with each of these levels of
the grouping factor is a constant, symmetric, positive semidefinite
matrix.  It has no additional constraints other than being symmetric
and positive semidefinite.  (In SAS-speak this is called an
"unstructured" variance-covariance matrix but the mathematician in me
refuses to accept the concept of an unstructured, symmetric, positive
semidefinite matrix.)

(Note that when I refer to "levels" in the above description I am
referring to the S-language concept of the levels of a factor, not
levels of random effects in the sense of multilevel models.)
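
The three rules above imply a block-diagonal covariance matrix for the stacked random effects. The sketch below builds one in base R for three levels of the grouping factor. The standard deviations and correlation are copied from the quoted output; n_subj and the object names are illustrative, and reusing the same variances for the two-term case is only for comparison, not a fitted result.

```r
n_subj <- 3   # illustrative number of levels of the grouping factor

sd_int  <- 24.72   # values taken from the quoted output
sd_days <- 5.92
rho     <- 0.067
cov_id  <- rho * sd_int * sd_days

# One term (1 + Days | Subject): a general symmetric 2 x 2 matrix per level
Sigma_corr <- matrix(c(sd_int^2, cov_id,
                       cov_id,   sd_days^2), nrow = 2, byrow = TRUE)

# Two terms (1 | Subject) + (0 + Days | Subject): the terms are independent,
# so the off-diagonal entry is zero
Sigma_diag <- diag(c(sd_int^2, sd_days^2))

# Different levels are independent, so the full matrix (effects ordered
# level by level) is block diagonal
V_one_term  <- kronecker(diag(n_subj), Sigma_corr)
V_two_terms <- kronecker(diag(n_subj), Sigma_diag)

round(V_one_term, 1)
min(eigen(V_one_term, symmetric = TRUE)$values) >= 0   # positive semidefinite
```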

In practice the difference between the two models is that fm2 is a
restricted form of fm1/fm3 in which the correlation of the random
effects has been set to zero.
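
A hedged sketch of how the nesting shows up in practice, assuming a current lme4 installation (the output of the 2008 version may differ in layout). Fitting by maximum likelihood (REML = FALSE) is a choice made here so that the log-likelihoods are directly comparable; df_gap and same_fit are illustrative names.

```r
if (requireNamespace("lme4", quietly = TRUE)) {
  sleepstudy <- lme4::sleepstudy

  fm1 <- lme4::lmer(Reaction ~ Days + (Days | Subject),
                    sleepstudy, REML = FALSE)
  fm2 <- lme4::lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                    sleepstudy, REML = FALSE)
  fm3 <- lme4::lmer(Reaction ~ Days + (1 + Days | Subject),
                    sleepstudy, REML = FALSE)

  # fm1 has one more parameter than fm2: the intercept/slope correlation
  df_gap <- attr(logLik(fm1), "df") - attr(logLik(fm2), "df")

  # fm1 and fm3 are the same model written two ways
  same_fit <- isTRUE(all.equal(as.numeric(logLik(fm1)),
                               as.numeric(logLik(fm3))))

  # Likelihood ratio comparison of the restricted and unrestricted models
  print(anova(fm2, fm1))
} else {
  message("lme4 is not installed; skipping the model fits")
}
```

If the estimated correlation were large, the comparison would be expected to favour fm1 over fm2; with the small correlation quoted in the question, the two fits should be close.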

> ...
> Random effects:
>  Groups   Name        Variance Std.Dev. Corr
>  Subject  (Intercept) 610.8    24.72
>           Days         35.1     5.92    0.067
>  Residual             655.1    25.59

------------------------------

Message: 4
Date: Tue, 08 Apr 2008 09:36:45 +1000
From: Simon Blomberg <s.blomberg1 at uq.edu.au>
Subject: Re: [R-sig-ME] Fwd: same old question - lme4 and p-values
To: David Henderson <dnadave at revolution-computing.com>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>,    Martin Maechler
    <maechler at stat.math.ethz.ch>
Message-ID: <1207611405.23040.3.camel at sib-sblomber01d.sib.uq.edu.au>
Content-Type: text/plain; charset=utf-8

On Sun, 2008-04-06 at 19:05 -0700, David Henderson wrote:
> Hi John:
> 
> > For all practical purposes, a CI is just the Bayesian credible  
> > interval that one gets with some suitable "non-informative prior".  
> > Why not then be specific about the prior, and go with the Bayesian  
> > credible interval?  (There is an issue whether such a prior can  
> > always be found.  Am I right in judging this of no practical consequence?)
> 
> 
> What?  Could you explain this a little more?  There is nothing  
> Bayesian about a classical (i.e. not Bayesian credible set or highest  
> posterior density, or whatever terminology you prefer) CI.  The  
> interpretation is completely different, and the assumptions used in  
> deriving the interval are also different.  Even though the interval  
> created when using a noninformative prior is similar to a classical  
> CI, they are not the same entity.
> 
> Now, while I agree with the arguments about p-values and their  
> validity, there is one aspect missing from this discussion.  When  
> creating a general use package like lme4, we are trying to create  
> software that enables statisticians and researchers to perform the  
> statistical analyses they need and interpret the results in ways that  
> HELP them get published. 

Well, that's only one reason for R's existence.

>  While I admire Doug for "drawing a line in  
> the sand" in regard to the use of p-values in published research, this  
> is counter to HELPING the researcher publish their results. 
>  There has  
> to be a better way to further your point in the community than FORCING  
> your point upon them.  Education of the next generation of researchers  
> and journal editors is admittedly slow, but a much more community  
> friendly way of getting your point used in practice.

?
If you don't like Doug's software, don't use it! Or since the code is
open source, hack it so it does what YOU want. Nobody is forcing
anything on you.

> 
> Just my $0.02...

Mine too. :-)

> 
> Dave H
> --
> David Henderson, Ph.D.
> Director of Community
> REvolution Computing
> 1100 Dexter Avenue North, Suite 250
> 206-577-4778 x3203
> DNADave at Revolution-Computing.Com
> http://www.revolution-computing.com
-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
Faculty of Biological and Chemical Sciences 
The University of Queensland 
St. Lucia Queensland 4072 
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer can 
be extracted from a given body of data. - John Tukey.

------------------------------

Message: 5
Date: Tue, 08 Apr 2008 09:46:02 +1000
From: Simon Blomberg <s.blomberg1 at uq.edu.au>
Subject: Re: [R-sig-ME] Fwd: same old question - lme4 and p-values
To: John Maindonald <John.Maindonald at anu.edu.au>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>,    David Henderson
    <dnadave at revolution-computing.com>,    Martin Maechler
    <maechler at stat.math.ethz.ch>
Message-ID: <1207611962.23040.9.camel at sib-sblomber01d.sib.uq.edu.au>
Content-Type: text/plain

On Mon, 2008-04-07 at 20:47 +1000, John Maindonald wrote:
[ snip ]
> 
> Douglas's mcmcsamp() has advanced the state of the art
> for multi-level models, offering an approach that had not
> previously been readily available.  It is anyone's guess
> where it, and statistics and graphs that it makes readily
> possible, will in the course of time fit among styles of
> presentation that application area people find helpful.

Well, it's been possible to easily implement multi-level models in BUGS
using MCMC for a long time. Would you agree that BUGS is readily
available? :-) Doug has made it more convenient for R users, but I'm not
sure it has necessarily advanced the state of the art. Maybe brought R
up to speed (but ahead of other software which tends to start with the
letter S).

Simon.


------------------------------

Message: 6
Date: Tue, 8 Apr 2008 10:25:13 +1000
From: John Maindonald <John.Maindonald at anu.edu.au>
Subject: Re: [R-sig-ME] Fwd: same old question - lme4 and p-values
To: Simon Blomberg <s.blomberg1 at uq.edu.au>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>,    David Henderson
    <dnadave at revolution-computing.com>,    Martin Maechler
    <maechler at stat.math.ethz.ch>
Message-ID: <16B9D71F-3074-466C-96CB-FCF611B79C6E at anu.edu.au>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Well, I may have been a bit carried away!

BUGS is though a bit different, surely. Estimation is done from
the beginning in a Bayesian framework.  It had not occurred to
me, till mcmcsamp() came along, that one could use classical
estimates, and then graft an MCMC calculation on the end to
get posterior density estimates.  Purists may think this hybrid
approach not quite kosher. I'd expect that it would be problematic
if a highly informative prior was used in the MCMC calculation
(is that correct?).

Note however that a prior is chosen that makes the calculation
relatively straightforward.

I presume this hybrid approach is a lot less expensive,
computationally, than Bayesian MCMC estimation of parameters
as well as posterior densities?

John Maindonald            email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


------------------------------

End of R-sig-mixed-models Digest, Vol 16, Issue 21
**************************************************
