[R-sig-ME] Stats question

Thu Jul 31 06:46:06 CEST 2008

Dear friends, 
I am not sure this is the right place to ask, but please feel free to suggest an alternative discussion group.
I want to run a comparative study of the incidence rate in two populations. A pilot study conducted a few weeks ago found an incidence of 8/140 (around 6%) in population A; population B was not sampled. Assuming this is (about) the right proportion for population A, what sample size do I need in populations A and B in the main study to have 80% power to detect a significant difference? I expect the incidence in population B to be around 10%, compared with 6% in population A.
Any suggestions?
Jason
Dr. Iasonas Lamprianou
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485
iasonas.lamprianou at manchester.ac.uk
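For the question above, one common approach is the normal-approximation sample size formula for comparing two independent proportions with a two-sided test. The sketch below assumes equal group sizes, a 5% two-sided significance level, no continuity correction, and takes 6% and 10% as the true incidences. These are illustrative assumptions, not values confirmed in the thread, and the function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group for a two-sided test of two independent proportions.

    Normal approximation: pooled variance under the null hypothesis,
    unpooled variance under the alternative, no continuity correction.
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    p_bar = (p1 + p2) / 2                # average proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)  # round up to whole subjects


if __name__ == "__main__":
    # Assumed design: 6% in population A vs 10% in population B.
    print(n_per_group(0.06, 0.10))
    # Sensitivity check: the unrounded pilot estimate 8/140 instead of 6%.
    print(n_per_group(8 / 140, 0.10))
```

Under these assumptions the formula gives roughly 720 subjects per population. Using 8/140 (about 5.7%) instead of 6% lowers this to roughly 620, so the answer is quite sensitive to the pilot estimate. Adding a continuity correction would increase n somewhat. In R, `power.prop.test(p1 = 0.06, p2 = 0.10, power = 0.8)` performs a comparable calculation, though it reports an unrounded n.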

----- Original Message ----
From: "r-sig-mixed-models-request at r-project.org" <r-sig-mixed-models-request at r-project.org>
To: r-sig-mixed-models at r-project.org
Sent: Tuesday, 29 July, 2008 1:00:01 PM
Subject: R-sig-mixed-models Digest, Vol 19, Issue 23

Send R-sig-mixed-models mailing list submissions to
    r-sig-mixed-models at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
or, via email, send a message with subject or body 'help' to
    r-sig-mixed-models-request at r-project.org

You can reach the person managing the list at
    r-sig-mixed-models-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of R-sig-mixed-models digest..."

Today's Topics:

  1. Re: missing data in lme, lmer, PROC MIXED (M Henry H Stevens)
  2. Re: missing data in lme, lmer, PROC MIXED (Ken Beath)
  3. Re: missing data in lme, lmer, PROC MIXED (Doran, Harold)

----------------------------------------------------------------------

Message: 1
Date: Mon, 28 Jul 2008 07:04:19 -0400
From: M Henry H Stevens <HStevens at muohio.edu>
Subject: Re: [R-sig-ME] missing data in lme, lmer, PROC MIXED
To: Ken Beath <kjbeath at kagi.com>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>, "Stevens,
    Martin Henry H. Dr." <stevenmh at muohio.edu>
Message-ID: <1217243059.5947.33.camel at stevenmh-desktop>
Content-Type: text/plain; charset=UTF-8

Thanks Ken. I have been assuming that they meant missing covariates (a
subject provided most of the predictors, but not all). So I take it that
SAS does no imputation on its own, and that the user would need to do
that (if they wanted to)? lme does not do anything like that.

Hank
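The complete-case behaviour described for lme, where `na.action = na.omit` drops every row with a missing value, is listwise deletion. The following is a minimal Python sketch of that rule. It is not nlme's code: `None` stands in for R's `NA`, and the function name and toy data are hypothetical.

```python
from typing import Dict, List, Optional

# A row maps variable names to values; None marks a missing value
# (a stand-in for R's NA in this sketch).
Row = Dict[str, Optional[float]]


def complete_cases(rows: List[Row], variables: List[str]) -> List[Row]:
    """Keep only rows with no missing value in any of the listed model
    variables (listwise deletion; a key absent from a row counts as missing)."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


if __name__ == "__main__":
    # Hypothetical toy data: subject 2 lacks a covariate, subject 3 the response.
    data = [
        {"subject": 1, "y": 4.1, "x1": 0.5, "x2": 1.0},
        {"subject": 2, "y": 3.7, "x1": None, "x2": 0.0},
        {"subject": 3, "y": None, "x1": 1.2, "x2": 1.0},
        {"subject": 4, "y": 5.0, "x1": 0.9, "x2": 0.0},
    ]
    kept = complete_cases(data, ["y", "x1", "x2"])
    print([row["subject"] for row in kept])  # [1, 4]
```

Subject 2 loses its observed response and `x2` even though only one covariate is missing. That discarded information is the efficiency loss Ken describes for the complete-case method.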

On Sat, 2008-07-26 at 22:39 -0400, Ken Beath wrote:
> On 26/07/2008, at 7:28 AM, M Henry H Stevens wrote:
> 
> > Hi folks,
> > I have colleagues who comfortably state that "missing data" are OK in
> > "mixed models" because "the program (PROC MIXED) handles missing data."
> > I have a hard time imagining what it does.
> >
> > To those of you who use both R and SAS, I was wondering if you might
> > share insight into what these do.
> >
> > As far as I know, for lme:
> > na.action = na.omit (or na.exclude) removes the rows with any missing
> > data.
> >
> 
> This depends. If the missing data are in the dependent variable and are
> missing at random, then, because mixed models are fitted by maximum
> likelihood, the results will be optimal. Roughly (there are some really
> technical definitions for missing data and I haven't checked them), if
> we don't know the outcome and the reason it is missing isn't due to its
> value or the other data, then we can simply leave it out of the
> likelihood, as it contains no useful information. The problem arises
> when the fact that data are missing itself carries this sort of
> information, which is very difficult to model. An example is when
> observations above a certain value are more likely to be missing.
> 
> An alternative method of dealing with repeated-measures data is to
> produce a summary for each subject or cluster, for example by averaging
> the last three visits. This doesn't handle missing data correctly,
> although the loss in efficiency is usually small and it can work well,
> provided only a small proportion of the data is missing.
> 
> What R and SAS don't deal with directly is missing data in the
> covariates. This takes a bit more work, for example using multiple
> imputation. Here the complete-case method, in which any observation
> with missing data is removed, loses efficiency compared with what can
> be achieved.
> 
> Ken
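Ken's distinction between harmless and harmful missingness in the outcome can be seen in a toy simulation. This is an illustrative sketch, not a mixed model. It draws a plain standard-normal sample with no covariates, so "missing at random" reduces to missing completely at random here. It then deletes values under two hypothetical rules and compares the mean of what remains. The 30% and 80% deletion rates, the threshold of 1, and the function names are arbitrary assumptions.

```python
import random
from typing import Callable

MissingRule = Callable[[float, random.Random], bool]


def observed_mean(missing_rule: MissingRule, n: int = 100_000, seed: int = 1) -> float:
    """Draw y ~ Normal(0, 1), delete values for which missing_rule returns True,
    and return the mean of the values that remain observed."""
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = [y for y in ys if not missing_rule(y, rng)]
    return sum(observed) / len(observed)


def mcar(y: float, rng: random.Random) -> bool:
    # Missing completely at random: a 30% chance regardless of y.
    return rng.random() < 0.30


def above_threshold(y: float, rng: random.Random) -> bool:
    # Missing not at random: values above 1 are missing 80% of the time,
    # echoing the example of large observations being more likely missing.
    return y > 1.0 and rng.random() < 0.80


if __name__ == "__main__":
    print(f"true mean 0, MCAR estimate: {observed_mean(mcar):.3f}")
    print(f"true mean 0, MNAR estimate: {observed_mean(above_threshold):.3f}")
```

Under the first rule the observed mean stays close to the true value of 0. Under the second it is pulled well below 0 (the theoretical value is about -0.22), and simply leaving the missing values out cannot correct that.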
-- 
Dr. Hank Stevens, Associate Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.cas.muohio.edu/ecology
http://www.muohio.edu/botany/

"If the stars should appear one night in a thousand years, how would men
believe and adore." -Ralph Waldo Emerson, writer and philosopher
(1803-1882)
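Ken's summary-measure approach of reducing each subject to a single number by averaging the last three visits can be sketched as follows. This sketch assumes "last three visits" means the last three *observed* visits. It also assumes records arrive as `(subject, visit_number, value)` tuples with `None` for a missed visit. Both the record layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, Dict, List, Optional, Tuple

Record = Tuple[str, int, Optional[float]]  # (subject, visit number, value or None)


def last_k_average(records: List[Record], k: int = 3) -> Dict[str, float]:
    """Summarise each subject by the mean of its last k observed visits.
    Subjects with no observed value at all are dropped."""
    by_subject: DefaultDict[str, List[Tuple[int, float]]] = defaultdict(list)
    for subject, visit, value in records:
        if value is not None:              # skip missed visits
            by_subject[subject].append((visit, value))
    summary: Dict[str, float] = {}
    for subject, visits in by_subject.items():
        visits.sort()                      # order by visit number
        summary[subject] = mean(v for _, v in visits[-k:])
    return summary


if __name__ == "__main__":
    # Hypothetical data: subject B missed visit 2, subject C was never observed.
    recs = [("A", 1, 2.0), ("A", 2, 3.0), ("A", 3, 4.0), ("A", 4, 5.0),
            ("B", 1, 1.0), ("B", 2, None), ("B", 3, 3.0), ("C", 1, None)]
    print(last_k_average(recs))  # {'A': 4.0, 'B': 2.0}
```

Subject B's summary rests on two visits and A's on three, yet both summaries are treated as equally precise in any later analysis. That is one concrete sense in which this method "doesn't handle missing data correctly", although the cost is small when few visits are missed.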

------------------------------

Message: 2
Date: Mon, 28 Jul 2008 22:21:55 +1000
From: Ken Beath <kjbeath at kagi.com>
Subject: Re: [R-sig-ME] missing data in lme, lmer, PROC MIXED
To: MHH Stevens <HStevens at muohio.edu>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>, "Stevens,
    Martin Henry H. Dr." <stevenmh at muohio.edu>
Message-ID: <68CB3A51-FA7C-4758-AE35-2DC68C077E8B at kagi.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On 28/07/2008, at 9:04 PM, M Henry H Stevens wrote:

> Thanks Ken. I have been assuming that they meant missing covariates (a
> subject provided most of the predictors, but not all). So I take it that
> SAS does no imputation on its own, and that the user would need to do
> that (if they wanted to)? lme does not do anything like that.
>

Yes, neither SAS nor R, nor most other programs, handles missing
covariates automatically. The only program I know of that does is Mplus,
which is a general latent variable modelling program. I turned off its
missing data handling because for one model it resulted in an
11-dimensional integration.

Ken
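Because neither SAS nor R fills in missing covariates automatically, the multiple-imputation route mentioned earlier in the thread leaves the work to the user. The steps are to create m completed data sets with an imputation model, fit the same mixed model to each, and pool the results. The pooling step (Rubin's rules) is simple enough to sketch. The imputation model itself, which is the hard part, is not shown. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine one coefficient's estimates and squared standard errors from
    m imputed data sets using Rubin's rules; return (pooled estimate, pooled SE)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, sqrt(t)


if __name__ == "__main__":
    # Hypothetical slope estimates and squared SEs from five imputed data sets.
    est, se = pool_rubin([1.0, 1.2, 1.1, 0.9, 1.3], [0.04] * 5)
    print(f"pooled estimate {est:.3f}, pooled SE {se:.3f}")
```

The pooled standard error exceeds the average per-imputation standard error whenever the imputations disagree. This extra uncertainty from the missing covariates would be ignored by analysing a single imputed data set.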

------------------------------

Message: 3
Date: Mon, 28 Jul 2008 09:05:32 -0400
From: "Doran, Harold" <HDoran at air.org>
Subject: Re: [R-sig-ME] missing data in lme, lmer, PROC MIXED
To: "Ken Beath" <kjbeath at kagi.com>, "MHH Stevens"
    <HStevens at muohio.edu>
Cc: R Mixed Models <r-sig-mixed-models at r-project.org>, "Stevens,
    Martin Henry H. Dr." <stevenmh at muohio.edu>
Message-ID: <ED7B522EE00C9A4FA515AA71724D61EEB86738 at DC1EXCL01.air.org>
Content-Type: text/plain;    charset="us-ascii"

Ken,

Does Mplus actually impute values for the missing cells in the model
matrix for the fixed effects? Is this a default behavior of Mplus, or
does one need to be cognizant of this and implement a particular
imputation strategy?

In general, this kind of question comes up all the time on the
multilevel listserv. There are constant suggestions that many of the
multilevel software packages automagically "handle" missing data because
they use "maximum likelihood".

------------------------------

_______________________________________________
R-sig-mixed-models mailing list
R-sig-mixed-models at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

End of R-sig-mixed-models Digest, Vol 19, Issue 23
**************************************************
