[R] Imputing missing values using "LSmeans" (i.e., population marginal means) - advice in R?

Liaw, Andy andy_liaw at merck.com
Thu Apr 5 17:40:04 CEST 2012


I don't know how you searched, but this earlier R-help thread might help:

https://stat.ethz.ch/pipermail/r-help/2007-March/128064.html 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jenn Barrett
> Sent: Tuesday, April 03, 2012 1:23 AM
> To: r-help at r-project.org
> Subject: [R] Imputing missing values using "LSmeans" (i.e., 
> population marginal means) - advice in R?
> 
> Hi folks,
> 
> I have a dataset that consists of counts over a ~30 year 
> period at multiple (>200) sites. Only one count is conducted 
> at each site in each year; however, not all sites are 
> surveyed in all years. I need to impute the missing values 
> because I need an estimate of the total population size 
> (i.e., sum of counts across all sites) in each year as input 
> to another model. 
> 
> > head(newdat,40)
>    SITE YEAR COUNT
> 1     1 1975 12620
> 2     1 1976 13499
> 3     1 1977 45575
> 4     1 1978 21919
> 5     1 1979 33423
> ...
> 37    2 1975 40000
> 38    2 1978 40322
> 39    2 1979 70000
> 40    2 1980 16244
> 
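One way to see which site-by-year cells need imputing is to build the full SITE x YEAR grid and merge the observed counts onto it. Below is a minimal sketch in base R. It uses only the rows visible in the head() output above (rows 6 to 36 are elided there, so this is just a fragment of the real data), and `grid` and `full` are names made up for illustration.

```r
# Rows copied from the visible part of head(newdat, 40); a fragment only.
newdat <- data.frame(
  SITE  = rep(1:2, times = c(5, 4)),
  YEAR  = c(1975:1979, 1975L, 1978:1980),
  COUNT = c(12620, 13499, 45575, 21919, 33423, 40000, 40322, 70000, 16244)
)

# Every SITE x YEAR combination that could have been surveyed.
grid <- expand.grid(SITE = unique(newdat$SITE),
                    YEAR = seq(min(newdat$YEAR), max(newdat$YEAR)))

# Left join: unsurveyed site-years get COUNT = NA.
full <- merge(grid, newdat, all.x = TRUE)

# The cells that would need an imputed value.
print(full[is.na(full$COUNT), ])
```

In this fragment site 2 comes out missing 1976 and 1977. Site 1 also shows 1980 as missing, but only because its later rows are not shown above.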
> 
> It was suggested to me by a statistician to use LSmeans to do 
> this; however, I do not have SAS, nor do I know anything much 
> about SAS. I have spent DAYS reading about these "LSmeans" 
> and while (I think) I understand what they are, I have 
> absolutely no idea how to a) calculate them in R or b) use 
> them to impute my missing values in R. Again, I've 
> searched the mail lists, internet and literature and have not 
> found any documentation to advise on how to do this - I'm lost.
> 
> I've looked at popMeans, but have no clue how to use this 
> with predict() - if this is even the route to go. Any advice 
> would be much appreciated. Note that YEAR will be treated as 
> a factor and not a linear variable (i.e., the relationship 
> between COUNT and YEAR is not linear - rather there are highs 
> and lows about every 10 or so years).
> 
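For what it is worth, one possible route with plain lm() and predict() is sketched below. It assumes an additive SITE + YEAR linear model is acceptable for the counts, which is my assumption and not something established in this thread. The data are simulated, and the names `full`, `IMPUTED`, `totals` and `lsmeans_year` are made up for illustration.

```r
set.seed(42)
# Made-up data: 6 sites x 8 years with additive site and year effects plus noise.
full <- expand.grid(SITE = factor(1:6), YEAR = factor(1975:1982))
full$COUNT <- round(1000 * as.integer(full$SITE) +
                    500 * as.integer(full$YEAR) +
                    rnorm(nrow(full), sd = 100))
# Pretend six site-years were not surveyed (every site and year keeps other data).
full$COUNT[c(3, 10, 17, 24, 31, 38)] <- NA

# Additive model with YEAR as a factor; lm() drops the NA rows by default.
fit <- lm(COUNT ~ SITE + YEAR, data = full)
pred <- predict(fit, newdata = full)

# Keep observed counts and fill only the gaps with model predictions.
full$IMPUTED <- ifelse(is.na(full$COUNT), pred, full$COUNT)

# Yearly totals across sites, i.e. the input needed for the other model.
totals <- aggregate(IMPUTED ~ YEAR, data = full, FUN = sum)

# For this additive model, the LS mean of a year is the equally weighted
# average of the predictions over all sites in that year.
lsmeans_year <- tapply(pred, full$YEAR, mean)
print(totals)
```

Note that the yearly total built from observed-plus-imputed counts is not the same as the number of sites times the year's LS mean, since the latter replaces the observed counts by fitted values as well.
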
> One thought I did have was to just set up a loop to calculate 
> the least-squares estimates as:
> 
> Yij = (IYi + JYj - Y)/[(I-1)(J-1)]
> where  I = number of treatments and J = number of blocks (so 
> I = sites and J = years). I found this formula in some stats 
> lecture handouts by UC Davis on unbalanced data and 
> LSMeans...but does it yield the same thing as using the 
> LSmeans estimates? Does it make any sense? Thoughts?
> 
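On the formula quoted above: the message does not define Yi, Yj and Y. If they are read as the site total, the year total and the grand total of the observed counts, it is the textbook estimate for a single missing cell in a two-way layout without replication, and it agrees with the prediction from an additive least-squares fit. The sketch below checks that on simulated data. The reading of the symbols is my assumption, and all object names are made up. With many missing cells the closed form no longer applies directly, whereas lm() plus predict() still works.

```r
set.seed(1)
# Made-up complete layout: 4 sites x 5 years, then blank out one cell.
dat <- expand.grid(SITE = factor(1:4), YEAR = factor(1975:1979))
dat$COUNT <- round(runif(nrow(dat), 1000, 5000))
miss <- which(dat$SITE == "2" & dat$YEAR == "1977")
dat$COUNT[miss] <- NA
obs <- dat[!is.na(dat$COUNT), ]

# Closed form, reading Yi, Yj, Y as totals of the observed counts.
I  <- nlevels(dat$SITE)                    # number of sites
J  <- nlevels(dat$YEAR)                    # number of years
Yi <- sum(obs$COUNT[obs$SITE == "2"])      # total for the site with the gap
Yj <- sum(obs$COUNT[obs$YEAR == "1977"])   # total for the year with the gap
Y  <- sum(obs$COUNT)                       # grand total
closed_form <- (I * Yi + J * Yj - Y) / ((I - 1) * (J - 1))

# Least-squares prediction for the same cell from the additive model.
fit <- lm(COUNT ~ SITE + YEAR, data = obs)
ls_pred <- unname(predict(fit, newdata = dat[miss, ]))

print(all.equal(closed_form, ls_pred))
```
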
> Many thanks in advance.
> 
> Jenn
> 