[R] help with subset(), still original dataframe in tapply

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Mon May 26 20:53:44 CEST 2003


Frank Mattes <f.mattes at rfc.ucl.ac.uk> writes:

> I'm creating now a subset missing the values 0 and "NA"
> >  newex<-subset(ex,ex$REL>0)
> >  newex
>          UID   REL
> 5  R1.B8.38 0.010
> 6  R1.B8.38 0.060
> 7  R1.B8.38 0.006
> 8  R1.B8.38 0.010
> 9  R1.B8.48 0.080
> 11 R1.B8.48 0.006
> 
> and now would like to apply the mean to each group in (UID)
> 
> >  tapply(newex$REL,newex$UID,mean,rm.na=T)
> R1.B8.31 R1.B8.38 R1.B8.48
>        NA   0.0215   0.0430
> 
> to my surprise, I still have the mean for group R1.B8.31, which has
> been removed by the subset function before.

A subset of a three-level factor is still a three-level factor:
subset() removes rows, not factor levels. If you want a factor with
only those levels that are present in the data, you need to say so,
e.g. with

tapply(newex$REL,factor(newex$UID),mean)
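
To make this concrete, here is a minimal sketch. The rows that were
dropped from ex are not shown in the post, so the values used for them
below (and the names res_all, res_used and newex2) are assumptions
chosen only to reproduce the row numbers and means quoted above.

```r
# Hypothetical reconstruction of 'ex': the R1.B8.31 rows and row 10 are
# assumed (zeros and an NA), since the original post does not show them.
ex <- data.frame(
  UID = factor(c(rep("R1.B8.31", 4), rep("R1.B8.38", 4), rep("R1.B8.48", 3))),
  REL = c(0, 0, NA, 0, 0.010, 0.060, 0.006, 0.010, 0.080, 0, 0.006)
)

# subset() keeps rows where the condition is TRUE; 0 and NA rows are dropped.
newex <- subset(ex, REL > 0)

# The factor still carries all three levels, including the unused one.
levels(newex$UID)

# Grouping by the original factor yields an NA cell for the empty level.
res_all <- tapply(newex$REL, newex$UID, mean)

# Re-creating the factor keeps only the levels that actually occur.
res_used <- tapply(newex$REL, factor(newex$UID), mean)

# Alternative in current versions of R: drop unused levels in the data frame.
newex2 <- droplevels(newex)
nlevels(newex2$UID)
```

Here res_all has three cells (the first one NA), while res_used has
only the two groups that remain after subsetting.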
 
> but I would like to know why the tapply still uses the original dataframe.

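It doesn't. newex contains no rows for R1.B8.31; tapply only sees that
the level exists, finds the group empty, and reports NA for it.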
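
One way to check this is sketched below. The small data frame is a
hypothetical stand-in for newex (two rows, three levels), not the
poster's actual data. It also shows that rm.na is not an argument of
mean(); the argument is called na.rm, and a misspelt name is silently
ignored. Neither spelling affects the empty group, since there is
nothing in it to average.

```r
# Hypothetical stand-in for 'newex': two rows, but three factor levels.
newex <- data.frame(
  UID = factor(c("R1.B8.38", "R1.B8.48"),
               levels = c("R1.B8.31", "R1.B8.38", "R1.B8.48")),
  REL = c(0.010, 0.080)
)

# The unused level is listed with a count of zero: no such rows exist.
counts <- table(newex$UID)

# tapply fills the cell for an empty group with NA.
res <- tapply(newex$REL, newex$UID, mean)

# A misspelt argument is swallowed by mean's '...' and has no effect.
ignored  <- mean(c(1, NA), rm.na = TRUE)   # NA
honoured <- mean(c(1, NA), na.rm = TRUE)   # 1
```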

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



