[R] tapply() and using factor() on a factor
William Dunlap
wdunlap at tibco.com
Fri Oct 16 05:59:05 CEST 2009
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Alexander
> Peterhansl
> Sent: Thursday, October 15, 2009 2:50 PM
> To: r-help at r-project.org
> Subject: [R] tapply() and using factor() on a factor
>
> Dear List,
>
> Shouldn't result1 and result2 be equal in the following case?
>
> Note that log$RequestID is a factor. That is,
> is.factor(log$RequestID)
> yields TRUE.
>
> result1 <- tapply(log$Flag,factor(log$RequestID),sum)
> result2 <- tapply(log$Flag,log$RequestID,sum)
Showing us the output of dput(log) (or str(log) and summary(log))
would let people discover the problem more readily. Since you
didn't, I'll guess what the dataset may contain.
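Short of posting dput(log), a few one-line checks show whether a factor carries levels that never occur. This is a minimal sketch on a hypothetical stand-in data frame (the original data were not posted), and the variable names n_levels, n_used and n_unused are illustrative only. With the real data, n_unused is the figure to compare with the 978 NA's in summary(result2).

```r
# Hypothetical stand-in for the poster's data (the real 'log' was not
# posted): RequestID declares 10 levels, but only 2 of them occur.
log <- data.frame(Flag = 1:2,
                  RequestID = factor(letters[1:2], levels = letters[1:10]))

str(log)   # the RequestID line reports "Factor w/ 10 levels"

n_levels <- nlevels(log$RequestID)              # levels declared
n_used   <- length(unique(log$RequestID))       # levels that actually occur
n_unused <- sum(table(log$RequestID) == 0)      # levels with no rows at all
cat("declared:", n_levels, " used:", n_used, " unused:", n_unused, "\n")
```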
If log$RequestID is a factor with lots of unused levels, tapply
will output an NA for each unused level. factor(log$RequestID)
creates a new set of levels containing only those actually used,
so tapply is not forced to fill those spots with NA's. E.g.,
> log <- data.frame(Flag=1:2, RequestID=factor(letters[1:2],
+     levels=letters[1:10]))
> tapply(log$Flag, log$RequestID, sum)
 a  b  c  d  e  f  g  h  i  j
 1  2 NA NA NA NA NA NA NA NA
> tapply(log$Flag, factor(log$RequestID), sum)
a b
1 2
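The toy example can be pushed one step further to confirm the explanation. The NA cells of result2 should be exactly the unused levels, and dropping them should recover result1. This sketch uses the same made-up data, not the poster's dataset. It also shows droplevels(), which appeared in R 2.12.0 (after this thread) and does the same job as factor() here.

```r
# Same toy data as above: 10 declared levels, only "a" and "b" used.
log <- data.frame(Flag = 1:2,
                  RequestID = factor(letters[1:2], levels = letters[1:10]))

result1 <- tapply(log$Flag, factor(log$RequestID), sum)  # used levels only
result2 <- tapply(log$Flag, log$RequestID, sum)          # all 10 levels

# The NA cells of result2 are exactly the levels with no rows.
unused <- levels(log$RequestID)[as.vector(table(log$RequestID) == 0)]
stopifnot(identical(names(result2)[which(is.na(result2))], unused))

# Dropping those cells recovers result1 (same names, same sums).
kept <- result2[!is.na(result2)]
stopifnot(identical(names(kept), names(result1)),
          identical(as.vector(kept), as.vector(result1)))

# droplevels() (R >= 2.12.0) removes unused levels, like factor() here.
result3 <- tapply(log$Flag, droplevels(log$RequestID), sum)
stopifnot(identical(result1, result3))
```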
I suppose tapply(X,INDEX,FUN) could call FUN(X[0]) to see
how to fill the cells with no data behind them, but it doesn't.
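For illustration, the "call FUN(X[0])" idea can be written as a small wrapper. tapply_fill is a hypothetical helper, not part of base R. It assumes a single factor INDEX and a FUN that returns one value per group. Separately, R 3.4.0 and later give tapply() a default argument (default = NA) that sets the fill value for empty cells, although it uses a fixed value rather than calling FUN.

```r
# Hypothetical helper, not part of base R: like tapply(X, INDEX, FUN),
# but cells whose level has no rows get FUN(X[0]) instead of NA.
# Assumes INDEX is a single factor (or something as.factor() accepts)
# and that FUN returns a single value per group.
tapply_fill <- function(X, INDEX, FUN, ...) {
  res <- tapply(X, INDEX, FUN, ...)
  # table() counts rows per level, including levels with zero rows
  empty <- as.vector(table(INDEX) == 0)
  res[empty] <- FUN(X[0], ...)
  res
}

log <- data.frame(Flag = 1:2,
                  RequestID = factor(letters[1:2], levels = letters[1:10]))

filled <- tapply_fill(log$Flag, log$RequestID, sum)
print(filled)   # unused levels now hold sum(integer(0)), i.e. 0

# R >= 3.4.0 only: the 'default' argument supplies a fixed fill value.
builtin <- tapply(log$Flag, log$RequestID, sum, default = 0)
print(builtin)
```

The wrapper keeps tapply's own result and only patches the empty cells, so groups that do have data are computed exactly as before.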
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> Yet, when I summarize the output, I get the following:
>
> summary(result1)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   11.00   11.00   11.00   26.06   11.00  101.00
>
> summary(result2)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>   11.00   11.00   11.00   26.06   11.00  101.00  978.00
>
> Why does result2 have 978 NA's?
>
> Any help on this would be appreciated.
>
> Alex
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>