[R] Hmisc summarize() with level "" in by variable
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Sat Jun 13 15:46:36 CEST 2009
Sorry about the bug, which is now fixed. You can get the fix by entering
source('http://biostat.mc.vanderbilt.edu/cgi-bin/viewvc.cgi/*checkout*/Hmisc/trunk/R/summary.formula.s?rev=661')
until we update the package.
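
Until the fix is installed, one possible workaround is to give the empty level a non-empty label before summarizing. The sketch below is an assumption, not part of the fix: the label "(blank)" and the name tst2 are made up for illustration, and it has not been checked against Hmisc 3.6-0. It uses aggregate() so that it runs without Hmisc; the equivalent summarize() call is left as a comment.

```r
# Same example data as in the report quoted below
tst1 <- data.frame(a = factor(c("", "A", "B", "C")), x = 1:4)

# Work on a copy and relabel the "" level (the label "(blank)" is hypothetical)
tst2 <- tst1
levels(tst2$a)[levels(tst2$a) == ""] <- "(blank)"

# Base R check: every level, including the relabelled one, gets its own row
res <- with(tst2, aggregate(x, by = list(a = a), FUN = mean))
print(res)

# With Hmisc loaded, the corresponding call would be:
# with(tst2, summarize(x, by = llist(a), FUN = mean))
```

If the label has to be "" again in the final output, the level can be renamed back after summarizing.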
Frank
Michael Erickson wrote:
> I was using summarize() on a data set in which one of the levels of
> the by variable was "". The summary statistic was consistently off by
> one level and the "" level was not in the output data frame. I tried
> to report it as a bug, but I could not log into the Hmisc bug
> reporting website to do so. I searched for this in the email
> archives. If it's there, I failed to find it. Should I try to pursue
> this as a bug, or am I using summarize incorrectly? Here is my
> example along with the output:
>
>> tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
> + x=1:4)
>> tst1
>   a x
> 1   1
> 2 A 2
> 3 B 3
> 4 C 4
>> with(tst1, summarize(x, by=llist(a), FUN=mean))
>   a x
> 1 A 1
> 2 B 2
> 3 C 3
>> with(tst1, aggregate(x, by=list(a), FUN=mean))
>   Group.1 x
> 1         1
> 2       A 2
> 3       B 3
> 4       C 4
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Hmisc_3.6-0
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.13 grid_2.9.0 lattice_0.17-22
>
>
> Michael
>
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University