[R] Summary using by() returns character arrays in a list
Alex van der Spek
doorz at xs4all.nl
Wed Oct 10 16:02:53 CEST 2012
Thank you Petr,
Try this:
str(by(iris, iris$Species, summary))
and you will see that what is actually returned is a list of 3, each
element containing a character table, not a numeric table. The rownames
of these tables are empty but should contain the names of the summary
stats.
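
The behaviour described above can be checked directly on the built-in iris data. This is only a small inspection sketch; the variable names res and first are made up for illustration.

```r
# Inspect what by() combined with summary() returns for iris
res <- by(iris, iris$Species, summary)
first <- res[[1]]          # the summary table for the first species

class(first)               # a 'table' object
is.character(first)        # the cells are formatted strings, not numbers
rownames(first)            # the row names are all empty strings
```

Because summary.data.frame() formats each cell as "name:value" text for printing, the statistic names end up inside the cell strings instead of in the rownames.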
I have a workaround now: I modified the summary.data.frame method to
output numeric values instead of character strings, and I set the
rownames afterwards in a for loop. I would still like to know how to do
this inside summary.data.frame, though.
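
One possible alternative that leaves summary.data.frame() untouched is to pass by() a small helper that computes the statistics numerically. This is a minimal sketch, not the workaround described above: the helper names six_num and num_summary are hypothetical, and it assumes that only the numeric columns are of interest and that missing values should be dropped.

```r
# Hypothetical helper: the six usual summary statistics as a named numeric vector
six_num <- function(x) {
  q <- quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
    "Mean" = mean(x, na.rm = TRUE), "3rd Qu." = q[4], "Max." = q[5])
}

# Hypothetical helper: apply six_num() to every numeric column of a data frame;
# vapply() takes the row names from the names of the first result
num_summary <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  vapply(df[num_cols], six_num, numeric(6))
}

res <- by(iris, iris$Species, num_summary)
res[["setosa"]]    # numeric matrix: statistics in rows, variables in columns
```

Each element of res is then a numeric matrix whose rownames are the statistic names, so a value can be picked out with, for example, res[["setosa"]]["Mean", "Sepal.Length"].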
Regards,
Alex van der Spek
> Hi
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Alex van der Spek
>> Sent: Wednesday, October 10, 2012 2:48 PM
>> To: r-help at r-project.org
>> Subject: [R] Summary using by() returns character arrays in a list
>>
>> I use by() to generate summary statistics like so:
>>
>> Lbys <- by(dat[Nidx], dat$LipTest, summary)
>>
>> where Nidx is an index vector with names picking out the columns in the
>> data frame dat.
>>
>> This returns a list of character arrays (see below for str() output)
>> where the columns are named correctly, but the rownames are empty
>> strings and the values are strings prefixed with the name of the
>> summary statistic (e.g. "Min.", "Median ").
>
> Without knowing your data, it is difficult to understand what is
> wrong.
>
> If I use the iris data set as input, everything works as expected:
> data(iris)
>> summary(iris)
>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
>  Median :5.800   Median :3.000   Median :4.350   Median :1.300
>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
>        Species
>  setosa    :50
>  versicolor:50
>  virginica :50
>
>
>
>> by(iris, iris$Species, summary)
> iris$Species: setosa
>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
>  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
>  Median :5.000   Median :3.400   Median :1.500   Median :0.200
>  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
>  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
>  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
>        Species
>  setosa    :50
>  versicolor: 0
>  virginica : 0
>
>
>>
>> I am reading the code of summary.data.frame() but can't figure out how
>> to change the behaviour of that function so that it returns a list of
>> numeric matrices, with the names of the summary statistics ("Min.",
>> "Max." etc.) as rownames and the calculated statistics as numeric
>> values.
>
> What exactly do you dislike about this output, and how do you want the
> output structured?
> Maybe you want aggregate(), but without a simple example of your data
> it is hard to say.
>
> aggregate(iris[1:2], list(iris$Species), summary)
>
> Regards
> Petr
>
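
For reference, the aggregate() call suggested above returns a data frame in which each summarised variable is stored as a numeric matrix column, with the statistic names as column names. The sketch below only inspects that structure; the variable names agg and m, and the grouping name Species, are chosen here for illustration.

```r
# One row per species; each summarised variable becomes a matrix column
agg <- aggregate(iris[1:2], list(Species = iris$Species), summary)

m <- agg$Sepal.Length                      # numeric matrix, one row per species
rownames(m) <- as.character(agg$Species)   # label the rows by species
m["setosa", "Median"]
```

Compared with by(), the statistics end up in columns and the groups in rows, so t(m) gives the orientation asked for in the original question.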
>>
>> Any help much appreciated!
>> Regards,
>> Alex van der Spek
>>
>>
>> > str(Lbys)
>> List of 2
>> $ : 'table' chr [1:6, 1:19] "Min. :-0.190 " "1st Qu.: 9.297 "
>> "Median :10.373 " "Mean :10.100 " ...
>> ..- attr(*, "dimnames")=List of 2
>> .. ..$ : chr [1:6] "" "" "" "" ...
>> .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>> $ T38: 'table' chr [1:6, 1:19] "Min. :8.648 " "1st Qu.:8.920 "
>> "Median :9.018 " "Mean :9.027 " ...
>> ..- attr(*, "dimnames")=List of 2
>> .. ..$ : chr [1:6] "" "" "" "" ...
>> .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>> - attr(*, "dim")= int 2
>> - attr(*, "dimnames")=List of 1
>> ..$ dat$LipTest: chr [1:2] "" "T38"
>> - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES =
>> dat$LipTest, FUN = summary)
>> - attr(*, "class")= chr "by"
>>