[R] Summary using by() returns character arrays in a list

Alex van der Spek doorz at xs4all.nl
Wed Oct 10 16:02:53 CEST 2012


Thank you Petr,

Try this

str(by(iris, iris$Species, summary))

and you will see that what is actually returned is a list of 3, each element
containing a character table, not a numeric table. The rownames of these
tables are empty but should contain the names of the summary statistics.
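
For reference, a small check on the built-in iris data shows the same thing for a single group. This is only a sketch, and `s` is just an illustrative name:

```r
# summary() on a data frame returns a formatted character table
s <- summary(iris[iris$Species == "setosa", 1:4])

class(s)                  # "table"
is.character(unclass(s))  # TRUE: each cell is a string that starts with the statistic's name
unique(rownames(s))       # "": the row names themselves are empty
```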

I have a workaround now: I modified the summary.data.frame method to output
numeric values instead of character strings, and I set the rownames
afterwards in a for loop. I would still like to know how to do this inside
summary.data.frame itself, though.
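
One possible way to get numeric output without touching summary.data.frame is sketched below on iris. It is not necessarily the modification described above. It assumes all selected columns are numeric and have no missing values; otherwise summary() adds an "NA's" entry for some columns and sapply() may no longer simplify to a matrix. The helper name `num_summary` is made up for illustration.

```r
# Hypothetical helper: summarise each column separately and bind the
# results into a numeric matrix; sapply() keeps the statistic names
# ("Min.", "1st Qu.", ...) as row names.
num_summary <- function(d) sapply(d, summary)

# Apply the helper per group instead of summary()
num_by <- by(iris[1:4], iris$Species, num_summary)

str(num_by[["setosa"]])       # numeric matrix, one row per statistic
rownames(num_by[["setosa"]])  # names of the summary statistics
```

Individual values can then be picked out by name, for example num_by[["setosa"]]["Mean", "Sepal.Length"].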

Regards,
Alex van der Spek

> Hi
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Alex van der Spek
>> Sent: Wednesday, October 10, 2012 2:48 PM
>> To: r-help at r-project.org
>> Subject: [R] Summary using by() returns character arrays in a list
>>
>> I use by() to generate summary statistics like so:
>>
>> Lbys <- by(dat[Nidx], dat$LipTest, summary)
>>
>> where Nidx is an index vector with names picking out the columns in the
>> data frame dat.
>>
>> This returns a list of character arrays (see below for str() output)
>> where the columns are named correctly, but the rownames are empty
>> strings and the values are strings prefixed with the summary
>> statistic's name (e.g. "Min.", "Median ").
>
> Without knowing your data it is difficult to understand what is
> wrong.
>
> If I use the iris data set as input, everything works as expected:
> data(iris)
>> summary(iris)
>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
>  Median :5.800   Median :3.000   Median :4.350   Median :1.300
>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
>        Species
>  setosa    :50
>  versicolor:50
>  virginica :50
>
>
>
>> by(iris, iris$Species, summary)
> iris$Species: setosa
>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
>  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
>  Median :5.000   Median :3.400   Median :1.500   Median :0.200
>  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
>  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
>  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
>        Species
>  setosa    :50
>  versicolor: 0
>  virginica : 0
>
>
>>
>> I am reading the code of summary.data.frame() but can't figure out how
>> to change that function so that it returns a list of numeric
>> matrices with the summary statistics' names ("Min.", "Max.", etc.) as
>> rownames and the calculated numeric values as entries.
>
> What exactly do you not like about this output, and how do you want the
> output structured?
> Maybe you want aggregate(), but without sample data it is hard to say.
>
> aggregate(iris[1:2], list(iris$Species), summary)
>
> Regards
> Petr
>
>>
>> Any help much appreciated!
>> Regards,
>> Alex van der Spek
>>
>>
>> > str(Lbys)
>> List of 2
>>  $    : 'table' chr [1:6, 1:19] "Min.   :-0.190  " "1st Qu.: 9.297  "
>> "Median :10.373  " "Mean   :10.100  " ...
>>   ..- attr(*, "dimnames")=List of 2
>>   .. ..$ : chr [1:6] "" "" "" "" ...
>>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>>  $ T38: 'table' chr [1:6, 1:19] "Min.   :8.648  " "1st Qu.:8.920  "
>> "Median :9.018  " "Mean   :9.027  " ...
>>   ..- attr(*, "dimnames")=List of 2
>>   .. ..$ : chr [1:6] "" "" "" "" ...
>>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>>  - attr(*, "dim")= int 2
>>  - attr(*, "dimnames")=List of 1
>>   ..$ dat$LipTest: chr [1:2] "" "T38"
>>  - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES =
>> dat$LipTest, FUN = summary)
>>  - attr(*, "class")= chr "by"
>>