[R] Summary using by() returns character arrays in a list

Wed Oct 10 17:41:01 CEST 2012

Hello,

If 'by' is giving you trouble, why not 'aggregate'?

agg.df <- aggregate(iris, list(iris$Species), FUN = summary)
str(agg.df)
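
For what it is worth, here is a minimal sketch of two ways to get numeric results with the statistic names attached. It assumes the built-in iris data restricted to its four numeric columns; the names num.cols, num.summary and by.num are made up for illustration and appear nowhere in the original question. The first part inspects what aggregate() returns; the second keeps by() but replaces summary() on the whole data frame with a per-column sapply().

```r
# Sketch only: built-in iris data, numeric columns only (an assumption).
num.cols <- iris[1:4]

# aggregate() stores each summarised column as a numeric matrix:
# one row per group, one column per summary statistic.
agg.df <- aggregate(num.cols, list(Species = iris$Species), FUN = summary)
is.numeric(agg.df$Sepal.Length)
colnames(agg.df$Sepal.Length)

# Alternative that keeps by(): summarise column by column with sapply().
# Each group then yields a numeric matrix whose row names are the
# statistic names and whose column names are the variable names.
num.summary <- function(d) sapply(d, summary)   # hypothetical helper
by.num <- by(num.cols, iris$Species, num.summary)
str(by.num[["setosa"]])
rownames(by.num[["setosa"]])
```

One caveat: if some columns contain NAs and others do not, summary() returns vectors of different lengths (an extra "NA's" entry), and sapply() then falls back to a list instead of a matrix.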

Hope this helps,

Rui Barradas
On 10-10-2012 15:02, Alex van der Spek wrote:
> Thank you Petr,
>
> Try this
>
> str(by(iris, iris$Species, summary))
>
> and you will see what is actually returned is a list of 3, each element
> containing a character table, not a numeric table. The rownames of these
> tables are empty but should contain the names of the summary stats.
>
> I have a workaround now: I modified the summary.data.frame method to
> output numeric values rather than character strings, and I set the
> rownames afterwards in a for loop. I would still like to know how to do
> this inside summary.data.frame, though.
>
> Regards,
> Alex van der Spek
>
>> Hi
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Alex van der Spek
>>> Sent: Wednesday, October 10, 2012 2:48 PM
>>> To: r-help at r-project.org
>>> Subject: [R] Summary using by() returns character arrays in a list
>>>
>>> I use by() to generate summary statistics like so:
>>>
>>> Lbys <- by(dat[Nidx], dat$LipTest, summary)
>>>
>>> where Nidx is an index vector with names picking out the columns in the
>>> data frame dat.
>>>
>>> This returns a list of character arrays (see below for str() output)
>>> where the columns are named correctly but the rownames are empty
>>> strings and the values are strings prefixed with the summary
>>> statistic's name (e.g. "Min.", "Median ").
>> Without knowing your data, it is difficult to understand what is
>> wrong.
>>
>> If I use the iris data set as input, everything works as expected:
>> data(iris)
>>> summary(iris)
>>    Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>>   Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
>>   1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
>>   Median :5.800   Median :3.000   Median :4.350   Median :1.300
>>   Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
>>   3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
>>   Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
>>         Species
>>   setosa    :50
>>   versicolor:50
>>   virginica :50
>>
>>
>>
>>> by(iris, iris$Species, summary)
>> iris$Species: setosa
>>    Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>>   Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
>>   1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
>>   Median :5.000   Median :3.400   Median :1.500   Median :0.200
>>   Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
>>   3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
>>   Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
>>         Species
>>   setosa    :50
>>   versicolor: 0
>>   virginica : 0
>>
>>
>>> I am reading the code of summary.data.frame() but can't figure out how
>>> to change that function so that it returns a list of numeric matrices
>>> whose rownames are the summary statistics' names ("Min.", "Max.",
>>> etc.) and whose values are the numeric values of the calculated
>>> summary statistics.
>> What exactly do you dislike about this output, and how do you want the
>> output structured?
>> Maybe you want aggregate, but without some simple data it is hard to say.
>>
>> aggregate(iris[1:2], list(iris$Species), summary)
>>
>> Regards
>> Petr
>>
>>> Any help much appreciated!
>>> Regards,
>>> Alex van der Spek
>>>
>>>
>>>> str(Lbys)
>>> List of 2
>>>   $    : 'table' chr [1:6, 1:19] "Min.   :-0.190  " "1st Qu.: 9.297  "
>>> "Median :10.373  " "Mean   :10.100  " ...
>>>    ..- attr(*, "dimnames")=List of 2
>>>    .. ..$ : chr [1:6] "" "" "" "" ...
>>>    .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>>>   $ T38: 'table' chr [1:6, 1:19] "Min.   :8.648  " "1st Qu.:8.920  "
>>> "Median :9.018  " "Mean   :9.027  " ...
>>>    ..- attr(*, "dimnames")=List of 2
>>>    .. ..$ : chr [1:6] "" "" "" "" ...
>>>    .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>>>   - attr(*, "dim")= int 2
>>>   - attr(*, "dimnames")=List of 1
>>>    ..$ dat$LipTest: chr [1:2] "" "T38"
>>>   - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES =
>>> dat$LipTest, FUN = summary)
>>>   - attr(*, "class")= chr "by"
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.