[R] Summary using by() returns character arrays in a list
Rui Barradas
ruipbarradas at sapo.pt
Wed Oct 10 17:41:01 CEST 2012
Hello,
If 'by' is giving you trouble, why not 'aggregate'?
agg.df <- aggregate(iris, list(iris$Species), FUN = summary)
str(agg.df)
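As a rough sketch of what to expect from such a call: aggregate() stores each summarised column as a numeric matrix inside the resulting data frame, with the statistic names as column names. Restricting the input to the four numeric columns and naming the grouping column "Species" are choices made here for illustration, not part of the call above.

```r
# Summarise only the numeric columns of iris, grouped by Species.
agg.num <- aggregate(iris[1:4], list(Species = iris$Species), FUN = summary)

# Each summarised column is a numeric matrix:
# one row per group, one column per statistic.
sl <- agg.num$Sepal.Length
is.numeric(sl)
colnames(sl)

# Label the rows with the group names, then transpose so that the
# statistic names become the row names.
rownames(sl) <- as.character(agg.num$Species)
t(sl)
```

The values stay numeric, so they can be used in further calculations without parsing strings.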
Hope this helps,
Rui Barradas
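
If by() itself should return numeric matrices with the statistic names as row names, one possible approach is to pass it a small function in place of summary. The helper num_summary() below is hypothetical (it is not part of R and not the modified summary.data.frame mentioned in the thread), and it assumes that all selected columns are numeric.

```r
# Hypothetical helper: numeric summary of every column of a data frame.
# Returns a matrix with one column per variable and the statistic
# names as row names.
num_summary <- function(df) {
  sapply(df, function(x) {
    q <- quantile(x, na.rm = TRUE, names = FALSE)  # min, quartiles, max
    c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
      "Mean" = mean(x, na.rm = TRUE),
      "3rd Qu." = q[4], "Max." = q[5])
  })
}

# by() now yields one numeric matrix per group.
res <- by(iris[1:4], iris$Species, num_summary)
m <- res[["setosa"]]
str(m)
m["Mean", "Sepal.Length"]
```

Because the function builds the vector of statistics explicitly, every group gets the same six rows, even when a column contains missing values.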
On 10-10-2012 15:02, Alex van der Spek wrote:
> Thank you Petr,
>
> Try this
>
> str(by(iris, iris$Species, summary))
>
> and you will see what is actually returned is a list of 3, each element
> containing a character table, not a numeric table. The rownames of these
> tables are empty but should contain the names of the summary stats.
>
> I have a workaround now: I modified the summary.data.frame method to output
> numeric values instead of character strings, and I set the rownames
> afterwards in a for loop. I would still like to know how to do this inside
> summary.data.frame, though.
>
> Regards,
> Alex van der Spek
>
>> Hi
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
>>> r-project.org] On Behalf Of Alex van der Spek
>>> Sent: Wednesday, October 10, 2012 2:48 PM
>>> To: r-help at r-project.org
>>> Subject: [R] Summary using by() returns character arrays in a list
>>>
>>> I use by() to generate summary statistics like so:
>>>
>>> Lbys <- by(dat[Nidx], dat$LipTest, summary)
>>>
>>> where Nidx is an index vector with names picking out the columns in the
>>> data frame dat.
>>>
>>> This returns a list of character arrays (see below for str() output)
>>> where the columns are named correctly but the rownames are empty
>>> strings and the values are strings prepended with the summary
>>> statistic's name (e.g. "Min.", "Median ").
>> Without knowledge of your data it is difficult to understand what is
>> wrong.
>>
>> If I use the iris data set as input, everything works as expected:
>> data(iris)
>> > summary(iris)
>>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
>>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
>>  Median :5.800   Median :3.000   Median :4.350   Median :1.300
>>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
>>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
>>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
>>        Species
>>  setosa    :50
>>  versicolor:50
>>  virginica :50
>>
>>
>>
>> > by(iris, iris$Species, summary)
>> iris$Species: setosa
>>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
>>  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
>>  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
>>  Median :5.000   Median :3.400   Median :1.500   Median :0.200
>>  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
>>  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
>>  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
>>        Species
>>  setosa    :50
>>  versicolor: 0
>>  virginica : 0
>>
>>
>>> I am reading the code of summary.data.frame() but can't figure out how
>>> to change that function so that it returns a list of numeric matrices,
>>> with the summary statistics' names ("Min.", "Max.", etc.) as rownames
>>> and the calculated summary statistics as numeric values.
>> What exactly do you not like about this output, and how do you want the
>> output structured?
>> Maybe you want aggregate, but without simple data it is hard to say.
>>
>> aggregate(iris[1:2], list(iris$Species), summary)
>>
>> Regards
>> Petr
>>
>>> Any help much appreciated!
>>> Regards,
>>> Alex van der Spek
>>>
>>>
>>> > str(Lbys)
>>> List of 2
>>> $ : 'table' chr [1:6, 1:19] "Min. :-0.190 " "1st Qu.: 9.297 "
>>> "Median :10.373 " "Mean :10.100 " ...
>>> ..- attr(*, "dimnames")=List of 2
>>> .. ..$ : chr [1:6] "" "" "" "" ...
>>> .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>>> $ T38: 'table' chr [1:6, 1:19] "Min. :8.648 " "1st Qu.:8.920 "
>>> "Median :9.018 " "Mean :9.027 " ...
>>> ..- attr(*, "dimnames")=List of 2
>>> .. ..$ : chr [1:6] "" "" "" "" ...
>>> .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
>>> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>>> - attr(*, "dim")= int 2
>>> - attr(*, "dimnames")=List of 1
>>> ..$ dat$LipTest: chr [1:2] "" "T38"
>>> - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES =
>>> dat$LipTest, FUN = summary)
>>> - attr(*, "class")= chr "by"
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.