[R] Summary using by() returns character arrays in a list

PIKAL Petr petr.pikal at precheza.cz
Fri Oct 12 08:52:27 CEST 2012


Hi

But I still wonder what is wrong with aggregate?

aggregate(iris, list(iris$Species), summary)

gives you a somewhat complicated data frame with numeric values, which you can extract as you wish.
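
The "complicated" part is that each summarised column of the result is itself a numeric matrix. A minimal sketch of how one might check and use that, assuming only the four numeric iris columns are aggregated; the names agg and sl are purely illustrative:

```r
# Aggregate only the numeric columns. Naming the grouping list gives a
# "Species" column instead of the default "Group.1".
agg <- aggregate(iris[1:4], list(Species = iris$Species), summary)

# Each summarised column is a numeric matrix: one row per group,
# one column per summary statistic.
sl <- agg$Sepal.Length
is.matrix(sl)    # TRUE
dim(sl)          # 3 6

# Label the rows with the group names so values can be looked up by name.
rownames(sl) <- as.character(agg$Species)
sl["setosa", "Median"]    # 5
```

Labelling the rows is optional; it only makes lookups by group name possible.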

> names(aggregate(iris, list(iris$Species), summary)[2])
[1] "Sepal.Length"

> aggregate(iris, list(iris$Species), summary)[,2]
     Min. 1st Qu. Median  Mean 3rd Qu. Max.
[1,]  4.3   4.800    5.0 5.006     5.2  5.8
[2,]  4.9   5.600    5.9 5.936     6.3  7.0
[3,]  4.9   6.225    6.5 6.588     6.9  7.9
>
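
If you prefer to stay with by(), one possible sketch is to pass it a function that applies summary() column by column, rather than calling summary() on the whole data frame (which formats everything as text). This assumes all selected columns are numeric and contain no NAs; with NAs summary() returns an extra "NA's" element for those columns and sapply() may fall back to a list. The helper name num_summary is made up for illustration:

```r
# Hypothetical helper: summarise each column separately and let sapply()
# bind the results into one numeric matrix (statistics in rows,
# variables in columns).
num_summary <- function(d) sapply(d, summary)

# One numeric matrix per level of Species.
res <- by(iris[1:4], iris$Species, num_summary)

m <- res[["setosa"]]
is.numeric(m)                  # TRUE
rownames(m)                    # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
m["Min.", "Sepal.Length"]      # 4.3
```

The row names come from the names of the summary() result, so no loop is needed to set them afterwards.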

Regards
Petr

> -----Original Message-----
> From: Alex van der Spek [mailto:doorz at xs4all.nl]
> Sent: Wednesday, October 10, 2012 4:03 PM
> To: PIKAL Petr
> Cc: Alex van der Spek; r-help at r-project.org
> Subject: RE: [R] Summary using by() returns character arrays in a list
> 
> Thank you Petr,
> 
> Try this
> 
> str(by(iris, iris$Species, summary))
> 
> and you will see what is actually returned is a list of 3, each element
> containing a character table, not a numeric table. The rownames of
> these tables are empty but should contain the names of the summary
> stats.
> 
> I have a workaround now. Modified the summary.data.frame method to
> output numeric values and not the character strings. The rownames I set
> afterwards in a for loop. Still would like to know how to do this
> internal to summary.data.frame though.
> 
> Regards,
> Alex van der Spek
> 
> > Hi
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> >> project.org] On Behalf Of Alex van der Spek
> >> Sent: Wednesday, October 10, 2012 2:48 PM
> >> To: r-help at r-project.org
> >> Subject: [R] Summary using by() returns character arrays in a list
> >>
> >> I use by() to generate summary statistics like so:
> >>
> >> Lbys <- by(dat[Nidx], dat$LipTest, summary)
> >>
> >> where Nidx is an index vector with names picking out the columns in
> >> the data frame dat.
> >>
> >> This returns a list of character arrays (see below for str() output)
> >> where the columns are named correctly but the rownames are empty
> >> strings and the values are strings prepended with the summary
> >> statistic's name (e.g.
> >> "Min.", "Median ").
> >
> > Without knowledge of your data it is difficult to understand what is
> > wrong.
> >
> > If I use the iris data set as input, everything works as expected:
> > data(iris)
> >> summary(iris)
> >   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
> >  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
> >  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
> >  Median :5.800   Median :3.000   Median :4.350   Median :1.300
> >  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
> >  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
> >  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
> >        Species
> >  setosa    :50
> >  versicolor:50
> >  virginica :50
> >
> >
> >
> >> by(iris, iris$Species, summary)
> > iris$Species: setosa
> >   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
> >  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
> >  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
> >  Median :5.000   Median :3.400   Median :1.500   Median :0.200
> >  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
> >  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
> >  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
> >        Species
> >  setosa    :50
> >  versicolor: 0
> >  virginica : 0
> >
> >
> >>
> >> I am reading the code of summary.data.frame() but can't figure out
> >> how to change that function so that it returns a list of numeric
> >> matrices, with the summary statistics' names ("Min.", "Max.", etc.)
> >> as rownames and the calculated summary statistics as numeric values.
> >
> > What exactly do you not like about this output, and how do you want
> > the output structured?
> > Maybe you want aggregate, but without simple example data it is hard to say.
> >
> > aggregate(iris[1:2], list(iris$Species), summary)
> >
> > Regards
> > Petr
> >
> >>
> >> Any help much appreciated!
> >> Regards,
> >> Alex van der Spek
> >>
> >>
> >> > str(Lbys)
> >> List of 2
> >>  $    : 'table' chr [1:6, 1:19] "Min.   :-0.190  " "1st Qu.: 9.297  "
> >> "Median :10.373  " "Mean   :10.100  " ...
> >>   ..- attr(*, "dimnames")=List of 2
> >>   .. ..$ : chr [1:6] "" "" "" "" ...
> >>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
> >> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
> >>  $ T38: 'table' chr [1:6, 1:19] "Min.   :8.648  " "1st Qu.:8.920  "
> >> "Median :9.018  " "Mean   :9.027  " ...
> >>   ..- attr(*, "dimnames")=List of 2
> >>   .. ..$ : chr [1:6] "" "" "" "" ...
> >>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
> >> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
> >>  - attr(*, "dim")= int 2
> >>  - attr(*, "dimnames")=List of 1
> >>   ..$ dat$LipTest: chr [1:2] "" "T38"
> >>  - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES
> >> = dat$LipTest, FUN = summary)
> >>  - attr(*, "class")= chr "by"
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html and provide commented,
> >> minimal, self-contained, reproducible code.
> >
> 



