[R] Summary using by() returns character arrays in a list
PIKAL Petr
petr.pikal at precheza.cz
Fri Oct 12 08:52:27 CEST 2012
Hi
But I still wonder what is wrong with aggregate?
aggregate(iris, list(iris$Species), summary)
gives you a somewhat complicated data frame with numeric values (each summarised column is stored as a matrix), which you can extract as you wish.
> names(aggregate(iris, list(iris$Species), summary)[2])
[1] "Sepal.Length"
> aggregate(iris, list(iris$Species), summary)[,2]
     Min. 1st Qu. Median  Mean 3rd Qu. Max.
[1,]  4.3   4.800    5.0 5.006     5.2  5.8
[2,]  4.9   5.600    5.9 5.936     6.3  7.0
[3,]  4.9   6.225    6.5 6.588     6.9  7.9
>
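
To make the extraction step concrete, here is a minimal sketch. It assumes only the four numeric columns of iris are summarised; the names agg, sl and flat are made up for this illustration.

```r
# Sketch: work with the matrix columns that aggregate() builds when
# FUN = summary. Only the numeric columns are used (assumption).
agg <- aggregate(iris[1:4], list(Species = iris$Species), summary)

# Each summarised variable is a 3 x 6 numeric matrix inside the data frame.
sl <- agg$Sepal.Length
is.numeric(sl)
colnames(sl)

# Label the rows with the group names so values can be looked up by name.
rownames(sl) <- as.character(agg$Species)
sl["setosa", "Median"]

# do.call(data.frame, ...) expands every matrix column into plain columns.
flat <- do.call(data.frame, agg)
dim(flat)
```

After flattening, the columns get names such as Sepal.Length.Min., which may be easier to handle than matrix columns when exporting or merging.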
Regards
Petr
> -----Original Message-----
> From: Alex van der Spek [mailto:doorz at xs4all.nl]
> Sent: Wednesday, October 10, 2012 4:03 PM
> To: PIKAL Petr
> Cc: Alex van der Spek; r-help at r-project.org
> Subject: RE: [R] Summary using by() returns character arrays in a list
>
> Thank you Petr,
>
> Try this
>
> str(by(iris, iris$Species, summary))
>
> and you will see what is actually returned is a list of 3, each element
> containing a character table, not a numeric table. The rownames of
> these tables are empty but should contain the names of the summary
> stats.
>
> I have a workaround now. Modified the summary.data.frame method to
> output numeric values and not the character strings. The rownames I set
> afterwards in a for loop. Still would like to know how to do this
> internal to summary.data.frame though.
>
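
One possible way to get numeric matrices with named rows without touching summary.data.frame is to summarise column by column inside by(). This is only a sketch on the iris data; num_summary is a made-up helper name, and it assumes all columns are numeric and free of NAs (with NAs, summary() adds an extra "NA's" entry and sapply() may return a list instead of a matrix).

```r
# Sketch only: bypass summary.data.frame() and call summary() on each
# column, so the values stay numeric. Assumes numeric columns without NAs.
num_summary <- function(d) sapply(d, summary)

res <- by(iris[1:4], iris$Species, num_summary)

# Each list element is a numeric matrix: statistics in rows, variables in columns.
m <- res[["setosa"]]
is.numeric(m)
rownames(m)
m["Mean", "Sepal.Length"]
```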
> Regards,
> Alex van der Spek
>
> > Hi
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> >> project.org] On Behalf Of Alex van der Spek
> >> Sent: Wednesday, October 10, 2012 2:48 PM
> >> To: r-help at r-project.org
> >> Subject: [R] Summary using by() returns character arrays in a list
> >>
> >> I use by() to generate summary statistics like so:
> >>
> >> Lbys <- by(dat[Nidx], dat$LipTest, summary)
> >>
> >> where Nidx is an index vector with names picking out the columns in
> >> the data frame dat.
> >>
> >> This returns a list of character arrays (see below for str() output)
> >> where the columns are named correctly but the rownames are empty
> >> strings and the values are strings prepended with the summary
> >> statistic's name (e.g. "Min.", "Median ").
> >
> > Without knowledge of your data it is difficult to understand what is
> > wrong.
> >
> > If I use the iris data set as input, everything goes as expected:
> > data(iris)
> >> summary(iris)
> >   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
> >  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
> >  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
> >  Median :5.800   Median :3.000   Median :4.350   Median :1.300
> >  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
> >  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
> >  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
> >        Species
> >  setosa    :50
> >  versicolor:50
> >  virginica :50
> >
> >
> >
> >> by(iris, iris$Species, summary)
> > iris$Species: setosa
> >   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
> >  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
> >  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
> >  Median :5.000   Median :3.400   Median :1.500   Median :0.200
> >  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
> >  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
> >  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
> >        Species
> >  setosa    :50
> >  versicolor: 0
> >  virginica : 0
> >
> >
> >>
> >> I am reading the code of summary.data.frame() but can't figure out
> >> how I can change the action of that function to return a list of
> >> numeric matrices with the summary statistics' names ("Min.", "Max.",
> >> etc.) as rownames and the calculated numeric values as entries.
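
The structure asked for here (a list of numeric matrices with the statistic names as rownames) could also be built without summary() at all. The sketch below uses the iris data; stats, groups and out are made-up names, and the six statistics are computed with quantile() and mean() so that every column yields exactly six values even when NAs are present.

```r
# Sketch: compute the six statistics directly so the result shape is fixed.
# names = FALSE keeps quantile() from adding its own "0%", "25%", ... labels.
stats <- function(x) {
  q <- quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
    "Mean" = mean(x, na.rm = TRUE), "3rd Qu." = q[4], "Max." = q[5])
}

# One data frame per group, then one 6 x 4 numeric matrix per group.
groups <- split(iris[1:4], iris$Species)
out <- lapply(groups, function(d) vapply(d, stats, numeric(6)))

names(out)
out$virginica["Max.", "Sepal.Length"]
```

vapply() is used instead of sapply() because it fails loudly if a column does not return exactly six numbers, rather than silently changing the result type.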
> >
> > Just what do you not like about such output, and how do you want the
> > output structured?
> > Maybe you want aggregate, but without simple example data it is hard to say.
> >
> > aggregate(iris[1:2], list(iris$Species), summary)
> >
> > Regards
> > Petr
> >
> >>
> >> Any help much appreciated!
> >> Regards,
> >> Alex van der Spek
> >>
> >>
> >> > str(Lbys)
> >> List of 2
> >>  $    : 'table' chr [1:6, 1:19] "Min. :-0.190 " "1st Qu.: 9.297 "
> >>    "Median :10.373 " "Mean :10.100 " ...
> >>   ..- attr(*, "dimnames")=List of 2
> >>   .. ..$ : chr [1:6] "" "" "" "" ...
> >>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
> >>     "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
> >>  $ T38: 'table' chr [1:6, 1:19] "Min. :8.648 " "1st Qu.:8.920 "
> >>    "Median :9.018 " "Mean :9.027 " ...
> >>   ..- attr(*, "dimnames")=List of 2
> >>   .. ..$ : chr [1:6] "" "" "" "" ...
> >>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
> >>     "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
> >>  - attr(*, "dim")= int 2
> >>  - attr(*, "dimnames")=List of 1
> >>   ..$ dat$LipTest: chr [1:2] "" "T38"
> >>  - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES
> >>    = dat$LipTest, FUN = summary)
> >>  - attr(*, "class")= chr "by"
> >>
> >
>