[R] getting data into correct format for summarizing ... reshape, aggregate, or...

Gabor Grothendieck ggrothendieck at gmail.com
Mon Sep 15 18:41:14 CEST 2008


Try this:

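> library(doBy)
> # make RiverMile a factor so that it is treated as a grouping variable
> df.1f <- transform(df.1, RiverMile = as.factor(RiverMile))
> # the "." on the right-hand side stands for the factor columns
> summaryBy(value ~ ., data = df.1f, FUN = c(mean, sd))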
  RiverMile constituent  value.mean  value.sd
1       198           1 -0.06015032 0.8690358
2       198           2 -0.38923255 0.5147604
3       202           1  0.35731576 0.8280943
4       202           2  1.00463813 0.9272342
5       215           1  0.18249485 1.1861883
6       215           2 -0.10863353 0.7564736
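
Note that constituent shows up as 1 and 2 (rather than a and b) because
cbind() in your code coerces the factor to its integer codes. If you
would rather not depend on doBy, here is a rough base-R sketch of the
same summary. It rebuilds the example directly with data.frame(), and
the seed and the three randomly placed NAs are my own assumptions to
mimic your real data, not something taken from it.

```r
# Example data built directly as a data frame (assumed layout: same three
# columns as df.1; the seed and the NA positions are illustrative only).
set.seed(1)
df.1 <- data.frame(
  RiverMile   = rep(c(215, 202, 198), each = 10),
  constituent = factor(rep(rep(c("a", "b"), each = 5), times = 3)),
  value       = rnorm(30)
)
df.1$value[sample(nrow(df.1), 3)] <- NA   # intersperse a few NAs at random

# Mean and sd per RiverMile/constituent group. The formula method of
# aggregate() drops rows with NA by default (na.action = na.omit).
stats <- aggregate(value ~ RiverMile + constituent, data = df.1,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() returns the two statistics as one matrix column;
# expand it into ordinary columns value.mean and value.sd.
stats <- do.call(data.frame, stats)
stats
```

With summaryBy itself, adding na.rm = TRUE to the call should have the
same effect, since extra arguments are passed on to the functions in FUN.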

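If you want the layout you sketched (one column per group such as 198a
and 198b, one row per statistic), something along these lines may do.
It is only a sketch: the data are a stand-in with the same three
columns as df.1, and the names group and wide are made up for the
example.

```r
# Hypothetical stand-in data with the same three columns as df.1
set.seed(1)
df.1 <- data.frame(
  RiverMile   = rep(c(215, 202, 198), each = 10),
  constituent = rep(rep(c("a", "b"), each = 5), times = 3),
  value       = rnorm(30)
)

# One label per group, e.g. "198a"
group <- paste0(df.1$RiverMile, df.1$constituent)

# One column of statistics per group label; na.rm = TRUE skips any NAs
wide <- sapply(split(df.1$value, group),
               function(x) c(mean = mean(x, na.rm = TRUE),
                             sd   = sd(x, na.rm = TRUE)))
wide
```

The result is a plain matrix with rows mean and sd, so further
statistics can be added by extending the vector returned by the
anonymous function.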

On Mon, Sep 15, 2008 at 12:14 PM, stephen sefick <ssefick at gmail.com> wrote:
> I would like to reformat this data frame into something from which I
> can produce some descriptive statistics.  I have been playing around
> with the reshape package, but maybe this is not the best way to
> proceed.  I would like to use RiverMile and constituent as the
> grouping variables to get the summary statistics:
>
> 198a    198b
> mean    mean
> sd      sd
> ...     ...
>
> etc. for all of these.
> I have tried reshape and aggregate and I am sure that I am missing something...
>
> Below is a naive attempt at making a data frame with the columns in
> the correct classes; this can be improved too.  There are NAs in the
> real data set, but I didn't know how to randomly intersperse NAs in a
> created matrix.  I hope this makes sense.  If it doesn't, I will go
> back to the drawing board and try to clarify this.
>
> value <- rnorm(30)
> RiverMile <- c(rep(215, length.out=10), rep(202, length.out=10),
> rep(198, length.out=10))
> constituent <- c (rep("a", length.out=5), rep("b", length.out=5),
> rep("a", length.out=5), rep("b", length.out=5), rep("a",
> length.out=5), rep("b", length.out=5))
> df <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
> df.1 <- as.data.frame(df)
> df.1[,"V1"] <- as.integer(df.1[,"V1"])
> df.1[,"V2"] <- as.factor(df.1[,"V2"])
> df.1[,"V3"] <- as.numeric(df.1[,"V3"])
> colnames(df.1) <- c("RiverMile", "constituent", "value")
>
>
> --
> Stephen Sefick
> Research Scientist
> Southeastern Natural Sciences Academy
>
> Let's not spend our time and resources thinking about things that are
> so little or so large that all they really do for us is puff us up and
> make us feel like gods. We are mammals, and have not exhausted the
> annoying little problems of being mammals.
>
>        -K. Mullis
>


