[R] Collapsing data frame; aggregate() or better function?
jim holtman
jholtman at gmail.com
Thu Sep 13 23:18:39 CEST 2007
The second argument to aggregate() (the 'by' argument) is supposed to be
a list, so try the following (note that there is no comma before "1:8"):
test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[1:8],sum)
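For reference, here is a minimal sketch of the same call pattern on the toy data from the quoted message below. The data frame is retyped from that message, and selecting columns by name rather than by position is only an illustrative choice. It shows the 'by' argument being passed as a list of grouping columns (single-bracket indexing with no comma).

```r
# Toy data retyped from the quoted message.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# 'by' is a list of grouping columns: tester[c("trip", "set")] has no comma,
# so it returns a data frame (which is a list) holding just those two columns.
res <- aggregate(tester[, c("num", "lfs1", "lfs2")],
                 tester[c("trip", "set")],
                 sum)
print(res)
```

Dropping 'sex' needs nothing special here: any column that is neither summed nor used for grouping is simply left out of both arguments.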
On 9/13/07, Tobin, Jared <TobinJR at dfo-mpo.gc.ca> wrote:
> Hello r-help,
>
> I am trying to collapse or aggregate 'some' of a data frame. A very
> simplified version of my data frame looks like:
>
> > tester
>   trip set num sex lfs1 lfs2
> 1  313  15   5   M    2    3
> 2  313  15   3   F    1    2
> 3  313  17   1   M    0    1
> 4  313  17   2   F    1    1
> 5  313  17   1   U    1    0
>
> And I want to omit sex from the picture and just get an addition of num,
> lfs1, and lfs2 for each unique trip/set combination. Using aggregate()
> works fine here,
>
> > test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> > test
>   trip set num lfs1 lfs2
> 1  313  15   8    3    5
> 2  313  17   4    2    2
>
> But I'm having trouble getting the same function to work on my actual
> data frame which is considerably larger.
>
> > dim(lf1.turbot)
> [1] 16468 217
> > test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8], sum)
> Error in vector("list", prod(extent)) : vector size specified is too large
> In addition: Warning messages:
> 1: NAs produced by integer overflow in: ngroup * (as.integer(index) - one)
> 2: NAs produced by integer overflow in: group + ngroup * (as.integer(index) - one)
> 3: NAs produced by integer overflow in: ngroup * nlevels(index)
>
> I'm guessing that either aggregate() can't handle a data frame of this
> size OR that there is an issue with 'omitting' more than one variable
> (in the same way I've omitted sex in the above example). Can anyone
> clarify and/or recommend any relatively simple alternative procedure to
> accomplish this?
>
> I plan on trying variants of by() and tapply() tomorrow morning, but I'm
> about to head home for the day.
>
> Thanks,
>
> --
>
> jared tobin, student research assistant
> fisheries and oceans canada
> tobinjr at dfo-mpo.gc.ca
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
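On the request for a simple alternative: the error above mentions prod(extent) and nlevels(index), which suggests (this is an assumption, not something confirmed here) that the allocation grows with the product of the number of levels across all eight grouping columns. If that is the cause, one possible workaround is to collapse the grouping columns into a single key and sum with rowsum(). The sketch below uses the toy data from the message; the names 'key', 'sums' and 'alt' are made up for illustration.

```r
# Toy data retyped from the quoted message.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# Build one string key per row from the grouping columns, so only the
# combinations that actually occur are ever represented.
key <- do.call(paste, c(tester[c("trip", "set")], sep = "\r"))

# Sum the numeric columns within each key; reorder = FALSE keeps groups
# in the order they are first encountered.
sums <- rowsum(tester[, c("num", "lfs1", "lfs2")], group = key, reorder = FALSE)

# Reattach the grouping columns, taking the first row of each group
# (same first-encountered order as 'sums').
alt <- cbind(tester[!duplicated(key), c("trip", "set")], sums)
rownames(alt) <- NULL
print(alt)
```

For the real data the same pattern would use the eight grouping columns to build the key and the columns to be summed in the rowsum() call, provided all of the summed columns are numeric.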
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?