[R] data summerization etc...

Gabor Grothendieck ggrothendieck at gmail.com
Sat Jul 12 13:41:33 CEST 2008


See sqldf home page:
http://sqldf.googlecode.com

e.g.

library(sqldf)
set.seed(1)
pti <-rnorm(70000,10)
fid <- rnorm(70000,100)
finc <- rnorm(70000,1000)

# set is a reserved word in SQL so use sset
sset <- data.frame(fid,pti,finc)
system.time(out <- sqldf("select fid, sum(pti) from sset group by
fid")) # 3 seconds

On Fri, Jul 11, 2008 at 6:44 PM, sj <ssj1364 at gmail.com> wrote:
> Hello,
>
> I am trying to do some fairly straightforward data summarization, i.e., the
> kind you would do with a pivot table in excel or by using SQL queires. I
> have a moderately sized data set of ~70,000 records and I am trying to
> compute some group averages and sum values within groups. the code example
> below shows how I am trying to go about doing this
>
> pti <-rnorm(70000,10)
> fid <- rnorm(70000,100)
> finc <- rnorm(70000,1000)
>
>
> ### compute the sums of pti within fid groups
> sum_pinc <-aggregate(cbind(fid,pti),list(fid),FUN=sum)
>
> #### compute mean finc within fid groups
> tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)
>
> when I try to do it this way I get an error message telling me that enough
> memory cannot be allocated ( I am using R 2.7.1 on Windows XP with 2 GB of
> Memory). I figure that there must be a more efficent way to go about doing
> this. Please suggest.
>
> I would typically do this kind of task in a database and use SQL to push the
> data around. I know RODBC allows you to write SQL to query external DBs. Is
> there any mechanisim that allows you to write SQL queies against datasets
> internal to R e.g. in the case above
>
>
> I could do something like
>
> set <- cbind(fid,pti,finc)
>
> select fid, sum(pti)
> from set
> group by fid
>
> that would be handy!
>
> Thanks,
>
> Spencer
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list