[R] Help with aggregate and cor
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Mar 10 12:31:24 CET 2010
The sqldf package can be used to manipulate R data frames with SQL
statements. See http://sqldf.googlecode.com
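
For comparison, here is a minimal base-R sketch of the per-chunk correlation asked about below. It is one possible approach under stated assumptions, not the sqldf route: the data frame df, its columns ts, v and o, and the simulated values are hypothetical stand-ins for the data pulled from PostgreSQL.

```r
# Hypothetical stand-in for the real data: a timestamp column plus two
# numeric columns, observed every 30 seconds for 2 hours.
set.seed(1)
n  <- 240
ts <- as.POSIXct("2007-03-01 00:00:00", tz = "UTC") + 30 * (seq_len(n) - 1)
df <- data.frame(ts = ts, v = rnorm(n))
df$o <- 0.5 * df$v + rnorm(n)

# Label each row with the start of its 5-minute (300-second) chunk.
chunk <- as.POSIXct(floor(as.numeric(df$ts) / 300) * 300,
                    origin = "1970-01-01", tz = "UTC")

# split() gives cor() a whole sub-data-frame per chunk, whereas
# aggregate() applies the function to one column at a time.
pieces <- split(df[c("v", "o")], format(chunk, "%Y-%m-%d %H:%M"))

# One 2x2 correlation matrix per chunk, kept in a named list.
cor_list <- lapply(pieces, cor)

# Or only the v-o correlation per chunk, as a named numeric vector.
cor_vo <- sapply(pieces, function(d) cor(d$v, d$o))
head(cor_vo)
```

A list is used for cor_list because each result is a matrix. An explicit for loop works the same way if the result is initialized with output <- list() and filled with output[[f]] <- cor(...); for hourly chunks, replace 300 with 3600.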
On Tue, Mar 9, 2010 at 9:36 PM, James Marca <jmarca at translab.its.uci.edu> wrote:
> Hello,
>
> I do not understand the correct way to approach the following problem
> in R.
>
> I have observations of pairs of variables, v1, o1, v2, o2, etc.,
> observed every 30 seconds. What I would like to do is compute the
> correlation matrix, but not for all my data, just for, say, 5-minute
> or 1-hour chunks.
>
> In sql, what I would say is
>
> select id, date_trunc('hour'::text, ts) as tshour, corr(n1,o1) as corr1
> from raw30s
> where id = 1201087 and
> (ts between 'Mar 1, 2007' and 'Apr 1, 2007')
> group by id,tshour order by id,tshour;
>
>
> I've pulled data from PostgreSQL into R, and have a dataframe
> containing a timestamp column, v, and o (both numeric).
>
> I created a grouping index for every 5 minutes along these lines:
>
> obsfivemin <- trunc(obsts, units = "hours") +
>   (floor(obsts$min / 5) * 5 * 60)
>
> (where obsts is the sql timestamp converted into a DateTime object)
>
> Then I tried aggregate(df,by=obsfivemin,cor), but that seemed to pass
> just a single column at a time to cor, not the entire data frame. It
> worked for mean and sum, but not cor.
>
> In desperation, I tried looping over the different 5 minute levels and
> computing cor, but I'm so R-clueless I couldn't even figure out how to
> assign to a variable inside of that loop!
>
> code such as
>
> for (f in fivemin){
> output[f] <- cor(df[grouper==f,]); }
>
> failed, as I couldn't figure out how to initialize output so that
> output[f] would accept the output of cor.
>
> Any help or steering towards the proper R-way would be appreciated.
>
> Regards,
>
> James Marca