[R] loop in a data.table

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Mar 14 02:28:41 CET 2013


Hi,

On Wed, Mar 13, 2013 at 7:25 PM, Camilo Mora <cmora at dal.ca> wrote:
> Hi everyone,
>
> I have a data.table called "data" with many columns which I want to group by
> column1 using data.table, given how fast it is.
>
> The problem with looping over a data.table is that data.table does not like
> quoted column names (e.g. "col2" instead of col2). I
> found a workaround, which is to use get("col2"). It works fine, but the
> processing time multiplies by 20.
>
> So if I use:
>
> data[,sum(col2),by=(key)]
>
> entering the column names by hand, the operation is done in 1 sec., but if
> on the contrary I use:
>
> data[,sum(get("col2")),by=(key)]
>
> using a loop to put the column names, the same operation takes 20 sec. I
> cannot use the former code because I have 100000 files to process, but the
> latter will simply take months to complete. Is there any alternative to the
> function "get" or any other way in which data.table can recognize the names
> of the columns?

I'm still not sure what you're trying to do. Could you maybe create an
example that's a bit closer to your real data and the stuff you want to
do on it?

Are all the columns of the same type?
Are you just summing columns?

If you post code into an email that reconstructs a small version of
your data.table (maybe 5-10 columns and one or two groups), it'd be
clearer to me.
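
Something along these lines would do, as a rough sketch only. The
column names (grp, col2, col3, col4), the two groups and the row count
are all made up for illustration, so swap in whatever is closer to
your files:

```r
library(data.table)

set.seed(1)  # make the random columns reproducible

# Hypothetical stand-in for the real data: one grouping column and a
# few numeric columns. None of these names come from the real files.
n <- 1000
data <- data.table(
  grp  = sample(c("a", "b"), n, replace = TRUE),
  col2 = runif(n),
  col3 = runif(n),
  col4 = runif(n)
)
setkey(data, grp)

# The two calls from the question, grouped by the key column
by_hand  <- data[, sum(col2), by = grp]
with_get <- data[, sum(get("col2")), by = grp]

# Both forms should give the same per-group sums
stopifnot(isTRUE(all.equal(by_hand$V1, with_get$V1)))
```

Wrapping each of the two calls in system.time() on an example like
this would also show whether the slowdown you describe is reproducible
outside your own files.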

Thanks,
-steve
-- 
Steve Lianoglou
Defender of The Thesis
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


