[R] need technique for speeding up R dataframe individual element insertion (no deletion though)

Ishwor ishwor.gurung at gmail.com
Thu Aug 13 14:07:17 CEST 2009


Hi fellas,

I am working on a data frame, cam (about 20K rows and 14 columns), and
need to compare adjacent rows on two of its columns, end_date and vol
(the values t1 and t2 in the code below come from vol).

###
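cap <- cam  # copy of the data frame; this doesn't take long, ~1 sec.

for (i in 1:length(cam$end_date))
  {
    # parse this row's date and the next row's date
    x1 <- strptime(cam$end_date[i], "%d/%m/%Y")
    x2 <- strptime(cam$end_date[i+1], "%d/%m/%Y")

    # this row's vol and the next row's vol
    t1 <- cam$vol[i]
    t2 <- cam$vol[i+1]

    # on the last row, i+1 is out of range and yields NA, so this check skips it
    if (!is.na(x2) && !is.na(x1) && !is.na(t1) && !is.na(t2))
    {
      if ((x2 >= x1) && (t1 == t2))  # date and vol
      {
        cap$levels[i] <- 1    # make change to specific dataframe cell
        cap$levels[i+1] <- 1
      }
    }
  }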
###

Having coded that, I ran a timing profile on this section: every 1000
row comparisons take ~1.1 minutes on a 2.8 GHz dual-core box (a test
box we use). That works out to ~21 minutes for 20K rows, which is
definitely not where we want to be. I believe an optimisation (or even
a different way of indexing into the data frame) can be had inside the
innermost `if', specifically in the assignment to `cap$levels[i]', but
I am at a loss, having scoured the documentation without finding
anything of value.
So my question is: are there any general or specific changes I can
make to speed up the code execution dramatically?
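
For reference, here is a minimal sketch of one direction that might
help: parse the dates once and do the adjacent-row comparison on whole
vectors, so the data frame is written to only once. This is untested on
the real data and rests on assumptions: the toy `cam' below is a
hypothetical stand-in, the function name mark_levels is made up for
illustration, a `levels' column is assumed to exist already, and
as.Date is swapped in for strptime because only the day matters here.

```r
# Hypothetical stand-in for cam; only end_date, vol and levels matter here.
cam <- data.frame(
  end_date = c("01/08/2009", "02/08/2009", "02/08/2009", NA,
               "05/08/2009", "04/08/2009"),
  vol      = c(10, 10, 12, 12, 7, 7),
  levels   = 0,
  stringsAsFactors = FALSE
)

# mark_levels is an illustrative name, not an existing R function.
mark_levels <- function(cam) {
  n   <- nrow(cam)
  cap <- cam
  if (n < 2) return(cap)  # nothing to compare

  # Parse every date once, outside any loop (assumption: day precision
  # is enough, so as.Date replaces the per-row strptime calls).
  d <- as.Date(as.character(cam$end_date), format = "%d/%m/%Y")
  v <- cam$vol

  # Compare row i with row i+1 for all i at once.
  i   <- seq_len(n - 1)
  hit <- !is.na(d[i]) & !is.na(d[i + 1]) &
         !is.na(v[i]) & !is.na(v[i + 1]) &
         d[i + 1] >= d[i] & v[i] == v[i + 1]

  # Work on a plain vector and assign back to the data frame once.
  idx <- which(hit)
  lev <- cap$levels
  lev[c(idx, idx + 1)] <- 1
  cap$levels <- lev
  cap
}

cap <- mark_levels(cam)
```

On the toy data only rows 1 and 2 qualify (same vol, non-decreasing
date), so only those two rows get levels set to 1. Wrapping the call in
system.time() would show whether this helps on the real 20K rows.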

Thanks folks.

-- 
Regards,
Ishwor Gurung



