[R] need technique for speeding up R dataframe individual element insertion (no deletion though)

Ishwor ishwor.gurung at gmail.com
Sun Aug 30 14:04:18 CEST 2009


Bill, Jim

One word. Thanks tons! :-)

2009/8/13  <Bill.Venables at csiro.au>:
> Why do you need an explicit loop at all?
>
> (Also, your loop goes over i in 1:length(cam$end_date) but your code refers to cam$end_date[i+1] -->||<--!!)
>
>
> Here is a suggestion.  You want to identify places where the date increases but the volume does not change.  OK, where?
>
> ind <- with(cam, {
>           dx <- as.numeric(diff(strptime(end_date, "%d/%m/%Y")))
>           dt <- diff(vol)
>           which(dx > 0 & dt == 0)
> })
>
> Now adjust the new data frame
>
> cap <- within(cam, {
>             levels[ind] <- 1
>             levels[ind+1] <- 1
> })
>
> Of course this is untested code, so caveat emptor!
>
> Bill Venables.
>
> ________________________________________
> From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Ishwor [ishwor.gurung at gmail.com]
> Sent: 13 August 2009 22:07
> To: r-help at r-project.org
> Subject: [R] need technique for speeding up R dataframe individual element insertion (no deletion though)
>
> Hi fellas,
>
> I am working on a dataframe cam and it involves comparison within the
> 2 columns - t1 and t2 on about 20K rows and 14 columns.
>
> ###
> cap = cam; # this doesn't take long. ~1 secs.
>
>
> for( i in 1:length(cam$end_date))
>  {
>    x1=strptime(cam$end_date[i], "%d/%m/%Y");
>    x2=strptime(cam$end_date[i+1], "%d/%m/%Y");
>
>    t1= cam$vol[i];
>    t2= cam$vol[i+1];
>
>    if(!is.na(x2) && !is.na(x1) && !is.na(t1) && !is.na(t2))
>    {
>      if( (x2>=x1) && (t1==t2) ) # date and vol
>      {
>        cap$levels[i]=1; #make change to specific dataframe cell
>        cap$levels[i+1]=1;
>      }
>    }
>  }
> ###
>
> Having coded that, I ran a timing profile on this section, and each
> block of 1000 row comparisons takes ~1.1 minutes on a 2.8 GHz dual-core
> box (which is a test box we use).
> This obviously works out to ~21 minutes for 20k rows, which is definitely
> not where we want it headed. I believe an optimisation (or even a different
> way of indexing inside the dataframe) can be had inside the innermost
> `if', specifically in `cap$levels[i]=1;', but I am a bit at a loss,
> having scoured the documentation without finding anything of value.
> So my question remains: are there any general/specific changes I can
> make to speed up the code execution dramatically?
>
> Thanks folks.
>
> --
> Regards,
> Ishwor Gurung
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
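For the archive, here is a minimal, self-contained sketch of the vectorised idea in the quoted reply, checked against a loop that works on plain vectors instead of assigning into the data frame cell by cell. The toy `cam` data are invented for illustration. The sketch also makes two choices of its own: it keeps the `>=` date comparison from the original loop (the quoted suggestion tests for a strict increase), and it parses dates with as.Date() rather than strptime(). Treat it as a sketch under those assumptions, not a tested drop-in.

```r
# Toy stand-in for `cam`; the values are invented for illustration.
cam <- data.frame(
  end_date = c("01/08/2009", "02/08/2009", "02/08/2009",
               "05/08/2009", NA, "07/08/2009"),
  vol      = c(10, 10, 12, 12, 12, 12),
  levels   = 0,
  stringsAsFactors = FALSE
)

# Parse the dates once, outside any loop.
d <- as.Date(cam$end_date, format = "%d/%m/%Y")
v <- cam$vol
n <- nrow(cam)

# Compare row i with row i + 1 for i = 1 .. n - 1, so i + 1 never
# runs past the last row.
later_or_same <- d[-1] >= d[-n]
same_vol      <- v[-1] == v[-n]

# which() drops NA comparisons, mirroring the is.na() guards in the loop.
ind <- which(later_or_same & same_vol)

# One vectorised assignment instead of one assignment per row.
cap <- cam
cap$levels[c(ind, ind + 1)] <- 1

# Reference loop for comparison: it reads and writes plain vectors and
# never touches the data frame inside the loop body.
ref <- cam$levels
for (i in seq_len(n - 1)) {
  ok <- !is.na(d[i]) && !is.na(d[i + 1]) && !is.na(v[i]) && !is.na(v[i + 1])
  if (ok && d[i + 1] >= d[i] && v[i] == v[i + 1]) {
    ref[i]     <- 1
    ref[i + 1] <- 1
  }
}

# Both approaches should flag the same rows.
stopifnot(identical(cap$levels, ref))
```

Even when a loop is unavoidable, pulling the columns out into vectors first and writing the result back to the data frame once at the end avoids the repeated `cap$levels[i] = 1` data frame assignments that the original loop performs.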



-- 
Regards,
Ishwor Gurung



