[R] How to speed up a double loop?

Mon Mar 2 12:11:14 CET 2015

Dear R-users,

I would like to speed up a double loop I wrote to detect and remove
outliers across my whole data.frame. The idea is to remove any value that
differs too much from the previous value. Once an outlier is detected, the
test must be repeated on at most the next 10 values following the last
correct one (and each removed value is marked in another column).

It works well on a small data frame, but it is far too slow for my real
data frame with 500 000 rows.
Here is a fake data example and the double loop:

    myts <- data.frame(x=c(1,2,50,40,30,40,100,1,50,1,2,3,3,5,4),y=NA)    

    for (jj in 1:(nrow(myts) - 10)) {
        for (nn in (jj + 1):(jj + 10)) {
            # flag and blank any value more than 15 away from reference row jj
            if (!is.na(myts[jj, 1]) && !is.na(myts[nn, 1]) &&
                abs(myts[nn, 1] - myts[jj, 1]) > 15) {
                myts[nn, 2] <- 1
                myts[nn, 1] <- NA
            }
        }
    }

Can somebody explain how I can speed this up easily? I have heard about
vectorization, but I don't really understand how it works.
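
To make the idea concrete, here is a minimal sketch of what a partly vectorized version of the loop above might look like. It assumes the same rule as the loop (a window of 10 rows and a threshold of 15); the function name `flag_outliers` and its arguments `window` and `threshold` are made up for illustration. The column is pulled out into a plain vector once, and the inner loop is replaced by a single comparison over the whole window, which avoids the repeated `myts[i, j]` indexing. The outer loop is kept because each step depends on values blanked by earlier steps.

```r
# Hypothetical helper: same rule as the double loop, inner loop vectorized.
flag_outliers <- function(x, window = 10, threshold = 15) {
  n <- length(x)
  flag <- rep(NA_real_, n)
  # Nothing to do if there is no full window (the original loop assumes n > 10)
  if (n <= window) return(list(x = x, flag = flag))
  for (jj in seq_len(n - window)) {
    if (is.na(x[jj])) next               # reference value already removed
    nn <- (jj + 1):(jj + window)         # the next `window` positions
    # TRUE where a non-missing value is too far from the reference value
    bad <- !is.na(x[nn]) & abs(x[nn] - x[jj]) > threshold
    flag[nn[bad]] <- 1
    x[nn[bad]] <- NA
  }
  list(x = x, flag = flag)
}

myts <- data.frame(x = c(1, 2, 50, 40, 30, 40, 100, 1, 50, 1, 2, 3, 3, 5, 4),
                   y = NA)
res <- flag_outliers(myts$x)
myts$x <- res$x
myts$y <- res$flag
which(myts$y == 1)   # 3 4 5 6 7 9
```

On the fake data this should flag the same rows as the double loop. Whether it is fast enough for 500 000 rows would need to be timed on the real data.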

--
View this message in context: http://r.789695.n4.nabble.com/How-to-speed-up-a-double-loop-tp4704054.html