[R] stalled loop

Sun Sep 16 18:24:29 CEST 2007

On Sun, 2007-09-16 at 08:46 -0700, kevinchang wrote:
> Hey everyone,
> 
> The code I wrote executes correctly but runs very slowly. Is there a
> way to speed up execution without coming up with a brand new algorithm?
> Please help. Thanks a lot for your time.
> 
> 
> #a simplified version of the code

The simple thing to do first is to pre-allocate your storage. When you do:

c <- NA

you have a vector of length 1. Then in the loop, you extend c by one
element on each iteration. To do this, R has to allocate a new, longer
vector, copy the old contents of c into it and then add the new element.
If you set c to the correct size in the first place, R doesn't have to do
all this copying and is much faster as a result.
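A minimal sketch of the difference, assuming nothing beyond base R; the function names grow() and prealloc() are hypothetical and exist only for this comparison. Both fill a vector with 1..n, one by growing from a length-1 NA and one by allocating the full length up front. Filling the pre-allocated vector with NA (rather than 0) matters when, as in the script below, unfilled entries are later dropped with is.na(). How large the timing gap is depends on your R version and machine.

```r
## hypothetical helper: start with a length-1 vector and grow it each iteration
grow <- function(n) {
    out <- NA
    for (i in seq_len(n)) out[i] <- i   # extends out by one element per pass
    out
}

## hypothetical helper: allocate the full length once, then fill in place
prealloc <- function(n) {
    out <- rep(NA_integer_, n)          # NA, not 0, so unfilled slots stay detectable
    for (i in seq_len(n)) out[i] <- i
    out
}

system.time(g <- grow(1e5))
system.time(p <- prealloc(1e5))
identical(g, p)                         # both approaches give the same vector
```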

I have modified your script as follows:

a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
## uncomment the below to test how it scales
#a <- rep(a, 150000)
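## sort the letters of each word
b <- sapply(a, function(.srt) {paste(sort(strsplit(.srt, '')[[1]]),
            collapse="")})
## store number of iterations we will do
n.loop <- length(b)
## use this to allocate storage space for c; fill it with NA so that
## entries that are never set can be dropped with is.na() at the end
c <- rep(NA_character_, n.loop)
for(i in seq(along = c)) {
    ## keep b[i] only if the same sorted word occurs more than once in b
    if(length(which(b == b[i])) > 1)
        c[i] <- b[i]
}
c <- c[!is.na(c)]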

Timing this with system.time(), with a now a vector of 900000 strings
(the original a repeated 150000 times), I got the following:

   user  system elapsed 
121.752   0.341 122.712 

So 121 seconds on my laptop with 2GB of RAM is not bad for a problem of
this size.
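If the loop itself is still the bottleneck, the same rule (keep a sorted word if it occurs more than once) can be expressed without a loop using base R's duplicated(). This is a swapped-in, vectorised technique, not the approach in the reply above; the names keep, res and big are illustrative only. The timing line uses the 900000-element size from above but times only the duplicate test, not the strsplit()/sort() step.

```r
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
## same sorted-letter words as in the script above
b <- sapply(a, function(.srt) paste(sort(strsplit(.srt, "")[[1]]), collapse = ""))

## duplicated() flags the 2nd, 3rd, ... occurrence of a value; the
## fromLast = TRUE pass also flags the first one, so together they mark
## every value occurring more than once, the same test as
## length(which(b == b[i])) > 1 in the loop
keep <- duplicated(b) | duplicated(b, fromLast = TRUE)
res <- unname(b[keep])   # drop the word names so the result matches the loop's
print(res)

## rough timing at the scale used above (duplicate test only)
big <- rep(b, 150000)
system.time(big.res <- big[duplicated(big) | duplicated(big, fromLast = TRUE)])
```

Because duplicated() works on the whole vector at once, this avoids comparing every element against every other element inside an R-level loop.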

Some further comments. First, don't use 'c' as a variable name: it won't
overwrite the c() function, but it is a bit confusing to use objects with
the same names as functions. Second, *space out your code*: what you
wrote is very difficult for a human to parse, and you'll find it easier to
spot mistakes etc. if you spread things out a bit.
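A small sketch of why an object called c does not break c(): when R evaluates a call such as c(1, 2, 3), it looks for a *function* named c and skips non-function objects with that name. The object name x here is illustrative only.

```r
c <- "not a function"   # an ordinary object that masks the name c
x <- c(1, 2, 3)         # the call still finds base::c, skipping the character object
print(x)
print(c)                # the character object is still there, which is what confuses readers
rm(c)                   # remove it so c once again refers only to the function
```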

HTH

G

> 
> a<-c("superman" , "xman" , "spiderman" ,"wolfman" ,"mansuper","manspider" )
> b<-sapply(a,function(.srt){paste(sort(strsplit(.srt,'')[[1]]),
> collapse="")})
> c<-NA 
> for(i in 1:length(b)) {
> if(length(which(b==b[i]))>1)
> c[i]<-b[i]
> }
> c<-c[!is.na(c)]
> # But if I get the volume of "a" up to about 150000 words, the loop will
> # work incredibly slowly.
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%