[R] stalled loop
Gavin Simpson
gavin.simpson at ucl.ac.uk
Sun Sep 16 18:24:29 CEST 2007
On Sun, 2007-09-16 at 08:46 -0700, kevinchang wrote:
> Hey everyone,
>
> The code I wrote executes correctly but runs very slowly. Is there a
> way to speed up execution without coming up with a brand new algorithm?
> Please help. Thanks a lot for your time.
>
>
> #a simplified version of the code
The simple thing to do first is to pre-allocate your storage. When you do:
c <- NA
you have a vector of length 1. Then, in the loop, you extend c by 1 at
each iteration. To do this, R has to copy c and then replace it. If you
set c to be the correct size in the first place, R doesn't have to do
all this copying and replacing, and is much faster as a result.
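To make the difference concrete, here is a minimal sketch that is not part of your script. The function names grow() and prealloc() are made up for illustration. Both fill a vector with the same values, but only the second allocates the full length up front. How large the timing gap is will depend on your R version and on n, so I have not quoted any numbers.

```r
## grow(): starts from a length-1 vector and extends it on every iteration
grow <- function(n) {
    x <- NA
    for (i in seq_len(n)) {
        x[i] <- i          # x gets one element longer each time
    }
    x
}

## prealloc(): allocates all n slots first, then only fills them in
prealloc <- function(n) {
    x <- rep(NA_integer_, n)
    for (i in seq_len(n)) {
        x[i] <- i          # length of x never changes
    }
    x
}

## both approaches give the same answer
stopifnot(identical(grow(1000L), prealloc(1000L)))

## compare the timings on your own machine
system.time(grow(1e5))
system.time(prealloc(1e5))
```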
I have modified your script as follows:
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
## uncomment the below to test how it scales
#a <- rep(a, 150000)
b <- sapply(a, function(.srt) {paste(sort(strsplit(.srt, '')[[1]]),
collapse="")})
## store the number of iterations we will do
n.loop <- length(b)
## use this to allocate storage space for c; fill it with NA so that
## entries without a match can be dropped after the loop
c <- rep(NA_character_, n.loop)
for(i in seq(along = c)) {
    if(length(which(b == b[i])) > 1)
        c[i] <- b[i]
}
c <- c[!is.na(c)]
When timed using system.time(), with a now being a vector of 900000
strings (the original a repeated 150000 times), I got the following
timings:
   user  system elapsed
121.752   0.341 122.712
So about 122 seconds elapsed on my laptop with 2GB of RAM, which is not
bad for a problem of this size.
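If you are prepared to go a little beyond the loop, the same result can probably be had without looping at all. The following is only a sketch and I have not timed it on the large vector. The names keys, dup.keys and res are ones I made up for illustration. It assumes that all you want is every sorted-letter key that occurs more than once, in the original order, which is what the loop above returns after the NAs are dropped.

```r
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")

## sorted-letter key for each word, as in the script above
keys <- vapply(strsplit(a, ""),
               function(ch) paste(sort(ch), collapse = ""),
               character(1))

## keys that occur at least twice
dup.keys <- unique(keys[duplicated(keys)])

## keep every element whose key is one of the duplicated keys
res <- keys[keys %in% dup.keys]
```

duplicated() and %in% both work on the whole vector at once, so each element is not compared against every other element as it is with which(b == b[i]) inside the loop.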
Some further comments. First, don't use 'c' as a variable name: it won't
overwrite the c() function, but it is confusing to have objects with the
same names as functions. Second, *space out your code*. What you wrote
is very difficult for a human to parse; you'll find it easier to spot
mistakes if you spread things out a bit.
HTH
G
>
> a<-c("superman" , "xman" , "spiderman" ,"wolfman" ,"mansuper","manspider" )
> b<-sapply(a,function(.srt){paste(sort(strsplit(.srt,'')[[1]]),
> collapse="")})
> c<-NA
> for(i in 1:length(b)) {
> if(length(which(b==b[i]))>1)
> c[i]<-b[i]
> }
> c<-c[!is.na(c)]
> # But if I get the volume of "a" up to about 150000 words, the loop will
> work incredibly slowly.
>