[R] For loop processing too slow - pre-format data.frame?

cembling clare.embling at plymouth.ac.uk
Sat Apr 26 22:27:24 CEST 2014


Hi,

I am bootstrapping, but my loops are taking far too long and I need to make
them faster. Looking through the R-help archive, I suspect the problem is
that I am not specifying the size of my data.frame up front, mainly because
I don't know in advance how large it has to be. Can anyone help?

My data looks like this (first 5 entries of 'SpeyBay'):

  Year JulianDay Hour Day Month Quarter Season SeaState Visibility TideState
1 2005        91    6   1     4       2      2        2          2      2.18
2 2005        91    7   1     4       2      2        2          2      1.53
3 2005        91    9   1     4       2      2        2          3      0.80
4 2005        91   11   1     4       2      2        2          4      0.96
5 2005        91   14   1     4       2      2        1          6      2.25
  TideHeight CetPres Segment
1          2       0       1
2          3       0       1
3          5       0       2
4         -5       0       3
5         -2       0       4

I am bootstrapping 1000 times but re-sampling on segment (since my data are
autocorrelated), which means I am trying to reconstruct my data from
randomly drawn segments, e.g. segment 3, then segment 1, each of which may
contain between 1 and 14 data rows. So I don't know in advance how many rows
I am going to get.
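
A minimal sketch of the segment resampling described above, on a made-up
data frame. The names 'toy', 'rows_by_segment', 'drawn', 'idx' and 'n_keep'
are hypothetical and only for illustration; they are not part of the real
SpeyBay data or code. The idea is to look up the row numbers of each segment
once, draw segments with replacement, and subset the data frame in one step:

```r
# Toy stand-in for SpeyBay (hypothetical data, only the columns needed here)
set.seed(1)
toy <- data.frame(Segment = rep(1:5, times = c(2, 1, 3, 1, 2)),
                  CetPres = rbinom(9, 1, 0.3))

# Row numbers belonging to each segment, computed once
rows_by_segment <- split(seq_len(nrow(toy)), toy$Segment)

# Draw segments with replacement, then collect all their row numbers
drawn <- sample(names(rows_by_segment), size = 4, replace = TRUE)
idx <- unlist(rows_by_segment[drawn], use.names = FALSE)

# One subsetting step gives the whole resampled data set
resampled <- toy[idx, ]

# Keep only the first n_keep rows (hypothetical cut-off for this toy example)
n_keep <- 3
resampled <- head(resampled, n_keep)
```

Because every segment has at least one row, drawing 4 segments always yields
at least 4 rows here, so the cut-off to 3 rows is always possible.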

When I run my for loop, I just use rbind to grow the new variable (e.g.
'TempD2') one segment at a time without defining its size first, and I
suspect it is this that is slowing down the whole process (probably made
worse by having a for loop within a for loop).

Can anyone give me any advice on how to pre-define a data frame (if this is
what the data shown above is) that can have an undefined size, or how to
make it big enough to take all the data? I've been trying to figure this
out for ages with no luck and I'm sure it's something simple!
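
One possible way to handle pieces of unknown size, sketched on made-up data:
store each piece in a pre-allocated list and call rbind once after the loop,
rather than growing a data frame inside the loop. The names 'toy', 'picks',
'pieces' and 'combined' are hypothetical:

```r
# Hypothetical toy data; the pieces vary in size like the segments do
toy <- data.frame(Segment = rep(1:4, times = c(1, 3, 2, 1)),
                  Value   = 1:7)
set.seed(42)
picks <- sample(unique(toy$Segment), size = 6, replace = TRUE)

# Pre-allocate a list with one slot per draw; a slot can hold any number of rows
pieces <- vector("list", length(picks))
for (i in seq_along(picks)) {
  pieces[[i]] <- toy[toy$Segment == picks[i], ]
}

# Bind everything together once, after the loop
combined <- do.call(rbind, pieces)
```

The list only needs to know how many draws there are, not how many rows each
draw will contribute, so the final size does not have to be known up front.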

Code shown below - any tips on making the code faster would be greatly
appreciated - the last run took several hours which is just not practical!

Many thanks in advance,
Clare Embling

CODE: 

library(geepack)  # provides geeglm()

SpringWatch <- 504
SummerWatch <- 704
AutumnWatch <- 392
MaxSample <- 704

signif <- 0

for(j in 1:1000){

   # resampling 2 different years (D & E) in 3 different seasons (2, 3 & 4)
   # separately
   D2S <- sample(D2Start:D2Stop,MaxSample,replace=T)
   D3S <- sample(D3Start:D3Stop,MaxSample,replace=T)
   D4S <- sample(D4Start:D4Stop,MaxSample,replace=T)
   E2S <- sample(E2Start:E2Stop,MaxSample,replace=T)
   E3S <- sample(E3Start:E3Stop,MaxSample,replace=T)
   E4S <- sample(E4Start:E4Stop,MaxSample,replace=T)

   # Creating new data frames with the first sampled segment
   TempD2 <- SpeyBay[(Segment==D2S[1]),]
   TempD3 <- SpeyBay[(Segment==D3S[1]),]
   TempD4 <- SpeyBay[(Segment==D4S[1]),]
   TempE2 <- SpeyBay[(Segment==E2S[1]),]
   TempE3 <- SpeyBay[(Segment==E3S[1]),]
   TempE4 <- SpeyBay[(Segment==E4S[1]),]

   # loop to add together all the rows of data for each segment sampled
   for(i in 2:MaxSample) {
      TempD2 <- rbind(TempD2,SpeyBay[(Segment==D2S[i]),])
      TempD3 <- rbind(TempD3,SpeyBay[(Segment==D3S[i]),])
      TempD4 <- rbind(TempD4,SpeyBay[(Segment==D4S[i]),])
      TempE2 <- rbind(TempE2,SpeyBay[(Segment==E2S[i]),])
      TempE3 <- rbind(TempE3,SpeyBay[(Segment==E3S[i]),])
      TempE4 <- rbind(TempE4,SpeyBay[(Segment==E4S[i]),])
   }
   # But actually I only want a certain number of rows of data...
   NewD2 <- TempD2[1:SpringWatch,]   
   NewD3 <- TempD3[1:SummerWatch,]   
   NewD4 <- TempD4[1:AutumnWatch,]   
   NewE2 <- TempE2[1:SpringWatch,]   
   NewE3 <- TempE3[1:SummerWatch,]   
   NewE4 <- TempE4[1:AutumnWatch,]   

   # then combine together (could do this in one step!)
   NewD <- rbind(NewD2,NewD3,NewD4)
   NewE <- rbind(NewE2,NewE3,NewE4)

   CompDE <- rbind(NewD,NewE)

   # Run a GLM-GEE on the resampled distributions to see if there is a
   # statistical difference between years

   NewGLMGEE1 <- geeglm(CetPres ~ Year + SeaState, data = CompDE,
                        family = binomial, id = Segment, corstr = "ar1")
   # extract the p-values of the model coefficients
   pv <- summary(NewGLMGEE1)$coefficients[, "Pr(>|W|)"]
   # only interested in the significance of Year in the model
   signif[j] <- pv[2]
}