[R] For loop processing too slow - pre-format data.frame?
cembling
clare.embling at plymouth.ac.uk
Sat Apr 26 22:27:24 CEST 2014
Hi,
I am bootstrapping, but my loops are taking far too long and I need to make
them faster. From the R-help archive, I suspect the problem is that I am not
specifying the size of my data.frame up front, mainly because I don't know in
advance how large it has to be. Can anyone help?
My data looks like this (first 5 entries of 'SpeyBay'):
  Year JulianDay Hour Day Month Quarter Season SeaState Visibility TideState
1 2005        91    6   1     4       2      2        2          2      2.18
2 2005        91    7   1     4       2      2        2          2      1.53
3 2005        91    9   1     4       2      2        2          3      0.80
4 2005        91   11   1     4       2      2        2          4      0.96
5 2005        91   14   1     4       2      2        1          6      2.25
  TideHeight CetPres Segment
1          2       0       1
2          3       0       1
3          5       0       2
4         -5       0       3
5         -2       0       4
I am bootstrapping 1000 times but resampling by segment (since my data are
autocorrelated). That is, I reconstruct my data from randomly drawn segments,
e.g. segment 3, then segment 1, each of which may contain between 1 and 14
rows. So I don't know in advance how many rows I am going to get.
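A minimal sketch of why the row count varies, and how it can be worked out once the segments have been drawn. The data frame `toy` and the names `ids`, `seg_sizes`, `rows_per_draw` and `total_rows` are made up for illustration and are not part of the real analysis:

```r
# Toy stand-in for SpeyBay (hypothetical values): segments of unequal length
toy <- data.frame(
  Segment = c(1, 1, 2, 3, 3, 3, 4),
  CetPres = c(0, 0, 1, 0, 1, 0, 0)
)

set.seed(1)  # only so the sketch is reproducible

# Draw segment IDs with replacement
ids <- sample(unique(toy$Segment), size = 5, replace = TRUE)

# Rows per segment, counted once and looked up by segment name
seg_sizes <- table(toy$Segment)
rows_per_draw <- as.vector(seg_sizes[as.character(ids)])

# Total number of rows this particular resample will contain
total_rows <- sum(rows_per_draw)
```

The total is only known after sampling, but it is known before any rows are copied, so the result could in principle be sized up front.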
When I run my for loop, I just use rbind to grow the new variable, e.g.
'TempD2', without defining its size, and I suspect this is what slows down
the whole process (probably along with having a for loop within a for loop).
Can anyone give me any advice on how to pre-define a data frame (if that is
what the data shown above is) that can have an undefined size, or how to
make it big enough to take all the data? I've been trying to figure this
out for ages with no luck, and I'm sure it's something simple!
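One common way to handle pieces of unknown size, sketched here on made-up data, is to pre-allocate a list with one slot per draw and bind the pieces together once at the end. The names `toy`, `draws`, `pieces` and `combined` are hypothetical, and this is only an illustration of the idea, not a tested replacement for the real code:

```r
# Toy stand-in for SpeyBay (hypothetical values)
toy <- data.frame(
  Segment = c(1, 1, 2, 3, 3, 3),
  CetPres = c(0, 1, 0, 0, 1, 0)
)
draws <- c(3, 1, 3, 2)  # hypothetical sampled segment IDs

# Pre-allocate a list with one slot per draw;
# each slot can hold a piece with any number of rows
pieces <- vector("list", length(draws))
for (i in seq_along(draws)) {
  pieces[[i]] <- toy[toy$Segment == draws[i], ]
}

# Bind everything once at the end instead of once per iteration
combined <- do.call(rbind, pieces)
```

The list has a known length (the number of draws) even though the number of rows is not known, so nothing has to be copied repeatedly as the result grows.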
Code shown below - any tips on making the code faster would be greatly
appreciated - the last run took several hours which is just not practical!
Many thanks in advance,
Clare Embling
CODE:
library(geepack)  # provides geeglm()

SpringWatch <- 504
SummerWatch <- 704
AutumnWatch <- 392
MaxSample <- 704
signif <- 0
for(j in 1:1000){
  # resampling 2 different years (D & E) in 3 different seasons (2, 3 & 4) separately
  D2S <- sample(D2Start:D2Stop,MaxSample,replace=TRUE)
  D3S <- sample(D3Start:D3Stop,MaxSample,replace=TRUE)
  D4S <- sample(D4Start:D4Stop,MaxSample,replace=TRUE)
  E2S <- sample(E2Start:E2Stop,MaxSample,replace=TRUE)
  E3S <- sample(E3Start:E3Stop,MaxSample,replace=TRUE)
  E4S <- sample(E4Start:E4Stop,MaxSample,replace=TRUE)
  # Creating new data frames with the first sampled segment
  # (this assumes Segment is visible, e.g. SpeyBay has been attached)
  TempD2 <- SpeyBay[(Segment==D2S[1]),]
  TempD3 <- SpeyBay[(Segment==D3S[1]),]
  TempD4 <- SpeyBay[(Segment==D4S[1]),]
  TempE2 <- SpeyBay[(Segment==E2S[1]),]
  TempE3 <- SpeyBay[(Segment==E3S[1]),]
  TempE4 <- SpeyBay[(Segment==E4S[1]),]
  # loop to add together all the rows of data for each segment sampled
  for(i in 2:MaxSample) {
    TempD2 <- rbind(TempD2,SpeyBay[(Segment==D2S[i]),])
    TempD3 <- rbind(TempD3,SpeyBay[(Segment==D3S[i]),])
    TempD4 <- rbind(TempD4,SpeyBay[(Segment==D4S[i]),])
    TempE2 <- rbind(TempE2,SpeyBay[(Segment==E2S[i]),])
    TempE3 <- rbind(TempE3,SpeyBay[(Segment==E3S[i]),])
    TempE4 <- rbind(TempE4,SpeyBay[(Segment==E4S[i]),])
  }
  # But actually I only want a certain number of rows of data...
  NewD2 <- TempD2[1:SpringWatch,]
  NewD3 <- TempD3[1:SummerWatch,]
  NewD4 <- TempD4[1:AutumnWatch,]
  NewE2 <- TempE2[1:SpringWatch,]
  NewE3 <- TempE3[1:SummerWatch,]
  NewE4 <- TempE4[1:AutumnWatch,]
  # then combine together (could do this in one step!)
  NewD <- rbind(NewD2,NewD3,NewD4)
  NewE <- rbind(NewE2,NewE3,NewE4)
  CompDE <- rbind(NewD,NewE)
  # Run a GLM-GEE on the resampled distributions to see if there is a
  # statistical difference between years
  NewGLMGEE1 <- geeglm(CetPres~Year+SeaState,data=CompDE,family=binomial,
                       id=Segment,corstr="ar1")
  pv <- summary(NewGLMGEE1)$coefficients[, "Pr(>|W|)"]  # extract the p-values
  signif[j] <- pv[2]  # only interested in the significance of Year in the model
}
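As a rough sketch of one possible alternative to the inner loop above: the row indices of each segment can be computed once, looked up for all drawn segments at the same time, and truncated before any rows are copied. The helper `resample_rows` and the names `toy`, `rows_by_seg`, `draws` and `new_data` are hypothetical and use made-up data; this has not been run on the real SpeyBay data:

```r
# Toy stand-in for SpeyBay (hypothetical values)
toy <- data.frame(
  Segment = c(1, 1, 2, 3, 3, 3, 4),
  CetPres = c(0, 0, 1, 0, 1, 0, 0)
)

# Row indices of each segment, computed once outside any loop
rows_by_seg <- split(seq_len(nrow(toy)), toy$Segment)

# Hypothetical helper: rows for the drawn segments, in draw order,
# truncated to the first n_keep rows
resample_rows <- function(data, rows_by_seg, ids, n_keep) {
  idx <- unlist(rows_by_seg[as.character(ids)], use.names = FALSE)
  data[head(idx, n_keep), ]
}

draws <- c(3, 1, 4, 3)  # hypothetical sampled segment IDs
new_data <- resample_rows(toy, rows_by_seg, draws, n_keep = 6)
```

Here `n_keep` plays the role of SpringWatch, SummerWatch or AutumnWatch. One difference to be aware of: if the draw yields fewer than `n_keep` rows, `head()` returns fewer rows, whereas indexing with `1:n` pads the result with NA rows.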