[R] Obtaining data from a different row of data frame
arun
smartpink111 at yahoo.com
Sun Sep 22 17:15:39 CEST 2013
Ira,
No problem.
If you change the ?for() loop to ?lapply, there should be increase in speed (Usually, it is not the case. Here it is does, but still not as good in terms of speed as the method I showed).
df1<- structure(list(Dates = structure(c(13151, 13152, 13153, 13154,
13157, 13158, 13159, 13160, 13161, 13164), class = "Date"), P1 = c(10,
13, 16, 19, 22, 25, 28, 31, 34, 37), P2 = c(100, 102, 104, 106,
108, 110, 112, 114, 116, 118), P3 = c(90, 94, 98, 102, 106, 110,
114, 118, 122, 126), P4 = c(70, 75, 80, 85, 90, 95, 100, 105,
110, 115), OF1 = c(3, 3, 4, 5, 2, 2, 2, 1, 1, 5), OF2 = c(5,
3, 4, 2, 1, 2, 2, 1, 1, 0), OF3 = c(4, 3, 4, 1, 3, 2, 2, 1, 1,
0), OF4 = c(3, 5, 4, 2, 3, 1, 2, 1, 1, 0)), .Names = c("Dates",
"P1", "P2", "P3", "P4", "OF1", "OF2", "OF3", "OF4"), row.names = c(NA,
-10L), class = "data.frame")
df1$OF2[9]<-4
df2<- df1
df2[,10:13]<- NA
colnames(df2)[10:13]<- paste0("newPrice",1:4)
##your code
for(j in 2:5) {
df2[j+8] = df2[df2[,j+4] + row(df2)[,j], j]
}
#using ?lapply()
df1[,10:13]<-lapply(2:5,function(j) {df1[df1[,j+4]+row(df2)[,j],j]})
colnames(df1)[10:13]<- colnames(df2)[10:13]
identical(df1,df2)
#[1] TRUE
#######Speed check:
set.seed(29)
df2<- data.frame(Dates=seq(as.Date("2006-01-03"),length.out=2000,by="1 day"),cbind(matrix(sample(10:120,2000*300,replace=TRUE),ncol=300),matrix(sample(0:6,2000*300,replace=TRUE),ncol=300)))
colnames(df2)[2:301]<- paste0("P",1:300)
colnames(df2)[302:601]<- paste0("OF",1:300)
df3<- df2
df2[,602:901]<-NA
colnames(df2)[602:901]<- paste0("newPrice",1:300)
system.time({
for(j in grep("^P",colnames(df2))) {
df2[j+600] = df2[df2[,j+300] + row(df2)[,j], j]
}
})
# user system elapsed
#11.652 0.148 11.822
system.time({df3[,602:901]<-lapply(2:301,function(j) {df3[df3[,j+300]+row(df3)[,j],j]}) })
# user system elapsed
# 2.960 0.000 2.962
colnames(df3)[602:901]<- colnames(df2)[602:901]
identical(df2,df3)
#[1] TRUE
A.K.
________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com>
Sent: Sunday, September 22, 2013 10:49 AM
Subject: Re: [R] Obtaining data from a different row of data frame
Arun,
Thanks for the time you spent helping me.
I always learned to use the apply family (but maybe your strategies are faster), and now I think I am going to learn Hadley Wickham’s methods. Right now I need to do other parts of the project. In a few days I will take another look at your code to see if I can get more out of my code.
For my current project once I am finished my boss will use my code and possibly modify it, so speed is just one factor. Transparency and his future coding time is another consideration. I need to balance things off.
I need tolerable speed and relatively easy to understand code. It is an interesting trade off.
Thank again for your help. I’ll get back to you when I take another look at the details of what you wrote.
Ira
On 9/21/2013 11:27 PM, arun wrote:
HI, A modified code to avoid the ?sapply()
df1<- structure(list(Dates = structure(c(13151, 13152, 13153, 13154,
13157, 13158, 13159, 13160, 13161, 13164), class = "Date"), P1 = c(10,
13, 16, 19, 22, 25, 28, 31, 34, 37), P2 = c(100, 102, 104, 106,
108, 110, 112, 114, 116, 118), P3 = c(90, 94, 98, 102, 106, 110,
114, 118, 122, 126), P4 = c(70, 75, 80, 85, 90, 95, 100, 105,
110, 115), OF1 = c(3, 3, 4, 5, 2, 2, 2, 1, 1, 5), OF2 = c(5,
3, 4, 2, 1, 2, 2, 1, 1, 0), OF3 = c(4, 3, 4, 1, 3, 2, 2, 1, 1,
0), OF4 = c(3, 5, 4, 2, 3, 1, 2, 1, 1, 0)), .Names = c("Dates",
"P1", "P2", "P3", "P4", "OF1", "OF2", "OF3", "OF4"), row.names = c(NA,
-10L), class = "data.frame")
df1$OF2[9]<-4 df2<- df1
df2[,10:13]<- NA
colnames(df2)[10:13]<- paste0("newPrice",1:4) ##your code for(j in 2:5) {
df2[j+8] = df2[df2[,j+4] + row(df2)[,j], j]
}
indx1<- unlist(df1[,grep("OF",colnames(df1))],use.names=FALSE)
indx1[rep(seq(nrow(df1)),4)%in% 6:10][indx1[rep(seq(nrow(df1)),4)%in% 6:10]- rep(5:1,4)>=0]<- NA val1<- unlist(df1[,grep("P",colnames(df1))],use.names=FALSE)
df1[,10:13]<- val1[indx1+seq_along(indx1)]
colnames(df1)[10:13]<- colnames(df2)[10:13]
identical(df1[,10:13],df2[,10:13])
#[1] TRUE ###On a bigger dataset:
set.seed(29)
df2<- data.frame(Dates=seq(as.Date("2006-01-03"),length.out=2000,by="1 day"),cbind(matrix(sample(10:120,2000*300,replace=TRUE),ncol=300),matrix(sample(0:6,2000*300,replace=TRUE),ncol=300)))
colnames(df2)[2:301]<- paste0("P",1:300)
colnames(df2)[302:601]<- paste0("OF",1:300)
df3<- df2 df2[,602:901]<-NA
colnames(df2)[602:901]<- paste0("newPrice",1:300)
system.time({
for(j in grep("^P",colnames(df2))) {
df2[j+600] = df2[df2[,j+300] + row(df2)[,j], j]
}
})
# user system elapsed
# 8.508 0.000 8.523 colN_OF<- ncol(df3[,grep("OF",colnames(df3))])
system.time({
indx1<- unlist(df3[,grep("OF",colnames(df3))],use.names=FALSE)
indx1[rep(seq(nrow(df3)),colN_OF) %in% 1995:2000][indx1[rep(seq(nrow(df3)),colN_OF) %in% 1995:2000] - rep(6:1,colN_OF)>=0] <-NA
val1<- unlist(df3[,grep("P",colnames(df3))],use.names=FALSE)
df3[,602:901]<- val1[indx1+seq_along(indx1)]
colnames(df3)[602:901]<- colnames(df2)[602:901]
})
# user system elapsed
# 0.568 0.000 0.569 identical(df2,df3)
#[1] TRUE A.K. ----- Original Message -----
From: arun <smartpink111 at yahoo.com> To: Ira Sharenow <irasharenow100 at yahoo.com> Cc:
Sent: Sunday, September 22, 2013 1:28 AM
Subject: Re: [R] Obtaining data from a different row of data frame Ira, I tried with a bigger dataset to look for any errors in the code:
set.seed(29)
df2<- data.frame(Dates=seq(as.Date("2006-01-03"),length.out=2000,by="1 day"),cbind(matrix(sample(10:120,2000*300,replace=TRUE),ncol=300),matrix(sample(0:6,2000*300,replace=TRUE),ncol=300)))
colnames(df2)[2:301]<- paste0("P",1:300)
colnames(df2)[302:601]<- paste0("OF",1:300)
df3<- df2 df2[,602:901]<-NA
colnames(df2)[602:901]<- paste0("newPrice",1:300)
system.time({
for(j in grep("^P",colnames(df2))) {
df2[j+600] = df2[df2[,j+300] + row(df2)[,j], j]
}
})
# user system elapsed
# 9.584 0.000 9.601 vec1<- 6:1 ##change values according to the range of actual values in your rows.
vec2<- 1995:2000 ##change accordingly. If the maximum value is say 100, take 100 rows from the tail end. Change the vec1 also so that both are of the same length system.time({
df3[vec2,grep("OF",colnames(df3))]<- t(sapply(seq_along(vec1),function(i) {x1<-as.matrix(df3[vec2[i],grep("OF",colnames(df3))]); x1[x1>=vec1[i]]<-NA; x1}))
indx1<- unlist(df3[,grep("OF",colnames(df3))],use.names=FALSE)
val1<- unlist(df3[,grep("P",colnames(df3))],use.names=FALSE)
df3[,602:901]<- val1[indx1+seq_along(indx1)]
colnames(df3)[602:901]<- colnames(df2)[602:901]
})
# user system elapsed
# 0.552 0.000 0.553 identical(df2[,602:901],df3[,602:901])
#[1] TRUE A.K.
More information about the R-help
mailing list