[R] Obtaining data from a different row of data frame

arun smartpink111 at yahoo.com
Sun Sep 22 01:00:01 CEST 2013


Hi Ira,

Some suggestions:
1. Not sure why you created a dataframe with NA's in the beginning.  It seemed to be not required here.
df1 = data.frame(matrix(rep(NA, 10*7), nrow = 10))

2. The below code:
df1 = data.frame(Dates = Dates, P1 = P1, P2 = P2, OF1 = OF1, OF2 = OF2)
can be simplifed as:
df1 = data.frame(Dates,P1,P2, P3, P4, OF1, OF2,OF3, OF4)
 head(df1,2)
#       Dates P1  P2 P3 P4 OF1 OF2 OF3 OF4
#1 2006-01-03 10 100 90 70   3   5   4   3
#2 2006-01-04 13 102 94 75   3   3   3   5


3.  If you have multiple columns of NA's to create

df1$newPrice1 = rep(NA, 10)
 df1$newPrice2 = rep(NA, 10)

This could be also done by:
df1[,6:7]<- NA
 colnames(df1)[6:7]<- paste0("newPrice",1:2)



#Regarding the question:
#Created another dataset with couple more columns:

 Dates =  as.Date(c("2006-01-03", "2006-01-04", "2006-01-05", "2006-01-06", "2006-01-09", "2006-01-10", "2006-01-11",
                    "2006-01-12", "2006-01-13", "2006-01-16"), format = "%Y-%m-%d")
 P1 = seq(from = 10, by = 3, length.out = 10)
 P2 = seq(from = 100, by = 2, length.out = 10)
P3= seq(from= 90, by=4,length.out=10)
 P4= seq(from=70,by=5,length.out=10)
 OF1 = c(3,3,4,5,2,2,2,1,1,0)
 OF2 = c(5,3,4,2,1,2,2,1,1,0)
 OF3 <- c(4,3,4,1,3,2,2,1,1,0)
 OF4<- c(3,5,4,2,3,1,2,1,1,0)
df1 = data.frame(Dates,P1,P2, P3, P4, OF1, OF2,OF3, OF4)
df2<- df1
 df2[,10:13]<- NA
colnames(df2)[10:13]<- paste0("newPrice",1:4)

##your code

for(j in 2:5) {
 df2[j+8] = df2[df2[,j+4] + row(df2)[,j], j]
 }


#modified code #didn't check the speed.
indx1<- unlist(df1[,grep("OF",colnames(df1))],use.names=FALSE)
val1<- unlist(df1[,grep("P",colnames(df1))],use.names=FALSE)
 df1[,10:13]<- val1[indx1+seq_along(indx1)]
 colnames(df1)[10:13]<- colnames(df2)[10:13]
 identical(df1,df2)
#[1] TRUE


The "average" part is not clear.
For ex:
df2$OF1
# [1] 3 3 4 5 2 2 2 1 1 0

df2$P1
# [1] 10 13 16 19 22 25 28 31 34 37


Could you explain it in terms of the above values?


A.K.



________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Saturday, September 21, 2013 5:09 PM
Subject: Re: [R] Obtaining data from a different row of data frame



Arun,

Thanks for helping me improve my question.

I made a slight change to the data, so that the coincidence will not occur.

Please note that eventually I will have to write a function for a user. Also after my rewrite, I have better code, but I do not know how to get rid of the loop. Also eventually I will need to generalize and take the average of r rows above and below the row that this algorithm is attempting to retrieve. For example if r = 5 and the row 1 offset gets me to row 100, I will eventually need the price in rows 95 through 105. Yes, I realize that I need to deal with a number of annoying details such as possibly winding up with row -3.


> df1 = data.frame(matrix(rep(NA, 10*7), nrow = 10))
> Dates =  as.Date(c("2006-01-03", "2006-01-04", "2006-01-05", "2006-01-06", "2006-01-09", "2006-01-10", "2006-01-11", 
+                    "2006-01-12", "2006-01-13", "2006-01-16"), format = "%Y-%m-%d")
> P1 = seq(from = 10, by = 3, length.out = 10)
> P2 = seq(from = 100, by = 2, length.out = 10)
> OF1 = c(3,3,4,5,2,2,2,1,1,0)
> OF2 = c(5,3,4,2,1,2,2,1,1,0)
> df1 = data.frame(Dates = Dates, P1 = P1, P2 = P2, OF1 = OF1, OF2 = OF2)
> df1$newPrice1 = rep(NA, 10)
> df1$newPrice2 = rep(NA, 10)
> 
> df1
        Dates P1  P2 OF1 OF2 newPrice1 newPrice2
1  2006-01-03 10 100   3   5        NA        NA
2  2006-01-04 13 102   3   3        NA        NA
3  2006-01-05 16 104   4   4        NA        NA
4  2006-01-06 19 106   5   2        NA        NA
5  2006-01-09 22 108   2   1        NA        NA
6  2006-01-10 25 110   2   2        NA        NA
7  2006-01-11 28 112   2   2        NA        NA
8  2006-01-12 31 114   1   1        NA        NA
9  2006-01-13 34 116   1   1        NA        NA
10 2006-01-16 37 118   0   0        NA        NA
> 
> for(j in 2:3) {
+ df1[j+4] = df1[df1[,j+2] + row(df1)[,j], j]
+ }
> df1
        Dates P1  P2 OF1 OF2 newPrice1 newPrice2
1  2006-01-03 10 100   3   5        19       110
2  2006-01-04 13 102   3   3        22       108
3  2006-01-05 16 104   4   4        28       112
4  2006-01-06 19 106   5   2        34       110
5  2006-01-09 22 108   2   1        28       110
6  2006-01-10 25 110   2   2        31       114
7  2006-01-11 28 112   2   2        34       116
8  2006-01-12 31 114   1   1        34       116
9  2006-01-13 34 116   1   1        37       118
10 2006-01-16 37 118   0   0        37       118
>  


> # Better code. Produces exactly the same results as above.
> for(j in 2:3) {
+ df1[j+4] = df1[df1[,j+2] + row(df1)[,j], j]
+ }
> df1 
On 9/21/2013 1:39 PM, arun wrote:

Hi,
Your example dataset could be confusing as: with(df1,P1+OF1)
# [1] 13 14 16 18 16 17 18 18 19 19
 with(df1,P2+OF2)
# [1] 105 104 106 105 105 107 108 108 109 109 which is the same as:
 df1$newPrice1
# [1] 13 14 16 18 16 17 18 18 19 19
 df1$newPrice2
# [1] 105 104 106 105 105 107 108 108 109 109 ----- Original Message -----
From: Ira Sharenow <irasharenow100 at yahoo.com> To: r-help at r-project.org Cc: 
Sent: Saturday, September 21, 2013 11:30 AM
Subject: [R] Obtaining data from a different row of data frame I have a large data frame with 2,000 rows and 600 columns. I can write 
loops to solve a smaller problem, but I need a better strategy for this 
data frame. Below is a simple example with just two stocks. In the data frame, each row represents a trading day. The first column 
is dates. The next group of columns represents the prices of the stocks 
on the specified dates. The next group of columns represents how many 
trading days I wish to offset. So if the first trading day is 2006-01-03 
and OF1 == 3, then I need to go to row 1+3 and get the price in column 
P1. The result is placed in row 1 of column 6. df1 = data.frame(matrix(rep(NA, 10*7), nrow = 10)) Dates =as.Date(c("2006-01-03", "2006-01-04", "2006-01-05", "2006-01-06", 
"2006-01-09", "2006-01-10", "2006-01-11", "2006-01-12", "2006-01-13", "2006-01-16"), format = "%Y-%m-%d") P1 = 10:19 P2 = 100:109 OF1 = c(3,3,4,5,2,2,2,1,1,0) OF2 = c(5,3,4,2,1,2,2,1,1,0) df1 = data.frame(Dates = Dates, P1 = P1, P2 = P2, OF1 = OF1, OF2 = OF2) df1$newPrice1 = rep(NA, 10) df1$newPrice2 = rep(NA, 10) for(j in 6:7) { for(i in 1:10 ) { rowNumber = i + df1[i,j-2] #print(rowNumber) df1[i,j] = df1[rowNumber, j-4] } # end i loop } # end j loop df1 
>df1 
>Dates P1P2 OF1 OF2 newPrice1 newPrice2 1 2006-01-03 10 1003513105 22006-01-04 11 1013314104 32006-01-05 12 1024416106 42006-01-06 13 1035218105 52006-01-09 14 1042116105 62006-01-10 15 1052217107 72006-01-11 16 1062218108 82006-01-12 17 1071118108 92006-01-13 18 1081119109 10 2006-01-16 19 1090019109     [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.        



More information about the R-help mailing list