[R] Create a new column based on values in two other columns
arun
smartpink111 at yahoo.com
Mon Sep 9 00:43:09 CEST 2013
HI,
df$NewPrices<- unsplit(lapply(split(df,df$Stocks),function(x) {do.call(rbind,lapply(seq_len(nrow(x)),function(i) {if(x[i,]$Offsets==2)
x[i+2,]$Prices
else if(x[i,]$Offsets==1)
x[i+1,]$Prices
else x[i,]$Prices
})) }),df$Stocks)
df$NewPrices
#[1] 13 17 12 16 17 18 16 17 18
sqldf(Q2)[,1]
#[1] 13 17 12 16 17 18 16 17 18
I think sqldf() would be faster.
A.K.
----- Original Message -----
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: r-help at r-project.org
Cc:
Sent: Saturday, September 7, 2013 4:11 PM
Subject: [R] Create a new column based on values in two other columns
I am trying to add a column to a data frame. Each day for each stock I
make a prediction for a future date. Then I need to compare my
predictions to the actual values. So looking at the first row of data:
For Stock A on 2011-01-01 I predicted that on 2011-01-02 the price would
be 10.25.
Now I need an ActualPrices column. The first value should be 13.
I solved the problem using sqldf, but I would appreciate some advice on
how to solve the problem using standard R techniques. The real data
frame has over 100,000 rows.
I know that the conditions for the correct row can be found in the WHERE
clause of the SQL query and then I need to look in the Prices column to
get the value, but I do not know how to do that in standard R.
If another library would be easier, I am open to other ideas.
Dates = as.Date(c(rep("2011-01-01",3), rep("2011-01-02",3),
rep("2011-01-03",3) ), "%Y-%m-%d")
Stocks = rep(c("A", "B", "C"), 3)
Offsets = c(1,2,0,1,1,1,0,0,0)
Prices = 10:18
PredPrices = 10:18 + 0.25
df = data.frame(Stocks, Dates, Offsets, Prices, PredPrices )
df$NewDates = df$Dates + df$Offsets
df
StocksDates Offsets Prices PredPricesNewDates
1A 2011-01-0111010.25 2011-01-02
2B 2011-01-0121111.25 2011-01-03
3C 2011-01-0101212.25 2011-01-01
4A 2011-01-0211313.25 2011-01-03
5B 2011-01-0211414.25 2011-01-03
6C 2011-01-0211515.25 2011-01-03
7A 2011-01-0301616.25 2011-01-03
8B 2011-01-0301717.25 2011-01-03
9C 2011-01-0301818.25 2011-01-03
library(sqldf)
# To see everything in this small example
Q1 = "SELECT df1.Stocks, df1.Dates, df1.Offsets, df1.Prices,
df1.PredPrices, df2.Prices AS NewPrices
FROM df AS df1, df AS df2
WHERE df1.NewDates = df2.Dates AND df1.Stocks = df2.Stocks";
sqldf(Q1)
# To get the column. This what I really want
Q2 = "SELECT df2.Prices AS NewPrices
FROM df AS df1, df AS df2
WHERE df1.NewDates = df2.Dates AND df1.Stocks = df2.Stocks";
sqldf(Q2)
As I will need to reshape my data so that each row is for a specific
date, a second starting point is this data frame.
dfWide= reshape(df, direction = "wide", idvar = "Dates", timevar = "Stocks")
> dfWide
Dates Offsets.A Prices.A PredPrices.A NewDates.A Offsets.B Prices.B
PredPrices.B NewDates.B Offsets.C Prices.C PredPrices.C NewDates.C
1 2011-01-0111010.25 2011-01-0221111.25 2011-01-0301212.25 2011-01-01
4 2011-01-0211313.25 2011-01-0311414.25 2011-01-0311515.25 2011-01-03
7 2011-01-0301616.25 2011-01-0301717.25 2011-01-0301818.25 2011-01-03
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list