[R] lapply or data.table to find a unit's previous transaction
William Rogers
whrogers73 at gmail.com
Thu Jun 3 04:29:56 CEST 2010
I have a dataset of property transactions that includes the
transaction ID (TranID), property ID (UnitID), and transaction date
(TranDt). I need to create a data frame (or data table) that includes
the previous transaction date, if one exists.
This is an easy problem in SQL, where I just run a sub-query, but I'm
trying to make R my one-stop-shopping program. The following code
works on a subset of my data, but I can't run this on my full dataset
because my computer runs out of memory after about 30 minutes. (Using
a 32-bit machine.)
Use the following synthetic data for example.
n<- 100
TranID<- lapply(n:(2*n), function(x) (
as.matrix(paste(x, sample(seq(as.Date('2000-01-01'),
as.Date('2010-01-01'), "days"), sample(1:5, 1)), sep= "D"), ncol= 1)))
TranID<- do.call("rbind", TranID)
UnitID<- substr(TranID, 1, nchar(n))
TranDt<- substr(TranID, nchar(n)+2, nchar(n)+11)
Data<- data.frame(TranID= TranID, UnitID= UnitID, TranDt= as.Date(TranDt))
#First I create a list of all the previous transactions by unit
TranList<- as.matrix(Data$TranID, ncol= 1)
PreTran<- lapply(TranList,
function(x) (with(Data,
Data[
UnitID== substr(x, 1, nchar(n))&
TranDt< Data[TranID== x, "TranDt"], ]
))
)
#I do get warnings about missing data because some transactions have
no predecessor.
#Some transactions have no previous transactions, others have many so
I pick the most recent
BeforeTran<- lapply(seq_along(PreTran), function(x) (
with(PreTran[[x]], PreTran[[x]][which(TranDt== max(TranDt)), ])))
#I need to add the current transaction's TranID to the list so I can merge later
BeforeTran<- lapply(seq_along(PreTran), function(x) (
transform(BeforeTran[[x]], TranID= TranList[x, 1])))
#Finally, I convert from a list to a data frame
BeforeTran<- do.call("rbind", BeforeTran)
#I have used a combination of data.table and for loops, but that seems
cheesey and doesn't preform much better.
library(data.table)
#First I create a list of all the previous transactions by unit
TranList2<- vector(nrow(Data), mode= "list")
names(TranList2)<- levels(Data$TranID)
DataDT<- data.table(Data)
#Use a for loop and data.table to find the date of the previous transaction
for (i in levels(Data$TranID)) {
if (DataDT[UnitID== substr(i, 1, nchar(n))&
TranDt<= (DataDT[TranID== i, TranDt]),
length(TranDt)]> 1)
TranList2[[i]]<- cbind(TranID= i,
DataDT[UnitID== substr(i, 1, nchar(n))&
TranDt< (DataDT[TranID== i, TranDt]),
list(TranDt= max(TranDt))])
}
#Finally, I convert from a list to a data table
BeforeTran2<- do.call("rbind", TranList2)
#My intution says that this code doesn't take advantage of
data.table's attributes.
#Are there any ideas out there? Thank you.
#P.S. I've tried plyr and it does not help my memory problem.
--
William H. Rogers
More information about the R-help
mailing list