[R-sig-hpc] lapply or data.table to find a unit's previous transaction

Brian G. Peterson brian at braverock.com
Wed Jun 2 18:55:51 CEST 2010


William,

This is a general r-help question, not a high performance computing question, 
and should be directed to r-help for more useful and relevant answers, and to 
keep this list (and its archives) on topic.

Also, when you re-post to r-help, you will get a more helpful answer if you 
have a fully reproducible example, per the posting guidelines.

Regards,

    - Brian

On 06/02/2010 11:43 AM, William Rogers wrote:
> I have a dataset of property transactions that inlcudes the transaction ID
> (TranID), property ID (UnitID), and transaction date (TranDt). I need to
> create a data frame (or data table) that includes the previous transaction
> date, if one exists.
>
> This is an easy problem in SQL, where I just run a subquery, but I'm trying
> to make R my one-stop-shopping program.  The following code works on a
> subset of my data, but I can't run this on my full dataset becuase my
> computer runs out of memory after about 30 minutes.
>
> #First I create a list of all the previous transactions by unit
>
> TranList<- as.matrix(Data$TranID, ncol= 1) #Matrix of transaction IDs
>
> PreTran<- lapply(TranList,                 #Make a list of all the prevoius
> transactions
>    function(x) (with(Data,                  #TranID= cat(UnitID, TranDt)
>    Data[
>    UnitID== substr(x, 1, 9)&
>    TranDt<  Data[TranID== x, "TranDt"], ]
>    ))
>    )
>
> #I do get warnings about missing data because some transactions have no
> predecessor.
>
> #Some transactions have no previous transactions, others have many so I pick
> the most recent
>
> BeforeTran<- lapply(seq_along(PreTran), function(x) (
> with(PreTran[[x]], PreTran[[x]][which(TranDt== max(TranDt)), ])))
>
> #I need to add the current transaction's TranID to the list so I can merge
> later
>
> BeforeTran<- lapply(seq_along(PreTran), function(x) (
> transform(BeforeTran[[x]], TranID= TranList[x, 1])))
>
> #Finally, I convert from a list to a data frame
>
> BeforeTran<- do.call("rbind", BeforeTran)
>
> #I have used a combination of data.table and for loops, but that seems
> cheesey and doesn't preform much better.
>
> library(data.table)
>
> #First I create a list of all the previous transactions by unit
>
> TranList<- vector(nrow(Data), mode= "list")
> names(TranList)<- levels(Data$TranID)
>
> #Convert my data frame to a data table
>
> DataDT<- data.table(Data, key= "UnitID")
>
> #Use a for loop and data.table to find the date of the previous transaction
> for (i in levels(Data$TranID)) {
> if (DataDT[J(substr(i, 1, 9))&  #I run an IF statement to avoid 'missing
> data' errors
>     TranDt<= (DataDT[TranID== i, TranDt]), #But it slows things down
> length(TranDt)]>  1)
>
> TranList[[i]]<- cbind(TranID= i, #This finds the last transaction date and
> combines
> DataDT[J(substr(i, 1, 9))&  #the current transaction ID
> TranDt<  (DataDT[TranID== i, TranDt]),
> list(TranDt= max(TranDt))])
> }
>
> #Finally, I convert from a list to a data table
>
> BeforeTran<- do.call("rbind", TranList)
>
> My intution says that this code doesn't take advantage of data.table's
> attributes.
>
> Are there any ideas out there?  Thank you.
>
> P.S. I've tried plyr and it does not help my memory problem.
>



More information about the R-sig-hpc mailing list