[R] transaction list transformation to use rpart.

Allan Engelhardt allane at cybaea.com
Mon Mar 7 10:29:42 CET 2011



On 06/03/11 22:34, John Dennison wrote:
> [...]
> from data like
>
> Customer-ID | Item-ID
> cust1           | 2
> cust1           | 3
> cust1           | 5
> cust2          | 5
> cust2          | 3
> cust3         | 2
> ...
>
> #read in data to a sparse binary transaction matrix
> txn = read.transactions(file="tranaction_list.txt", rm.duplicates= TRUE,
> format="single",sep="|",cols =c(1,2));
>
> #transaction matrix to matrix
> a<-as(txn, "matrix")
>
> #matrix to data.frame
> b<-as.data.frame(a)
>
> I end up with a data.frame like:
>
> X       X.1 X.2  X.3 X.4 X.5 ...
> cust1  0    1   1    0    1
> cust2  0    0   1    0    1
> cust3  0    1   0    0    0
> ...
>
>   However, as.data.frame(a) turns the matrix into a numeric data.frame,
> so when I run rpart on it, it automatically fits a regression tree
> instead of a classification tree.

I am not sure your approach with rpart is going to give you what you are 
looking for, but on to your R question:

> [...] I can't successfully transform the data.frame to a factor. i
> tried:
>
> b_factor<-as.factor(b)
> Error in sort.list(y) :
>    'x' must be atomic for 'sort.list'
> Have you called 'sort' on a list?

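as.factor() works on a vector, not on a whole data.frame, so you need to 
convert each column individually, i.e. (after b_factor <- b) 
b_factor$X.1 <- as.factor(b$X.1), or do all the columns in one go: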

>  str( as.data.frame(lapply(b, as.factor)) )
'data.frame':    4 obs. of  4 variables:
  $ X.2      : Factor w/ 2 levels "0","1": 2 1 2 1
  $ X.3      : Factor w/ 2 levels "0","1": 2 2 1 1
  $ X.5      : Factor w/ 2 levels "0","1": 2 2 1 1
  $ X.Item.ID: Factor w/ 2 levels "0","1": 1 1 1 2

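The column-by-column conversion above can be sketched on a small 
stand-in for b. The values and column names below are made up for 
illustration and are not John's data. Passing the levels explicitly is 
my own addition: it keeps both "0" and "1" as levels even for an item 
that every customer (or no customer) bought.

```r
# Toy stand-in for the 0/1 indicator data.frame `b` from the thread
# (hypothetical values; the real one comes from as(txn, "matrix")).
b <- data.frame(X.2 = c(1, 0, 1),
                X.3 = c(1, 1, 0),
                X.5 = c(1, 1, 0),
                row.names = c("cust1", "cust2", "cust3"))

# Convert every column to a factor. Assigning into b_factor[] replaces
# the columns one by one but keeps the data.frame shape and row names.
b_factor <- b
b_factor[] <- lapply(b, factor, levels = c(0, 1))

str(b_factor)
```

With factor columns, a model formula that uses one of them as the 
response should be treated as a classification problem rather than a 
regression.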

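Also have a look at as(txn, "data.frame"), which gives a different 
format that may (with some clean-up) be easier to use. Note that the 
header line of your file has been read in as a transaction: it is row 4 
below, and the X.Item.ID column above.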

>  as(txn, "data.frame")
      transactionID      items
1 cust1            { 2, 3, 5}
2  cust2              { 3, 5}
3   cust3                { 2}
4     Customer-ID  { Item-ID}

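If all you need is the customer-by-item indicator table, one possible 
alternative (not what the code above does, and not using arules at all) 
is to cross-tabulate the long list in base R. This is only a sketch: the 
file contents are inlined as text, and the names long, counts, ind and 
the "item" prefix are my own placeholders. Reading with header = TRUE 
and strip.white = TRUE also avoids the stray header row and the padded 
IDs seen above.

```r
# The example data from the question, inlined instead of read from a file.
txt <- "Customer-ID | Item-ID
cust1           | 2
cust1           | 3
cust1           | 5
cust2          | 5
cust2          | 3
cust3         | 2"

# header = TRUE consumes the title line; strip.white = TRUE trims the
# padding around the "|" separator.
long <- read.table(text = txt, sep = "|", header = TRUE,
                   strip.white = TRUE, stringsAsFactors = FALSE)
names(long) <- c("customer", "item")

# Count customer/item pairs, then reduce the counts to 0/1 factors so
# that repeated purchases collapse to presence/absence.
counts <- table(long$customer, long$item)
ind <- as.data.frame.matrix(counts)
ind[] <- lapply(ind, function(x) factor(as.integer(x > 0), levels = c(0, 1)))
names(ind) <- paste0("item", colnames(counts))

str(ind)
```

This builds a dense table, so for a large number of distinct items the 
sparse representation from read.transactions is likely the better 
starting point.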

Hope this helps a little.

Allan