[R] transaction list transformation to use rpart.
Allan Engelhardt
allane at cybaea.com
Mon Mar 7 10:29:42 CET 2011
On 06/03/11 22:34, John Dennison wrote:
> [...]
> from data like
>
> Customer-ID | Item-ID
> cust1 | 2
> cust1 | 3
> cust1 | 5
> cust2 | 5
> cust2 | 3
> cust3 | 2
> ...
>
> #read in data to a sparse binary transaction matrix
> txn = read.transactions(file="tranaction_list.txt", rm.duplicates= TRUE,
> format="single",sep="|",cols =c(1,2));
>
> #transaction matrix to matrix
> a<-as(txn, "matrix")
>
> #matrix to data.frame
> b<-as.data.frame(a)
>
> I end up with a data.frame like:
>
> X X.1 X.2 X.3 X.4 X.5 ...
> cust1 0 1 1 0 1
> cust2 0 0 1 0 1
> cust3 0 1 0 0 0
> ...
>
> However, as.data.frame(a) turns the matrix into a numeric
> data.frame, so when I run the rpart algorithm it automatically returns
> a regression tree rather than a classification tree.
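rpart chooses the tree type from the response when `method` is not given. According to its documentation, a factor response gives `method = "class"`, while a plain numeric response (not a survival object or two-column matrix) gives `method = "anova"`, i.e. a regression tree. Below is a minimal sketch on made-up random 0/1 data (the column names only mimic the post) that shows the default and two ways to get a classification tree:

```r
library(rpart)  # a recommended package, normally installed with R

# Hypothetical toy data: 0/1 item indicators, one row per customer.
set.seed(1)
n <- 60
d <- data.frame(X.2 = rbinom(n, 1, 0.5),
                X.3 = rbinom(n, 1, 0.5),
                X.5 = rbinom(n, 1, 0.5))

# Numeric 0/1 response: rpart assumes method = "anova" (regression tree).
fit_num <- rpart(X.5 ~ X.2 + X.3, data = d)

# Same numeric data, but a classification tree requested explicitly.
fit_cls <- rpart(X.5 ~ X.2 + X.3, data = d, method = "class")

# Factor response: rpart assumes method = "class" on its own.
d_fac <- d
d_fac$X.5 <- factor(d_fac$X.5)
fit_fac <- rpart(X.5 ~ X.2 + X.3, data = d_fac)

# The chosen method is stored in the fitted object.
c(fit_num$method, fit_cls$method, fit_fac$method)
# expected: "anova" "class" "class"
```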
I am not sure your approach with rpart is going to give you what you are
looking for, but on to your R question:
> [...] I can't successfully transform the data.frame to a factor. I
> tried:
>
> b_factor<-as.factor(b)
> Error in sort.list(y) :
> 'x' must be atomic for 'sort.list'
> Have you called 'sort' on a list?
You need to convert each column individually, e.g.
b_factor <- b; b_factor$X.1 <- as.factor(b$X.1)
or convert all of them at once:
> str( as.data.frame(lapply(b, as.factor)) )
'data.frame': 4 obs. of 4 variables:
$ X.2 : Factor w/ 2 levels "0","1": 2 1 2 1
$ X.3 : Factor w/ 2 levels "0","1": 2 2 1 1
$ X.5 : Factor w/ 2 levels "0","1": 2 2 1 1
$ X.Item.ID: Factor w/ 2 levels "0","1": 1 1 1 2
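Here is a variant of the lapply() conversion, sketched on a hypothetical three-customer data frame. Assigning into `b_factor[]` keeps the data.frame class and the customer row names, which `as.data.frame(lapply(...))` replaces with automatic row names. Fixing `levels = c(0, 1)` gives every column both levels, even for an item bought by all customers or by none:

```r
# Hypothetical 0/1 data frame with customer IDs as row names, like `b`.
b <- data.frame(X.2 = c(1, 0, 1),
                X.3 = c(1, 1, 0),
                X.5 = c(1, 1, 0),
                row.names = c("cust1", "cust2", "cust3"))

# as.factor(b) fails because a data.frame is a list of columns.
# lapply() converts each column separately; assigning into b_factor[]
# keeps the data.frame class and the row names.
b_factor <- b
b_factor[] <- lapply(b_factor, factor, levels = c(0, 1))

str(b_factor)
rownames(b_factor)
```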
Also have a look at as(txn, "data.frame") for a different format that
may (with some clean up) be easier to use.
> as(txn, "data.frame")
transactionID items
1 cust1 { 2, 3, 5}
2 cust2 { 3, 5}
3 cust3 { 2}
4 Customer-ID { Item-ID}
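The spurious last row comes from the file's header line being read as a transaction. One possible clean-up is sketched here in base R rather than arules: it swaps in read.table(), split() and table() for read.transactions(). The sketch reads the list with `header = TRUE` and builds both a per-customer item list and a factor-coded customer-by-item table. The file contents are inlined as a hypothetical stand-in:

```r
# Hypothetical stand-in for the contents of the transaction list file.
txt <- "Customer-ID | Item-ID
cust1 | 2
cust1 | 3
cust1 | 5
cust2 | 5
cust2 | 3
cust3 | 2"

# header = TRUE consumes the first line, so it never becomes a transaction.
tl <- read.table(text = txt, header = TRUE, sep = "|", strip.white = TRUE,
                 colClasses = "character")
names(tl) <- c("customer", "item")
tl <- unique(tl)  # drop duplicate rows, analogous to rm.duplicates = TRUE

# One vector of items per customer, similar to as(txn, "data.frame").
items <- split(tl$item, tl$customer)

# Customer-by-item incidence table with syntactic column names
# ("X2", "X3", "X5"), then 0/1 factor columns suitable for rpart.
inc <- as.data.frame.matrix(table(tl$customer, tl$item))
names(inc) <- make.names(names(inc))
inc[] <- lapply(inc, function(x) factor(as.integer(x > 0), levels = c(0, 1)))

str(inc)
```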
Hope this helps a little.
Allan
More information about the R-help mailing list