Matt mvgnyc at gmail.com
Sat Feb 28 15:05:07 CET 2009


I'm trying out the arules package and I'm having a bit of trouble
getting my data into the right shape. I have a set of transactions with
the purchased products, but a given product can appear in any column
of the data frame. This causes the rules to be built on the column
position of each item, which carries no meaning.

Here is an example:

## Code:
library(arules)

# stringsAsFactors=TRUE: as(<data.frame>, "transactions") needs factor columns
my.df <- data.frame(
 item1=c("a", "b", "c", "d"),
 item2=c("e", "a", "f", "b"),
 item3=c("h", "i", "b", "a"),
 stringsAsFactors=TRUE)

# Create transactions from all three item columns
my.trans <- as(my.df, "transactions")

# Create Rules
rules <- apriori(my.trans, parameter=list(support=.01, confidence=0.6))

## End code

I'd like the confidence to be high for a -> b and b -> a (they appear
together in two of the four transactions) regardless of *which column*
they appear in.
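
For what it's worth, inspecting the item labels seems to show the cause.
This is only a quick sketch; it assumes that as(<data.frame>, "transactions")
labels each item as column=value, which is my understanding of how arules
treats factor columns. The names pos.df and pos.trans are just for this check.

```r
library(arules)

# Same data as above, with factor columns
pos.df <- data.frame(
  item1 = c("a", "b", "c", "d"),
  item2 = c("e", "a", "f", "b"),
  item3 = c("h", "i", "b", "a"),
  stringsAsFactors = TRUE)

pos.trans <- as(pos.df, "transactions")

# Labels combine column name and value, so "a" in item1 and "a" in item2
# are treated as two unrelated items
itemLabels(pos.trans)
```

If that is right, a rule can only link "a in column 1" to "b in column 2",
never plain a to plain b.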

This example gives the expected results:

## Working example:
my.df2 <- data.frame(
 a = rep("a", 4),
 b = rep("b", 4),
 c = c(NA, "c", NA, NA),
 d = c(NA, NA, "d", "d"),
 stringsAsFactors=TRUE)
my.trans2 <- as(my.df2, "transactions")
rules2 <- apriori(my.trans2, parameter=list(support=.01, confidence=0.6))
## End code

I can't figure out how to coerce my data frame into this format (or
whether this is even the best way to accomplish my objective).
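
One direction that might avoid the reshaping altogether, as a sketch rather
than a confirmed solution: build one vector of items per row and coerce that
list to transactions, so that column position never enters the item labels.
This assumes arules accepts a list of character vectors in
as(..., "transactions"); item.list is just a name I made up for the example.

```r
library(arules)

my.df <- data.frame(
  item1 = c("a", "b", "c", "d"),
  item2 = c("e", "a", "f", "b"),
  item3 = c("h", "i", "b", "a"),
  stringsAsFactors = FALSE)

# One character vector per row: drop NAs and duplicates, ignore column position
item.list <- lapply(seq_len(nrow(my.df)), function(i) {
  items <- unlist(my.df[i, ], use.names = FALSE)
  unique(items[!is.na(items)])
})

# Coerce the list of item vectors to transactions
my.trans <- as(item.list, "transactions")
inspect(my.trans)

rules <- apriori(my.trans, parameter = list(support = 0.01, confidence = 0.6))
inspect(rules)
```

With this layout "a" is a single item wherever it was recorded, so rules such
as a -> b should be counted across all columns.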

I appreciate your help.


