[R] Support Counting

Petr Savicky savicky at praha1.ff.cuni.cz
Mon Apr 4 11:37:25 CEST 2011


On Mon, Apr 04, 2011 at 01:11:37AM -0500, psombe wrote:
> Hi,
>    I'm new to R and trying to do some simple analysis. I have a data set with
> about 88000 transactions and I want to perform a simple support count
> analysis of an itemset which is, say, not a complete transaction but a subset
> of a transaction.
> For example,
> 
> {A,B,D} is a transaction and I want to find the support of {A,B} even though
> it never occurs as only A,B in the entire set
> 
> 
>  For this I needed to create a new itemsets class and then use the support
> function, but somehow the answers never seem to tally.

Hi.

The answer depends on the representation of the data set. Can you
describe the representation?

One possible representation of a data set for itemset counting is a 0/1
matrix with one row per transaction and one column per item. Using this
representation, the support may be computed as follows.

  db <- matrix(0, nrow=5, ncol=5, dimnames=list(NULL, LETTERS[1:5]))
  db[1, c("A", "B", "D")] <- 1
  db[2, c("A", "B")] <- 1
  db[3, c("A", "D", "E")] <- 1
  db[4, c("B", "C", "D")] <- 1
  db[5, c("A", "B", "C")] <- 1
  db

       A B C D E
  [1,] 1 1 0 1 0
  [2,] 1 1 0 0 0
  [3,] 1 0 0 1 1
  [4,] 0 1 1 1 0
  [5,] 1 1 1 0 0
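
If the transactions are stored as a list of item vectors rather than as a
matrix, the same 0/1 matrix can be built from the list. The following is
only a sketch under that assumption; the names "transactions", "items" and
"db2" are made up for illustration. Note that with many distinct items a
dense matrix may need a lot of memory.

```r
# Assumed input: one character vector of items per transaction
transactions <- list(
  c("A", "B", "D"),
  c("A", "B"),
  c("A", "D", "E"),
  c("B", "C", "D"),
  c("A", "B", "C")
)

# All distinct items, used as column names
items <- sort(unique(unlist(transactions)))

# One row per transaction, one column per item, initially all 0
db2 <- matrix(0, nrow=length(transactions), ncol=length(items),
              dimnames=list(NULL, items))

# Mark the items present in each transaction with 1
for (i in seq_along(transactions)) {
  db2[i, transactions[[i]]] <- 1
}
db2
```

For the five transactions above, db2 has the same contents as db.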

  itemset <- c("A", "B")
 
  # for each transaction, whether it contains c("A", "B")
  # (drop=FALSE keeps a matrix even if itemset has a single element)
  rowSums(db[, itemset, drop=FALSE]) == length(itemset)

  [1]  TRUE  TRUE FALSE FALSE  TRUE
 
  # the number of transactions containing c("A", "B")
  sum(rowSums(db[, itemset, drop=FALSE]) == length(itemset))

  [1] 3
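
If the support of several itemsets is needed, the computation above may be
wrapped in a small function. This is a minimal sketch, assuming the 0/1
matrix representation; "support_count" is a name chosen here, not a
function from any package.

```r
# Hypothetical helper: number of rows of a 0/1 matrix that contain
# every item in 'itemset'
support_count <- function(db, itemset) {
  # drop=FALSE keeps a matrix even when itemset has a single item
  sum(rowSums(db[, itemset, drop=FALSE]) == length(itemset))
}

# The same example data as above
db <- matrix(0, nrow=5, ncol=5, dimnames=list(NULL, LETTERS[1:5]))
db[1, c("A", "B", "D")] <- 1
db[2, c("A", "B")] <- 1
db[3, c("A", "D", "E")] <- 1
db[4, c("B", "C", "D")] <- 1
db[5, c("A", "B", "C")] <- 1

# absolute support (number of transactions)
support_count(db, c("A", "B"))

# relative support (fraction of transactions)
support_count(db, c("A", "B")) / nrow(db)

# a single item also works
support_count(db, "D")
```

With the example data, the absolute support of c("A", "B") is 3 and the
relative support is 3/5.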

Hope this helps.

Petr Savicky.


