[R] Arules - Association Rules

Michael Hahsler hahsler at ai.wu.ac.at
Fri Dec 4 16:12:19 CET 2009


Alexandre,

You are mining for association rules with an absolute support count of 
only 1001 * 0.01 = 10.01, i.e., about 10 transactions. Judging from your 
40 minutes of processing time, your survey data set is very dense, which 
results in an enormous number of frequent itemsets (potentially up to 
2^k - k - 1, which for k = 71 is roughly 2.4 * 10^21) and makes you run 
out of memory. You have the following options:

* Increase the minimum support (e.g., start with 0.1) and see how low 
you can go without running out of memory (I don't know how to monitor 
memory usage on Windows).

* Restrict the maximal length of the frequent itemsets with 
parameter = list(maxlen = 3, support = ...); a sketch of both options 
follows below.
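
For example, something along these lines (only a rough sketch; I assume 
your transactions object is called 'bin' as in your code, and the 
support values are just starting points you will have to tune):

  library(arules)

  ## Option 1: start with a much higher minimum support and lower it
  ## step by step
  rules <- apriori(bin, parameter = list(support = 0.1, confidence = 0.6))

  ## Option 2: in addition, cap the size of the frequent itemsets at 3 items
  rules <- apriori(bin,
                   parameter = list(support = 0.05, confidence = 0.6,
                                    maxlen = 3))

  summary(rules)

maxlen caps the total number of items per itemset (and therefore per 
rule, LHS and RHS combined), which cuts the search space down 
drastically.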

If that does not help, I will need your data set and some code to 
reproduce and study the problem.
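
(For reference, the rough numbers from the first paragraph, computed in 
plain R; the second figure is a theoretical upper bound, not a 
prediction:)

  1001 * 0.01      # absolute minimum support count: about 10 transactions
  2^71 - 71 - 1    # worst-case number of frequent itemsets for 71 items,
                   # roughly 2.4e21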

-Michael


> Hi, I'm a Windows XP user. My notebook has 1 GB of RAM, a 160 GB hard
> disk, and an AMD Turion 64 1.6 GHz processor. Processing takes about
> 40 minutes. This is the code I used:
>
>   dados = read.csv("C:/Documents and Settings/Administrador/Meus documentos/My Dropbox/Estatística/Association Rules/Top2009 alterado.csv",
>                    header=T, sep=";")
>   library(arules)
>   bin = as(dados, "transactions")
>   rules <- apriori(bin, parameter = list(support = 0.01, confidence = 0.6))
>
> Below is a sample of the file. I exported the data as CSV from Excel.
> I have 71 columns (variables) and 1001 lines (responses):
>
>   1. Churrascaria; 2. Supermercado; 3. Restaurante Self Service; 4. Restaurante Chinês; 5. Buffet; 6. Sorvete
>   Galpão Nelore; Super Muffato; Não Sabe; Jin Jin; Planalto; Sávio
>   Vento Sul; Super Muffato; Não Sabe; Não Sabe; Não Sabe; Doce Verão
>   Gaúcha; Super Muffato; Não Sabe; Não Sabe; Não Sabe; Kibon
>   Tradição Gaúcha; Super Muffato; Não Sabe; Não Sabe; Não Sabe; Nestlé
>   Não Sabe; Super Muffato; Não Sabe; Não Sabe; Estilo; Sávio
>   Rancho Grill; Viscardi; Akira; Akira; Não Sabe; Não Sabe
>
> Thank you very much for your help!!!
>
> On 3 Dec, 01:46, Steve Lianoglou <mailinglist.honey... at gmail.com> wrote:
>> > Hi,
>> >
>> > On Wed, Dec 2, 2009 at 6:57 PM, Alexandre - UEL <shima... at gmail.com> wrote:
>> >
>>> > > Hello everybody!
>>> > > I'm trying some data mining, but I'm having some problems with the
>>> > > arules package: at the end of processing, R "had to be closed". I have
>>> > > already tried reinstalling version 2.10, changing the computer, and
>>> > > reallocating more virtual memory.
>> >
>>> > > Has anyone had this problem too?
>> >
>>> > > I have a hypothesis that I need to prepare the data somehow, but I
>>> > > don't know how.
>> >
>>> > > Thanks for helping!!!
>> >
>> > Can you provide more info here?
>> >
>> > 1. I'm assuming, since you're talking about reallocating virtual
>> > memory or whatever, that you're on Windows?
>> > 2. What's the exact error you're getting (what does it say before "R
>> > had to be closed")?
>> > 3. What's the size of your data? What type of data is it?
>> > 4. How much RAM do you have?
>> > 5. Are you on a 32 or 64 bit system?
>> > 6. What happens if you cut your data in half?
>> > 7. Can you provide a (very small) reproducible example of your data + code?
>> > ...
>> >
>> > -steve
>> >
>> > --
>> > Steve Lianoglou
>> > Graduate Student: Computational Systems Biology
>> >  | Memorial Sloan-Kettering Cancer Center
>> >  | Weill Medical College of Cornell University
>> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >
>> > ______________________________________________
>> > R-h... at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
> 




-- 
   Michael Hahsler
   email: michael at hahsler.net
   web: http://michael.hahsler.net



