[R] data mining for R

John Day jday at csi-inc.com
Thu Sep 5 21:54:34 CEST 2002


Philippe,

I think you have slightly misunderstood what data-mining is all about.

Data-miners tend to see themselves as applying expertise from at least three 
areas: statistics, machine learning ("AI"), and database theory.

Most data-mining problems involve extracting or detecting useful 
information from huge or complex sources of data. A data-miner 
would certainly find S-Plus or R a valuable tool for investigating and 
solving such problems.

But a data-miner might also use techniques like "reinforcement learning", 
"inductive logic", "natural language processing" or "relational theory" to 
discover concepts and relationships which characterize and solve the 
problem. These techniques may not be interesting to a "pure" statistician.

HTH,
John Day
Staff Scientist
Computer Science Innovations,
Melbourne, FL
http://www.csi-inc.com/~jday

At 04:16 PM 9/5/2002 +0200, Philippe wrote:
>At the risk of being heavily criticized, one could see data mining mainly as a
>pseudo-new concept invented to sell new (and sometimes expensive) software
>to industry. Data mining is nothing more than existing statistical
>analyses optimized for speed in order to deal with millions of entries, or
>even more, in a reasonable period of time. So, as was suggested earlier
>in this thread, the methods probably already exist somewhere in R. On the
>other hand, R may not be optimized enough to deal with the huge datasets
>usually manipulated by data mining software.
>Best,
>
>Philippe Grosjean
>
>-----Original Message-----
>From: owner-r-help at stat.math.ethz.ch
>[mailto:owner-r-help at stat.math.ethz.ch]On behalf of Peter Dalgaard BSA
>Sent: Thursday, 5 September 2002 14:37
>To: Prof Brian Ripley
>Cc: Pgoodr1 at aol.com; r-help at stat.math.ethz.ch
>Subject: Re: [R] data mining for R
>
>
>Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>
> > Well, R does not have a `statistics' plug in either!
> >
> > In the words of Witten & Frank's book, Data Mining is `statistics plus
> > marketing', and R can do a lot of data mining.
> >
> > If you could be more specific about what techniques you want to use, we
> > may be able to help you further.
> >
> > On Thu, 5 Sep 2002 Pgoodr1 at aol.com wrote:
> >
> > > I was wondering if R had a data mining component and how I could get it.
> > > If not, do you know anyone who is developing a data mining "plug in" for R?
> > > Phillip Goodreid
>
>Another possible definition is "statistics with massive amounts of
>incidental data". A large part of DM practice seems to be
>"quarrying". The actual statistical methodology is only a part of a
>complicated process of getting data out of databases on a, say, weekly
>schedule, roughly preprocessed, then fed to a statistics engine, and
>postprocessed to something that can end up on the manager's desk.
>
>My impression is that this is essentially what SPSS's Clementine product
>does, using a GUI to draw arrows between pretty little hexagonal
>cells. It is not at all unthinkable that something like that could be
>coded up in R too. I think we have most of the pieces to do it.
>
>--
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
>~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
