[R] data mining/text mining?

Weiwei Shi helprhelp at gmail.com
Fri Jun 8 17:12:27 CEST 2007


Dear Ruixin:
Among other differences, text mining deals with unstructured data, while
data mining mainly focuses on structured data. Many algorithms can be
shared between them; however, text mining requires some additional
preprocessing first. There are a lot of online resources on this.
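
To make the preprocessing step a bit more concrete, here is a minimal
sketch of turning raw text into a structured table of word counts (a
document-term matrix), which ordinary data mining algorithms can then
work on. It is written in plain Python rather than R, the two documents
are made up, and the names tokenize and document_term_matrix are only
illustrative; real preprocessing usually also handles stop words,
stemming and so on.

```python
import re
from collections import Counter

# Toy documents, made up for this illustration.
docs = [
    "Text mining deals with unstructured text.",
    "Data mining deals with structured data.",
]

def tokenize(text):
    """Lower-case the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is then a fixed-length numeric vector, i.e. the kind of
structured input that clustering or classification methods expect.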

As to packages used for text mining in R, esp. for preprocessing,
please check the following link:
http://wwwpeople.unil.ch/jean-pierre.mueller/

I used that package a very long time ago and am not sure whether it has
been updated for the current version of R; if not, you might need to go
back to an old version like R1.1.

If you want to do text mining on Chinese text (I guess :), there is
additional work (i.e. word splitting) needed. I remember there is a
researcher from Taiwan who does pretty good work on this, and you can
google for it. I cannot remember the details.
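
Just to illustrate what word splitting means, below is a minimal sketch
of one simple dictionary-based approach, greedy forward maximum
matching. This is not the method of the researcher I mentioned (I do not
remember it), only a common textbook technique. It is plain Python
rather than R; the tiny dictionary, the function name segment and the
max_len parameter are all hypothetical.

```python
# Hypothetical mini-dictionary; a real one would hold many thousands of words.
DICTIONARY = {"文本", "挖掘", "数据", "数据挖掘", "和"}

def segment(text, dictionary, max_len=4):
    """Split text greedily, taking the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

print(segment("文本挖掘和数据挖掘", DICTIONARY))
# prints ['文本', '挖掘', '和', '数据挖掘']
```

Once the text is split into words like this, the rest of the
preprocessing can proceed much as for English text. Greedy matching can
split ambiguous strings wrongly, so serious work would use a proper
segmentation tool.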

HTH,

Weiwei


On 6/8/07, Ruixin ZHU <rxzhu at scbit.org> wrote:
> Dear R-user,
>
> Could anybody tell me the key difference between data mining and text
> mining?
> Please list the packages for data/text mining.
> And give me an example of text mining with R (any related materials
> will be highly appreciated), because the vignette written by Ingo Feinerer
> seems too concise for me.
>
> Thanks
> _____________________________________________
> Dr.Ruixin ZHU
> Shanghai Center for Bioinformation Technology
> rxzhu at scbit.org
> zhurx at mail.sioc.ac.cn
> 86-21-13040647832


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


