[R] text mining - text comparing
Matevž Pavlič
matevz.pavlic at gi-zrmk.si
Wed May 25 22:49:15 CEST 2011
Hi all,
I'll try to explain what I would like to achieve.
I have two problems that I need help with, if anyone has a clue.
1.) I have a TXT file containing two fields: USCS and Description.
For each USCS value there is a Description field containing many words that describe that particular USCS type. What I would like to do is to mine the text using the tm package in order to find which words in the Description field are the most frequent for each USCS value.
Now, I don't think I will have problems with that part; the problem is importing the data. There are around 300 different USCS - Description combinations, which is of course too many to sort out by hand. I would have to create a Corpus of around 300 texts which I could later analyze. Here is where I get stuck. I cannot find a way to import the data into a Corpus so that each text is named after its USCS value and contains the strings (words) of the Description field.
Attached (temp.txt) is a small dataset.
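To make the import question concrete, here is a minimal base-R sketch of the shape I am after: one text per USCS value, named after that value. The tab separator, the header names and the sample rows are assumptions made up for illustration, not the real contents of the attachment.

```r
# Hypothetical sample standing in for the attached file; the real layout
# (separator, header names, values) is an assumption.
raw <- "USCS\tDescription
GW\twell graded gravel with sand
GW\tgravel, sandy, well graded
CL\tlean clay, low plasticity
CL\tsilty clay with sand"

dat <- read.delim(text = raw, stringsAsFactors = FALSE)

# One document per USCS value: paste all descriptions for that value together.
# The result is a character vector named after the USCS values.
docs <- vapply(split(dat$Description, dat$USCS),
               paste, character(1), collapse = " ")

# Base-R word counts per USCS value (lower-cased, punctuation stripped).
word_freq <- lapply(docs, function(txt) {
  clean <- tolower(gsub("[[:punct:]]", " ", txt))
  words <- strsplit(clean, "[[:space:]]+")[[1]]
  sort(table(words[nzchar(words)]), decreasing = TRUE)
})
```

A named character vector like docs should be usable as input to tm's VectorSource() when building the Corpus, although I have not worked out how to carry the USCS names over as document names.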
2.) The second thing is about comparing text. I have some problems with typos in the text, so what I would like is to find words that are similar (but spelled incorrectly), similar to how a Google search proposes words as you type. Has anyone had any experience with that?
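As a rough illustration of what I mean by "similar words", the sketch below uses edit distance from base R's adist() to pick the closest known word for each typo. The vocabulary and the misspellings are hypothetical, and this is only one possible approach, not necessarily the best one.

```r
# Hypothetical vocabulary of correctly spelled terms and some misspellings.
vocab <- c("gravel", "clay", "sand", "silt")
typos <- c("gravle", "cley", "sannd")

# Generalized Levenshtein distance between every typo (rows)
# and every vocabulary word (columns).
d <- adist(typos, vocab)

# For each typo, suggest the vocabulary word with the smallest distance.
suggestion <- vocab[apply(d, 1, which.min)]
names(suggestion) <- typos
```

With a real vocabulary a distance cutoff would probably be needed so that unrelated words are not matched; agrep() in base R offers approximate matching with such a threshold.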
I hope I explained this OK; otherwise I'll try again.
Thanks, m
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110525/46b0af08/attachment.txt>