[R] How to read HTML or TEXT file with tm package
ligges at statistik.tu-dortmund.de
Thu Feb 4 16:58:12 CET 2010
On 04.02.2010 06:58, Lica Oka wrote:
> Hi, everyone!
> I'm a novice at R with tm package. So I need your help!!
> I'd like to analyze some German texts using tm package for my papers.
> But somehow, I could not use it well.
> Now I'm using R ver.2.10.1 and tm package ver.0.5-2 on WindowsXP and also
> MacOS X 10.6 (snow leopard).
> Input commands that I tried are like these;
>> txt<- system.file("c:\\text\\", "ABHANDLUNGXII.txt", package = "tm")
>  ""
Hmmm, system.file() constructs the path to a file inside an installed
package; it does not read the contents of a file. And you have
constructed a nonexistent path, hence "" is returned.
Perhaps you rather want
txt <- readLines("c:\\text\\ABHANDLUNGXII.txt")
which is a guess, given I do not know where that file really lives nor
do I know the way you want to import the contents.
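
To make the difference concrete, here is a minimal sketch. It writes a small stand-in file into tempdir() (a hypothetical substitute for your c:\\text\\ABHANDLUNGXII.txt, since I cannot see your disk) and uses package = "base" instead of "tm" only so that it runs without tm installed. The encoding argument is an assumption about your files; adjust or drop it as needed.

```r
# Hypothetical stand-in for c:\\text\\ABHANDLUNGXII.txt, created in a temp dir
path <- file.path(tempdir(), "ABHANDLUNGXII.txt")
writeLines(c("Erste Zeile.", "Zweite Zeile."), path)

# system.file() only looks inside an installed package; for a file that is
# not part of that package it gives back the empty string.
not_found <- system.file("no-such-dir", "ABHANDLUNGXII.txt", package = "base")
print(not_found)

# readLines() actually reads the contents: one vector element per line.
# encoding = "UTF-8" is an assumption about how the file was saved.
txt <- readLines(path, encoding = "UTF-8")
print(txt)
```

If readLines() complains that it cannot open the file, file.exists(path) is a quick way to check whether the path you typed is the one you meant.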
> It seems that "ABHANDLUNGXII.txt" was not read into "txt".
> I've also tried some other commands;
>> html<- system.file("c:\\text\\", "ABHANDLUNGXII.html", package = "tm")
>  ""
> It also seems that nothing happened.
> Then I've checked the document reader by function "getReaders()" in tm
>  "readDOC" "readGmane" "readPDF"
>  "readReut21578XML" "readReut21578XMLasPlain" "readPlain"
>  "readRCV1" "readRCV1asPlain" "readTabular"
> There seems to be no "readHTML". But there is "readPDF" and "readPlain".
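
If no HTML reader is listed, one rough workaround is to read the file as plain lines and strip the markup yourself. The following is only a sketch in base R under my own assumptions: the file name and contents are a made-up stand-in written to tempdir(), and the regular expression is a crude tag remover, not a real HTML parser (it will not handle scripts, comments, or entities properly).

```r
# Hypothetical stand-in for an HTML file such as ABHANDLUNGXII.html
path <- file.path(tempdir(), "ABHANDLUNGXII.html")
writeLines(c("<html><body>",
             "<h1>Abhandlung XII</h1>",
             "<p>Ein <b>kurzer</b> Text.</p>",
             "</body></html>"), path)

# Read all lines and join them into a single string
html <- paste(readLines(path), collapse = "\n")

# Crude tag stripping: replace anything between < and > with a space
text <- gsub("<[^>]+>", " ", html)

# Collapse runs of whitespace, then drop the leading and trailing space
text <- gsub("\\s+", " ", text)
text <- gsub("^ | $", "", text)
print(text)
```

The resulting character vector could then probably be handed to tm, for example via Corpus(VectorSource(text)), assuming those constructors behave in your tm version as I expect.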
> I haven't touched anything around the package (I mean, at the "system level").
> I only installed some other standard package using the package manager in R.
> Is there anything wrong with my syntax or procedure? Could anyone give me
> suggestions and advice about this?
> Is there anything I have to prepare for my documents?
> Pointers to useful documents and sites are also welcome.
> Could you please help me? Thanks in advance!
> Lica Oka from Japan.
> (Sorry if there are any wrong expressions in my English.)