[R] How to read HTML or TEXT file with tm package

Uwe Ligges ligges at statistik.tu-dortmund.de
Thu Feb 4 16:58:12 CET 2010



On 04.02.2010 06:58, Lica Oka wrote:
> Hi, everyone!
>
> I'm a novice at R and the tm package, so I need your help!
>
> I'd like to analyze some German texts using the tm package for my papers,
> but somehow I could not get it to work.
>
> I'm using R version 2.10.1 and tm version 0.5-2 on Windows XP and also
> Mac OS X 10.6 (Snow Leopard).
> The commands that I tried are like these:
>
>> txt<- system.file("c:\\text\\", "ABHANDLUNGXII.txt", package = "tm")
>> txt
> [1] ""
>>


Hmmm, system.file() constructs the path to a file inside an installed
package; it does not read the contents of a file. You have constructed
a nonexistent path, hence "" is reported.
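
To illustrate that behaviour, here is a minimal sketch using only base R. The
non-matching file name "no/such-file.txt" is made up for the example; the
DESCRIPTION file of the base package is assumed to be present, as it is in a
standard installation.

```r
# system.file() only looks up files that ship inside an installed package.
# A path that does not exist within the package yields the empty string.
p_missing <- system.file("no", "such-file.txt", package = "base")
p_missing

# A file that really is part of the package yields its full path.
p_found <- system.file("DESCRIPTION", package = "base")
file.exists(p_found)
```

So an empty string from system.file() means "no such file in that package",
not "the file is empty".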

Perhaps you rather want
txt <- readLines("c:\\text\\ABHANDLUNGXII.txt")

which is a guess, given that I know neither where that file really lives
nor how you want to import the contents.
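
A self-contained sketch of that idea follows. It writes a small stand-in file
to a temporary directory (the file name and its two lines are hypothetical)
and reads it back with readLines(). The last step assumes the tm package is
installed and that Corpus() and VectorSource() behave as in its documentation;
it is skipped otherwise.

```r
# Hypothetical stand-in for the real text file.
path <- file.path(tempdir(), "example.txt")
writeLines(c("Erste Zeile", "Zweite Zeile"), path)

# readLines() returns a character vector with one element per line.
txt <- readLines(path)
length(txt)

# Optional, only if tm is available: treat the whole file as one document.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::Corpus(tm::VectorSource(paste(txt, collapse = "\n")))
}
```

For real German texts you may also need to tell readLines() about the file's
encoding, e.g. via its encoding argument.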

Best wishes,
Uwe Ligges





> It seems that "ABHANDLUNGXII.txt" was not read into "txt".
>
> I've also tried some other commands:
>
>> html<- system.file("c:\\text\\", "ABHANDLUNGXII.html", package = "tm")
>> html
> [1] ""
>
> Again, nothing seems to have happened.
> Then I checked the available document readers with the function
> "getReaders()" in the tm package:
>
>> getReaders()
> [1] "readDOC"                 "readGmane"               "readPDF"
> [4] "readReut21578XML"        "readReut21578XMLasPlain" "readPlain"
> [7] "readRCV1"                "readRCV1asPlain"         "readTabular"
>>
>
> There seems to be no "readHTML", but there are "readPDF" and "readPlain".
> I haven't touched anything in the package itself (I mean, at the "system level").
> I only installed some other standard packages using the package manager in R.
>
> Is there any wrong syntax or operation? Could anyone give me suggestions and
> advice about this?
> Is there anything I have to prepare for my documents?
> Pointers to useful documents and sites are also welcome.
>
> Could you please help me? Thanks in advance!
>
> Regards,
>
> Lica Oka from Japan.
> (Sorry if any of my English is wrong.)