[R] DocumentTermMatrix error
Matevž Pavlič
matevz.pavlic at gi-zrmk.si
Sat May 21 13:26:40 CEST 2011
Hi all,
I tried to create a DocumentTermMatrix with the tm package, but I get this error:
Error in tolower(txt) :
invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
I followed the example shown in:
http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),
using this R code:
setwd("C:/Users/mpavlic/Desktop/temp")
tekst <- Corpus(DirSource("."))
>Warning message:
>In readLines(y, encoding = x$Encoding) :
>incomplete final line found on './test.txt'
meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
>Available meta data pairs are:
>Author       :
>DateTimeStamp: 2011-05-21 11:25:21
>Description  :
>Heading      : test
>ID           : test.txt
>Language     : en
>Origin       :
test <- TermDocumentMatrix(tekst)
> Error in tolower(txt) :
> invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
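The 'utf8towcs' message usually means a string is being treated as UTF-8 but contains bytes that are not valid UTF-8. A minimal base-R sketch of how that could be checked and repaired follows. It assumes, without confirmation from the attachment, that test.txt was saved in the Windows CP1250 code page (common for Slovenian text); the sample bytes and all variable names are hypothetical stand-ins, not the actual file contents.

```r
# Hypothetical stand-in for one line of test.txt: "GLIN" followed by the
# single byte 0xE8, which is "č" in CP1250 but not valid UTF-8 on its own.
raw_line <- rawToChar(as.raw(c(0x47, 0x4C, 0x49, 0x4E, 0xE8)))

# validUTF8() (base R >= 3.3.0) reports whether a string is well-formed UTF-8.
is_valid_before <- validUTF8(raw_line)

# Re-encode from the ASSUMED source encoding (CP1250) to UTF-8.
fixed_line <- iconv(raw_line, from = "CP1250", to = "UTF-8")
is_valid_after <- validUTF8(fixed_line)

# After re-encoding, tolower() has well-formed input to work with.
lowered <- tolower(fixed_line)
```

If the assumption about the encoding holds, the same conversion could be applied to the lines of the file before building the corpus. tm's DirSource() also takes an encoding argument, which may be the simpler route, though the right value depends on how the file was actually saved.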
Attached is a small sample (test.txt) that I worked on.
Any help would be appreciated,
m
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110521/fe77f990/attachment.txt>