[R] DocumentTermMatrix error

Matevž Pavlič matevz.pavlic at gi-zrmk.si
Sat May 21 14:58:42 CEST 2011


Got it... the problem was with the Slovenian characters. Once I replaced them with plain ASCII characters, it worked fine.
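
For anyone hitting the same error, here is a minimal base R sketch of two possible workarounds: re-encoding the text to UTF-8, or replacing the Slovenian letters with ASCII ones. The sample word and the CP1250 (Windows Central European) encoding are assumptions for illustration only; they are not taken from test.txt, whose actual encoding is not known.

```r
# Hypothetical sample: the bytes of "Šota" as Windows-1250 (CP1250) would store them.
# CP1250 is an assumption; the real encoding of test.txt is not known.
raw_txt <- rawToChar(as.raw(c(0x8A, 0x6F, 0x74, 0x61)))

# A lone 0x8A byte is not valid UTF-8, which is the kind of input
# that can trigger the 'utf8towcs' error in tolower().
validUTF8(raw_txt)

# Option 1: re-encode to UTF-8 so the Slovenian letters are preserved.
utf8_txt <- iconv(raw_txt, from = "CP1250", to = "UTF-8")
validUTF8(utf8_txt)

# Option 2: replace the Slovenian letters (Č, Š, Ž, č, š, ž) with ASCII ones.
ascii_txt <- chartr("\u010c\u0160\u017d\u010d\u0161\u017e", "CSZcsz", utf8_txt)
ascii_txt
```

Option 1 keeps the original spelling, while option 2 loses the diacritics but avoids encoding issues altogether. Which one is appropriate depends on whether the terms in the matrix need to stay readable as Slovenian.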

Thanks anyway, m

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlič
Sent: Saturday, May 21, 2011 1:27 PM
To: r-help at r-project.org
Cc: feinerer at logic.at
Subject: [R] DocumentTermMatrix error

Hi all,

I have tried to create a DocumentTermMatrix with the tm package, but I get this error:

Error in tolower(txt) : 
  invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'

I tried doing this as shown in:

http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),

with this R code:

 

setwd("C:/Users/mpavlic/Desktop/temp")
tekst <- Corpus(DirSource("."))
> Warning message:
> In readLines(y, encoding = x$Encoding) :
>   incomplete final line found on './test.txt'

meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
> Available meta data pairs are:
   Author       : 
   DateTimeStamp: 2011-05-21 11:25:21
   Description  : 
   Heading      : test
   ID           : test.txt
   Language     : en
   Origin       : 

test <- TermDocumentMatrix(tekst)
> Error in tolower(txt) : 
>   invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'

Attached is a small sample (test.txt) that I worked on.

Any help would be appreciated,

m
