[R] Error in Corpus() in tm package

Milan Bouchet-Valat nalimilan at club.fr
Sun Aug 18 15:01:42 CEST 2013


On Saturday 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> It contains all text files which were converted from doc, docx, ppt
> etc. using libreoffice. 
> Some of them are non-english text documents.
> 
> 
> Sorry I cannot share the corpus.. but if someone can shed light on
> what might cause this error then I can try to eliminate those
> documents if some specific docs are causing it.
I think you should go the other way round: start with a single document
and see whether it works, then repeat with other documents until you can
tell in which cases it works and in which it fails. If it always fails,
try the example texts shipped with tm, and then parts of your own documents.
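
As a rough sketch of that approach (not something I have run against your
data), a small base R helper can try each file on its own and report which
ones raise an error. The names check_files_individually() and load_one are
made up for illustration and are not part of tm. The commented-out usage
assumes that URISource() is an acceptable way to read a single file in your
version of tm.

```r
# Run `load_one` on each file separately and record which files fail.
# `load_one` is a hypothetical callback supplied by the caller, not a tm function.
check_files_individually <- function(files, load_one) {
  msgs <- vapply(files, function(f) {
    tryCatch({
      load_one(f)
      NA_character_            # NA means the file loaded without error
    }, error = function(e) conditionMessage(e))
  }, character(1), USE.NAMES = FALSE)
  data.frame(file = files, ok = is.na(msgs), error = msgs,
             stringsAsFactors = FALSE)
}

# Possible use with tm (assumption: sourceDir is defined as in your post):
# library(tm)
# files <- list.files(sourceDir, full.names = TRUE)
# res <- check_files_individually(files, function(f)
#   Corpus(URISource(f), readerControl = list(language = "lat")))
# res[!res$ok, ]   # files that failed, with their error messages
#
# Control run on the Latin sample texts shipped with tm:
# txt <- system.file("texts", "txt", package = "tm")
# ovid <- Corpus(DirSource(txt), readerControl = list(language = "lat"))
```

If the control run on tm's own texts works but list.files(sourceDir) comes
back empty, the problem is more likely the path than any particular document.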

I don't think it makes sense to try to use VectorSource() as it would
imply reimplementing DirSource().


Regards

> On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat
> <nalimilan at club.fr> wrote:
>         On Friday 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
>         > I am trying to use the text mining package ... I keep getting this error:
>         >
>         > rm(list=ls())
>         > library(tm)
>         > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
>         > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
>         >
>         > Error in if (vectorized && (length <= 0)) stop("vectorized sources must have positive length") :
>         >   missing value where TRUE/FALSE needed
>         >
>         > I am not sure what it means.
>         
>         The posting guide asks for a reproducible example. If you cannot make
>         the contents of sourceDir available to us, you should at least tell us
>         what kind of files it contains. Have you tried with only some of the
>         files the directory contains?
>         
>         
>         Regards
>         
>         > --ajinkya
>         >
>         
> 
> 
> 
> 
> -- 
> 
> Sincerely,
> Ajinkya
> http://ajinkya.info
>


