[R] Error in Corpus() in tm package
Milan Bouchet-Valat
nalimilan at club.fr
Sun Aug 18 19:18:20 CEST 2013
On Sunday 18 August 2013 at 09:19 -0700, Ajinkya Kale wrote:
> I did exactly what you suggested: I tried subsets of these documents
> and found that some junk non-txt files were causing this issue.
> Everything worked fine with DirSource() once I deleted them from the
> directory.
> But I feel these functions should also report which file they are
> failing on. I have ended up debugging with subsets of the input one
> too many times.
Good. Could you send us (or just me, privately) an excerpt of one of
those files that is enough to reproduce the bug? It would indeed be nice
to get a more explicit error message from tm if possible.
Regards
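Instead of deleting the junk files by hand, the file listing can skip them. A minimal sketch, assuming the documents to index all end in `.txt`: it builds a throwaway directory (file names are hypothetical) and shows that `list.files()` with a `pattern` keeps only the text files. tm's `DirSource()` also accepts `pattern` and `ignore.case` arguments, so the same regular expression should carry over, e.g. `DirSource(sourceDir, pattern = "\\.txt$", ignore.case = TRUE)`; this has not been checked against the tm version used in this thread.

```r
# Throwaway directory with two text files and one non-text file
# (hypothetical names standing in for the converted documents).
docs_dir <- tempfile("docs_to_index_")
dir.create(docs_dir)
writeLines("first document", file.path(docs_dir, "report.txt"))
writeLines("second document", file.path(docs_dir, "slides.TXT"))
writeBin(as.raw(c(0xd0, 0xcf, 0x11, 0xe0)), file.path(docs_dir, "junk.doc"))

# Keep only names ending in .txt, ignoring case.
txt_files <- list.files(docs_dir, pattern = "\\.txt$", ignore.case = TRUE)
print(txt_files)

# Files a plain directory listing would also have picked up.
skipped <- setdiff(list.files(docs_dir), txt_files)
print(skipped)
```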
>
> On Aug 18, 2013 9:01 AM, "Milan Bouchet-Valat" <nalimilan at club.fr> wrote:
> On Saturday 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> > It contains only text files, which were converted from doc, docx,
> > ppt, etc. using LibreOffice.
> > Some of them are non-English text documents.
> >
> > Sorry, I cannot share the corpus, but if someone can shed light on
> > what might cause this error, I can try to eliminate those documents
> > if specific ones are causing it.
> I think you should go the other way round: try with only one document
> and see if it works, and make enough attempts to find out in which
> cases it works and in which it fails. If it always fails, try with the
> examples provided by tm, and then with parts of your documents.
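The one-document-at-a-time approach can be automated, which also yields the per-file report Ajinkya asks for. A minimal base-R sketch: `find_failing_files()` and `tm_check()` are hypothetical helpers, not part of tm, and `tm_check()` assumes that building a corpus from a directory holding a single file reproduces the failure. The demo uses a toy check so it runs without tm.

```r
# Hypothetical helper: run `check` on each file on its own and return the
# error message for every file that fails, named by file path.
find_failing_files <- function(files, check) {
  msgs <- vapply(files, function(f) {
    tryCatch({
      check(f)
      NA_character_                       # no error for this file
    }, error = function(e) conditionMessage(e))
  }, character(1))
  msgs[!is.na(msgs)]
}

# tm-specific check (defined but not run here; assumes library(tm) is
# loaded): copy one file into an empty temporary directory, then build a
# corpus from that directory with DirSource().
tm_check <- function(path) {
  one_dir <- tempfile("one_doc_")
  dir.create(one_dir)
  on.exit(unlink(one_dir, recursive = TRUE))
  file.copy(path, one_dir)
  Corpus(DirSource(one_dir))
}

# Demo with a toy check that rejects anything not ending in .txt.
demo_dir <- tempfile("bisect_")
dir.create(demo_dir)
for (f in c("a.txt", "b.txt", "junk.doc")) writeLines("x", file.path(demo_dir, f))

toy_check <- function(path) {
  if (!grepl("\\.txt$", path)) stop("not a plain-text file: ", basename(path))
  readLines(path)
}
failing <- find_failing_files(list.files(demo_dir, full.names = TRUE), toy_check)
print(failing)
```

With tm loaded, the real run would be `find_failing_files(list.files(sourceDir, full.names = TRUE), tm_check)`.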
>
> I don't think it makes sense to try to use VectorSource(), as it would
> imply reimplementing DirSource().
>
>
> Regards
>
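For the "try with examples provided by tm" step: tm installs some sample texts, including the Latin Ovid excerpts that the `language = "lat"` setting in the original code appears to come from. A sketch, assuming they are still shipped under `texts/txt` in the installed package; it only builds the corpus when that directory is found, and the explicit UTF-8 encoding is an assumption about those sample files.

```r
# Locate tm's bundled plain-text examples; system.file() returns ""
# when the package or the path is not installed.
example_dir <- system.file("texts", "txt", package = "tm")

if (nzchar(example_dir)) {
  print(list.files(example_dir))
  # Same call as in the original post, pointed at the bundled texts.
  ovid <- tm::Corpus(tm::DirSource(example_dir, encoding = "UTF-8"),
                     readerControl = list(language = "lat"))
  print(length(ovid))
} else {
  message("tm example texts not found; is tm installed?")
}
```

If this works but `sourceDir` does not, the problem lies in the documents rather than in the tm installation.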
> > On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> > On Friday 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> > > I am trying to use the text mining package (tm), but I keep
> > > getting this error:
> > >
> > > rm(list = ls())
> > > library(tm)
> > > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> > > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
> > >
> > > Error in if (vectorized && (length <= 0)) stop("vectorized sources must
> > > have positive length") : missing value where TRUE/FALSE needed
> > >
> > > I am not sure what it means.
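The message is R's generic complaint that an `if ()` condition was NA instead of TRUE or FALSE. In the quoted condition, `vectorized && (length <= 0)` can only be NA when one of its operands is NA, which suggests the source ended up with a missing length; the thread does not establish why, beyond the junk non-txt files found later. A base-R sketch reproducing the message, where `check_source()` is a hypothetical stand-in for the check inside tm, not tm's actual code:

```r
# Hypothetical stand-in for the check quoted in the error message.
check_source <- function(vectorized, length) {
  if (vectorized && (length <= 0))
    stop("vectorized sources must have positive length")
  invisible(TRUE)
}

# A positive document count passes the check.
check_source(TRUE, 5L)

# A zero count triggers tm's own, explicit message.
zero_msg <- tryCatch(check_source(TRUE, 0L),
                     error = function(e) conditionMessage(e))
print(zero_msg)

# NA <= 0 is NA, and TRUE && NA is NA, so if () cannot decide.
print(TRUE && (NA_integer_ <= 0))
na_msg <- tryCatch(check_source(TRUE, NA_integer_),
                   error = function(e) conditionMessage(e))
print(na_msg)
```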
> >
> > The posting guide asks for a reproducible example. If you cannot make
> > the contents of sourceDir available to us, you should at least tell
> > us what kind of files it contains. Have you tried with only some of
> > the files the directory contains?
> >
> >
> > Regards
> >
> > > --ajinkya
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> >
> > Sincerely,
> > Ajinkya
> > http://ajinkya.info
> >
>