[R] text analysis errors

Jim Lemon
Thu Jan 7 01:34:21 CET 2021

Hi Gordon,
It looks to me as though you may have to extract the text from the Word
files first, for example by exporting each one as plain text (Export As
Text).
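If exporting 4,000 files by hand is impractical, something along these
lines might do the extraction in R. This is only a rough sketch in base R,
under the assumption that the files are .docx (zip archives with the body
stored in word/document.xml) and not old binary .doc files. The function
names xml_to_text, docx_to_text and convert_folder are made up for
illustration, and the tag stripping is deliberately crude: it ignores
headers, footnotes, and numeric character references.

```r
# Hypothetical helper: turn the XML body of a .docx into plain text.
xml_to_text <- function(xml) {
  txt <- gsub("</w:p>", "\n", xml, fixed = TRUE)  # paragraph ends -> newlines
  txt <- gsub("<[^>]+>", "", txt)                 # drop all remaining tags
  # decode the predefined XML entities (&amp; last, to avoid double decoding)
  txt <- gsub("&lt;", "<", txt, fixed = TRUE)
  txt <- gsub("&gt;", ">", txt, fixed = TRUE)
  txt <- gsub("&quot;", "\"", txt, fixed = TRUE)
  txt <- gsub("&apos;", "'", txt, fixed = TRUE)
  gsub("&amp;", "&", txt, fixed = TRUE)
}

# Hypothetical helper: extract the body text of one .docx file.
docx_to_text <- function(path) {
  tmp <- tempfile()
  dir.create(tmp)
  on.exit(unlink(tmp, recursive = TRUE))
  # assumption: the file is a zip archive containing word/document.xml
  utils::unzip(path, files = "word/document.xml", exdir = tmp)
  xml <- readLines(file.path(tmp, "word", "document.xml"),
                   warn = FALSE, encoding = "UTF-8")
  xml_to_text(paste(xml, collapse = ""))
}

# Hypothetical helper: write one UTF-8 .txt file per .docx found in in_dir.
convert_folder <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.docx$", full.names = TRUE)
  for (f in files) {
    out <- file.path(out_dir, sub("\\.docx$", ".txt", basename(f)))
    writeLines(docx_to_text(f), out, useBytes = TRUE)
  }
  invisible(files)
}
```

It could be called as convert_folder("/Users/Gordon/Desktop/WPSCASESONE",
"/Users/Gordon/Desktop/WPSCASESONE_txt"), where the second folder name is
just an example. The resulting .txt files can then be handed to the
analysis one file at a time.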


On Thu, Jan 7, 2021 at 10:40 AM Gordon Ballingrud
<gob.allingrud using gmail.com> wrote:
> Hello all,
> I have asked this question on many forums without response. And although
> I've made progress myself, I am stuck as to how to respond to a particular
> error message.
> I have a question about text-analysis packages and code. The general idea
> is that I am trying to perform readability analyses on a collection of
> about 4,000 Word files. I would like to do any of a number of such
> analyses, but the problem now is getting R to recognize the uploaded files
> as data ready for analysis. I keep getting error messages. Let me
> show what I have done so far. I use three separate commands because I
> split the folder of 4,000 files into three folders of roughly equal size;
> evidently, the full collection was too large to be read in one go. The
> folders are called ‘WPSCASES’ one through three. Here is my code, with
> the error messages for each command recorded below:
> token <- tokenize("/Users/Gordon/Desktop/WPSCASESONE/", lang="en", doc_id="sample")
> The code is the same for the other folders; only the folder name differs.
> The error message reads:
> *Error in nchar(tagged.text[, "token"], type = "width") : invalid multibyte
> string, element 348*
> The error messages are the same for the other two commands, except that
> the 'element' number differs: it is 925 for the second folder and 4302
> for the third.
> token2 <- tokenize("/Users/Gordon/Desktop/WPSCASES2/", lang="en", doc_id="sample")
> token3 <- tokenize("/Users/Gordon/Desktop/WPSCASES3/", lang="en", doc_id="sample")
> These are the other commands, if that's helpful.
> I’ve tried to work out whether the ‘element’ that the error message
> mentions corresponds to the file at that position in the folder. But
> since folder 3 does not have 4,300 files in it, I think that is
> unlikely. Please let me know if you can figure out how to fix this so
> that I can start to use ‘koRpus’ commands, like ‘readability’ and its
> progeny.
> Thank you,
> Gordon
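On the error itself: nchar() reports "invalid multibyte string" when a
string contains bytes that are not valid in the encoding R assumes. The
'element' number is an index into the vector passed to nchar(), which the
message shows as tagged.text[, "token"], so it most likely counts tokens
and not files. Once the documents are plain text, a check like the
following may help locate the offending files. It is a sketch in base R
only; the helper names are hypothetical, and the from = "latin1" default
is a guess about how the files were saved, not something known from the
post.

```r
# Hypothetical helper: list the .txt files in a folder that contain
# bytes which are not valid UTF-8.
find_invalid_utf8 <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  bad <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    !all(validUTF8(lines))
  }, logical(1))
  files[bad]
}

# Hypothetical helper: write a UTF-8 copy of one file.
# ASSUMPTION: the source encoding is 'from'; adjust it if the guess is wrong.
reencode_file <- function(path, out, from = "latin1") {
  lines <- readLines(path, warn = FALSE)
  fixed <- iconv(lines, from = from, to = "UTF-8")
  writeLines(fixed, out, useBytes = TRUE)
  invisible(out)
}
```

Running find_invalid_utf8() on each folder should narrow 4,000 files down
to the few that need attention. Writing the re-encoded copy to a new path
keeps the original untouched in case the guessed encoding turns out to be
wrong.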
