[R] "subscript out of bounds" error when using koRpus+Tree Tagger

Jiayue Wang @@@h@@w@ng2017 @ending from y@ndex@com
Sun Dec 9 07:23:36 CET 2018


I'm trying to do text corpus processing on some novels with the koRpus 
package and Tree Tagger. The script lists all txt files (11 in all) in a 
directory and processes them one by one.

library(koRpus)

set.kRp.env(TT.cmd = "/pathto/tree-tagger-english", lang = "en")
outdir <- "/pathto/corpora"
corpdir <- paste0(outdir,"/","morrison11")

# pattern is a regular expression, not a shell glob
files <- list.files(path=corpdir, pattern = "\\.txt$", full.names = FALSE)
n <- length(files)

output <- file(paste0(outdir,"/calc_results_morrison11.txt"), open="at")
for (i in seq_len(n)) {
   cat(i," - ",files[i],"\n", file = output)
   # the remaining treetag() arguments were cut off in this post; the call
   # is closed here on the assumption that the set.kRp.env() settings apply
   tagged.results <- treetag(paste0(corpdir,'/',files[i]))
   capture.output(flesch(tagged.results), file = output)
   cat("\n", file=output)
   capture.output(TTR(tagged.results), file = output)
   cat("\n", file=output)
   capture.output(textFeatures(tagged.results), file=output)
   cat("\n===========================\n", file = output)
}
close(output)

The problem is that the script always throws the following error when it 
reaches the last txt file, and then exits prematurely:

  Error in all.patterns[[word.length]] : subscript out of bounds
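
To narrow down which of the three calls raises the error, each step could 
be wrapped in tryCatch() so that the loop keeps going and records the 
failing step. This is only a minimal sketch: run_step() and the two 
stand-in step functions are hypothetical names made up for illustration, so 
that it runs without koRpus or Tree Tagger. In the real script the steps 
would be flesch(), TTR() and textFeatures() applied to tagged.results.

```r
# Hypothetical helper: run one step and report a failure instead of stopping.
run_step <- function(step_name, fun, ...) {
  tryCatch(
    list(step = step_name, ok = TRUE, value = fun(...), message = NA_character_),
    error = function(e) {
      list(step = step_name, ok = FALSE, value = NULL, message = conditionMessage(e))
    }
  )
}

# Stand-in steps (assumptions, not koRpus functions).
good_step <- function(x) nchar(x)
bad_step  <- function(x) list("a", "b")[[nchar(x)]]  # index beyond the list length

steps <- list(good = good_step, bad = bad_step)
results <- lapply(names(steps), function(nm) run_step(nm, steps[[nm]], "hyphenation"))

# One line per step: either "ok" or the captured error message.
for (r in results) {
  cat(r$step, if (r$ok) "ok" else paste("FAILED:", r$message), "\n")
}
```

Writing the step name and files[i] next to the captured message would show 
whether the error comes from one particular function or from the tagging 
itself.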

I can't figure out what this message means. The directories are correct, 
there's no problem with the Tree Tagger installation, and n and files have 
the correct values.
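
The file listing itself can be double-checked with something like the 
sketch below. The directory and file names are made up so that the example 
is self-contained; in the real script corpdir would point at the 
morrison11 directory. It lists the matching files with their sizes, which 
also helps rule out an empty file.

```r
# Made-up directory and file names, only to keep the sketch self-contained.
corpdir <- file.path(tempdir(), "corpus_demo")
dir.create(corpdir, showWarnings = FALSE)
writeLines("Some text.", file.path(corpdir, "novel1.txt"))
writeLines("More text.", file.path(corpdir, "novel2.txt"))
writeLines("not a corpus file", file.path(corpdir, "notes.txt.bak"))
invisible(file.create(file.path(corpdir, "empty.txt")))

# pattern is a regular expression: "\\.txt$" matches names ending in ".txt".
files <- list.files(path = corpdir, pattern = "\\.txt$", full.names = TRUE)
info <- data.frame(file = basename(files), bytes = file.size(files),
                   stringsAsFactors = FALSE)
print(info)

# Names of files with zero bytes.
empty <- info$file[info$bytes == 0]
```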

Please help, many thanks!

