[R] PROBLEM USING DICTIONARY WITH TM PACKAGE

Patrick Casimir patrcasi at nova.edu
Sat May 20 20:18:32 CEST 2017


Jeff,


Here is the solution:


myTerms <- c("prostatic", "adenocarcinoma", "grade")
dtm <- DocumentTermMatrix(docs, list(dictionary = myTerms))
inspect(dtm)    ## prints only a sample of up to 10 documents from the DTM
as.matrix(dtm)  ## returns the counts for all documents in the DTM
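For anyone who wants to try this without the patient files, here is a minimal reproducible sketch of the same approach. The toy texts and the names texts, m and counts are made up for illustration; only myTerms and the DocumentTermMatrix() call come from the code above.

```r
library(tm)

# Hypothetical stand-in for the real corpus: 12 short documents,
# i.e. more than the 10 that inspect() samples for display.
texts <- rep(c("prostatic adenocarcinoma grade grade",
               "adenocarcinoma of the prostate",
               "no relevant terms here"), times = 4)
docs <- VCorpus(VectorSource(texts))

myTerms <- c("prostatic", "adenocarcinoma", "grade")
dtm <- DocumentTermMatrix(docs, control = list(dictionary = myTerms))

# Dense matrix: one row per document, one column per dictionary term
m <- as.matrix(dtm)
dim(m)

# Total occurrences of each term across the whole corpus
colSums(m)

# A data frame is often easier to save or analyse further
counts <- as.data.frame(m)
```

as.matrix() builds a dense matrix, which is fine for three terms and about a hundred documents but may use a lot of memory for a large vocabulary.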



Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178

________________________________
From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Sent: Friday, May 19, 2017 11:04:22 AM
To: r-help at r-project.org; Patrick Casimir; r-help at r-project.org
Subject: Re: [R] PROBLEM USING DICTIONARY WITH TM PACKAGE

Considering the deafening silence after three repeats, one explanation could be that you are asking the wrong group of people. It is also possible that your failure to follow the Posting Guide with regard to using plain text email and a reproducible example [1][2] means that readers who are not experts do not feel inclined to follow along with you and help you think of solutions. Keep in mind that supporting contributed packages like tm is technically not on topic here, though people often do feel the urge to help solve problems with them anyway.

With regard to asking the wrong group of people I would suggest asking the maintainer of the tm package what they recommend. See the help for the maintainer function or read the CRAN Web page for that package.
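As a small sketch of that lookup, assuming the tm package is installed locally, the maintainer can be read from the installed package metadata with base R's utils functions:

```r
# Returns the maintainer's name and email address as recorded
# in the installed package's DESCRIPTION file.
tm_maintainer <- maintainer("tm")
tm_maintainer

# The same field, plus the rest of the package metadata
packageDescription("tm")$Maintainer
```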

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html
--
Sent from my phone. Please excuse my brevity.

On May 19, 2017 7:12:45 AM PDT, Patrick Casimir <patrcasi at nova.edu> wrote:
>Dear Members & Experts,
>
>
>The Dictionary() function is no longer available in the tm package.
>How can I use other functions to do the same as below? I want to
>capture a list of specific terms from a corpus. For example, my corpus
>has 102 files, and I want to see a list with the occurrences of
>prostatic, adenocarcinoma, and grade in all 102 files. When I use the
>Dictionary() function, I get the error: Error: could not find function
>"Dictionary"
>
>
>> d <- Dictionary(c("prostatic", "adenocarcinoma", "grade"))
>> inspect(DocumentTermMatrix(docs, list(dictionary = d)))
>
>
>But if I use the code below with inspect(), the output shows the terms
>for only 10 files instead of 102. I need a way to get my dictionary to
>capture and return those terms, or whatever other terms I select, for
>all 102 files. I know I am close, but inspect() is not the right
>function.
>
>
>> myTerms <- c("prostatic", "adenocarcinoma", "grade")
>> inspect(DocumentTermMatrix(docs, list(dictionary = myTerms)))
>
> <<DocumentTermMatrix (documents: 102, terms: 3)>>
> Non-/sparse entries: 292/14
> Sparsity           : 5%
> Maximal term length: 14
> Weighting          : term frequency (tf)
> Sample             :
>                Terms
> Docs            adenocarcinoma grade prostatic
>   Patient14.txt             11     6         3
>   Patient15.txt              7    12         2
>   Patient16.txt             13    16         4
>   Patient19.txt              5    13         2
>   Patient24.txt             11    12         4
>   Patient25.txt              8     9         4
>   Patient41.txt              8    10         4
>   Patient46.txt              8    10         3
>   Patient8.txt               9    12         2
>   Patient9.txt               8    23         2
>
>
>Thanks
