[R] How do I use R to build a dictionary of proper nouns?
Boris Steipe
boris.steipe at utoronto.ca
Fri May 5 10:39:25 CEST 2017
Did you try using the table() function, possibly in combination with sort() or rank()?
Consider:
# Example: a character vector of (preprocessed) nouns
myNouns <- c("proper", "nouns", "domain", "ontology", "dictionary",
             "dictionary", "corpus", "patent", "files", "proper", "nouns",
             "word", "frequency", "file", "preprocess", "corpus", "proper",
             "nouns", "domain", "ontology", "idea", "nouns", "dictionary",
             "dictionary", "corpus", "attachments", "texts", "corpus",
             "preprocesses", "proper", "nouns")

# Count how often each distinct noun occurs
myNounFrequencies <- table(myNouns)
myNounFrequencies

# Sort the counts from most to least frequent
myNounFrequencies <- sort(myNounFrequencies, decreasing = TRUE)
myNounFrequencies

# Position of "corpus" in the sorted frequency table
which(names(myNounFrequencies) == "corpus")
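The question asks for counts of dictionary terms in each file, which a single table() over all nouns does not give. Below is a minimal base-R sketch of one way to extend the idea above. It assumes that the preprocessed corpus is a named list holding one character vector of lowercase tokens per patent file, and that the dictionary is a plain character vector of single-word proper nouns. The names myDictionary, myCorpus and countDictionaryTerms, and the example data, are hypothetical placeholders, not taken from the attachments.

```r
# Hypothetical dictionary: proper nouns extracted from a domain ontology
# (in practice, read these from your own ontology export)
myDictionary <- c("corpus", "ontology", "patent", "dictionary")

# Hypothetical preprocessed corpus: one character vector of tokens per file
myCorpus <- list(
  file1 = c("patent", "corpus", "word", "corpus", "ontology"),
  file2 = c("dictionary", "patent", "patent", "text"),
  file3 = c("idea", "frequency")
)

# Count dictionary terms in one tokenized file.
# Using the dictionary as factor levels makes table() report every
# dictionary term, including zero counts; tokens that are not in the
# dictionary become NA and are dropped by table().
countDictionaryTerms <- function(tokens, dictionary) {
  table(factor(tokens, levels = dictionary))
}

# Apply to every file and transpose: rows are files, columns are terms
termCounts <- t(sapply(myCorpus, countDictionaryTerms,
                       dictionary = myDictionary))
termCounts

# Relative frequencies: divide each file's counts by its number of tokens
relativeFrequencies <- termCounts / lengths(myCorpus)
relativeFrequencies
```

The factor step both restricts the counts to dictionary terms and keeps zero counts, so every file ends up with the same columns. Multi-word proper nouns would need to be joined into single tokens during preprocessing before this could match them. For larger corpora, the tm package's DocumentTermMatrix(), which accepts a dictionary entry in its control list, may also be worth a look.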
> On May 5, 2017, at 1:58 AM, θ " <yarmi1224 at hotmail.com> wrote:
>
> θ " has shared OneDrive files with you. To view them, click the links below.
>
> 2.corpus_patent text.PNG <https://1drv.ms/u/s!Aq27nOPOP5izgVRRxXomVBv0YV0j>
> 3ontology_proper nouns keywords.PNG <https://1drv.ms/u/s!Aq27nOPOP5izgVURiS7MbYH6hJzo>
> 1.patents.PNG <https://1drv.ms/u/s!Aq27nOPOP5izgVYuRVxM1OyzIPzF>
>
> Hi,
>
> I want to do text mining of patents in R.
> I need to use the proper nouns from a domain ontology to build a dictionary,
> and then use that dictionary to analyze my corpus of patent files.
> I want to count the proper nouns and get the frequency with which each one appears in each file.
>
> So far I have preprocessed the corpus and extracted the proper nouns from the domain ontology.
> But I have no idea how to build a proper-noun dictionary and use it to analyze my corpus.
>
> The attachments are my texts, the preprocessed corpus, and the proper nouns.
>
> Thanks.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.