[R] counting words that are contained in a list

arun smartpink111 at yahoo.com
Sun Feb 16 04:34:19 CET 2014


Hi,

Maybe this helps:

vec1 <- c("victory","happiness","medal","war","service","ribbon", "dates")

vec2 <- c("The World War II Victory Medal was first issued as a service ribbon referred to as the Victory Ribbon.", "By 1946, a full medal had been established which was referred to as the World War II Victory Medal.", "The medal commemorates military service during World War II and is awarded to any member of the United States military, including members of the armed forces of the Government of the Philippine Islands, who served on active duty, or as a reservist, between December 7, 1941 and December 31, 1946","This is awarded for service between 7 December 1941 and 31 December 1946, both dates inclusive")
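# Build one alternation pattern from the word list and find every match, ignoring case
pattern <- paste(vec1, collapse="|")
matches <- regmatches(tolower(vec2), gregexpr(pattern, vec2, ignore.case=TRUE))
# factor(..., levels=vec1) keeps words with no matches (e.g. "happiness") in the table as 0
res <- sort(table(factor(unlist(matches), levels=vec1)), decreasing=TRUE)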
res
 #     war     medal   victory   service    ribbon     dates happiness 
 #       5         4         3         3         2         1         0 
res[1:5]
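
One caveat about the output above: the pattern matches substrings, so the count of 5 for "war" includes the two occurrences of "awarded". If only whole words should count, a minimal sketch along the same lines could wrap the alternation in word boundaries (\\b). The helper name count_whole_words and the two sample sentences are made up for illustration; they are not part of the original code.

```r
# Hypothetical helper (not from the original post): count whole-word matches only.
count_whole_words <- function(words, sentences) {
  # \\b anchors the alternation at word boundaries,
  # so "war" no longer matches inside "awarded"
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  hits <- regmatches(sentences, gregexpr(pattern, sentences, ignore.case = TRUE))
  # Lower-case the matches so "War" and "war" fall into the same level;
  # levels = words keeps unmatched words in the table with a count of 0
  counts <- table(factor(tolower(unlist(hits)), levels = words))
  sort(counts, decreasing = TRUE)
}

# Made-up sample data for illustration
words <- c("victory", "happiness", "war")
sentences <- c("The war ended in victory.",
               "The medal was awarded after the War.")

res2 <- count_whole_words(words, sentences)
# At most the five most frequent words (avoids NA entries when the list is shorter than 5)
res2[seq_len(min(5, length(res2)))]
```

With this variant, "awarded" does not contribute to the count for "war". It assumes the listed words contain no regex metacharacters; if they might, they would need escaping first.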


A.K.



Hi guys! 

I have a vector with a list of words, e.g. c("victory","happiness"). 

I have a vector of sentences, e.g. "In WWII the victory was achieved by allied forces". 

As the word victory is in my list, victory has a frequency of 1 and happiness 0. 

At the end I would like to get the 5 most frequent words from my list that appear in the sentences. 

Can you help me? 

Uros


