[R] Calculating distance between words in string

David Winsemius dwinsemius at comcast.net
Fri Nov 6 17:56:53 CET 2015


> On Nov 6, 2015, at 3:28 AM, Karl <josip.2000 at gmail.com> wrote:
> 
> Hi All,
> 
> Using R for text processing is quite new to me, while I have found a lot of
> useful functions and I'm beginning to learn regex, I need help with the
> following task. How do I calculate the distance between words?
> 
> That is, given a specific keyword, I need to assign labels to the other
> words based on the distance (number of words) to this keyword.
> 
> For example, if the keyword is "amet" and the string of words is


strng <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

> -> "dolor" would get a value of -2
> -> "elit" would get a value of 3

words <- unlist(strsplit(strng, "\\W"))
words[words != ""]
#[1] "Lorem"       "ipsum"       "dolor"       "sit"        
#[5] "amet"        "consectetur" "adipiscing"  "elit"       
real <- words[words != ""]

which(real == "amet")
#[1] 5
length(real)
#[1] 8
vec <- 1:length(real) - which(real == "amet")
names(vec) <- real

vec["dolor"]
#dolor 
#   -2 


> #
> If the sentence contains more than one instance of the keyword, I need
> values for each instance. Moreover, one can assume that I can split my data
> into sentences, so there is no need to search and recognize sentences (this
> is a separate problem).
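
For more than one instance of the keyword, the same subtraction can be done once per occurrence. What follows is only a sketch under some assumptions: the function name word_distances and the row labels "amet_1", "amet_2", ... are made up for illustration, and matching is exact and case-sensitive. It returns a matrix with one row per keyword occurrence and one column per word.

```r
# Hypothetical helper (not part of the reply above): signed word offsets
# relative to every occurrence of `keyword` in a single sentence.
word_distances <- function(sentence, keyword) {
  words <- unlist(strsplit(sentence, "\\W+"))
  words <- words[words != ""]          # drop empty strings left by the split
  hits <- which(words == keyword)      # positions of each keyword instance
  # outer() builds a length(hits) x length(words) matrix of (position - hit)
  d <- outer(hits, seq_along(words), function(h, i) i - h)
  dimnames(d) <- list(sprintf("%s_%d", keyword, seq_along(hits)), words)
  d
}

s <- "amet dolor sit amet, consectetur amet elit."
d <- word_distances(s, "amet")
d["amet_2", "elit"]    # distance of "elit" from the second "amet"
```

Because the keyword itself (and possibly other words) repeats, the column names are not unique, so indexing a column by name only picks up the first match; indexing by position avoids that ambiguity.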
> 
> Thank you!
> 
> Best regards,
> Jay
> 

David Winsemius
Alameda, CA, USA
