[R] Measuring dispersion
Jim Lemon
jim at bitwrit.com.au
Wed Jun 18 13:42:40 CEST 2008
S. Nunes wrote:
> Thanks for the suggestion, however I'm looking for a score since my
> goal is to rank thousands of distributions.
> For instance, given a large text, I would like to rank all terms
> according to their distribution (dispersion) within the text.
>
> Terms evenly distributed in the text should have a low score. Terms
> following an uneven distribution should rank higher.
>
Hi Sergio,
para1<-"If you just want an index of the uniformity of the distribution
of words within a given block of text, one method is to take the
variance of the differences between the indices."
para2<-"As an example, consider the distribution of the word the in this
sentence and the one above by taking the two variances of the
differences between the indices."
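# the two paragraphs are now stored as character strings;
# split each into words on any run of whitespace, because the
# strings also contain line breaks
splitpara1<-unlist(strsplit(para1,"[[:space:]]+"))
splitpara2<-unlist(strsplit(para2,"[[:space:]]+"))
# word positions at which "the" occurs
paraindex1<-which(splitpara1=="the")
paraindex2<-which(splitpara2=="the")
# variance of the gaps between successive occurrences: evenly spaced
# words give a low value, unevenly spaced words a high one
# (a word needs at least three occurrences, otherwise var() gives NA)
para1var<-var(diff(paraindex1))
para2var<-var(diff(paraindex2))
para1var  # 36
para2var  # 1.466667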
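Since the goal is to rank thousands of terms rather than one, the same gap-variance idea can be applied to every distinct word in a single pass. What follows is only a minimal sketch under stated assumptions: `rank_dispersion()` and its `min_count` argument are hypothetical names introduced here, words are split on whitespace as above (so case and attached punctuation are kept), and terms occurring fewer than three times are dropped because they have fewer than two gaps and therefore no defined variance.

```r
# Hypothetical helper (not from the original post): score every word in a
# text by the variance of the gaps between its successive occurrences and
# sort so that the most unevenly spread words come first.
rank_dispersion <- function(text, min_count = 3) {
  # at least three occurrences are needed for two gaps and a defined variance
  stopifnot(min_count >= 3)
  # split on any run of whitespace and drop a possible empty leading token
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[words != ""]
  # named list: for each distinct word, the positions where it occurs
  positions <- split(seq_along(words), words)
  positions <- positions[lengths(positions) >= min_count]
  # variance of the differences between successive positions
  scores <- vapply(positions, function(p) var(diff(p)), numeric(1))
  sort(scores, decreasing = TRUE)
}

# usage: the first paragraph from the post, joined into one line
para1 <- paste("If you just want an index of the uniformity of the distribution",
               "of words within a given block of text, one method is to take the",
               "variance of the differences between the indices.")
rank_dispersion(para1)  # the = 36, of = 8.25
```

Here "the" (gaps 3, 15, 3, 3) ranks ahead of "of" (gaps 3, 3, 6, 9). One caveat: the raw variance grows with the average gap, so rarer terms will tend to score higher. Dividing each variance by the squared mean gap (the squared coefficient of variation) is one possible way to make terms of different frequency comparable, though that is a substitution and not part of the method described above.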
Jim