[R] Print occurrence / positions of words
arun
smartpink111 at yahoo.com
Mon Apr 22 21:18:25 CEST 2013
Hi,
Maybe this helps:
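library(stringr)
vec <- "this is a nice text with nice characters"
# split the text into words (runs of word characters)
vec2 <- unlist(str_match_all(vec, "\\w+"))
# or, splitting on single spaces only:
# vec2 <- str_split(vec, " ")[[1]]
# for each distinct word, the indices at which it occurs in vec2
res <- lapply(unique(vec2), function(x) which(vec2 == x))
names(res) <- unique(vec2)
res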
#$this
#[1] 1
#
#$is
#[1] 2
#
#$a
#[1] 3
#
#$nice
#[1] 4 7
#
#$text
#[1] 5
#
#$with
#[1] 6
#
#$characters
#[1] 8
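The same result can be had without stringr. Below is a base-R sketch, offered as a swapped-in alternative rather than the code above. regmatches() and gregexpr() extract the words with the same "\w+" pattern, and split() groups the word indices by word. The factor levels are set to unique(words) so the list keeps the words in order of first appearance instead of sorting them alphabetically.

```r
# Base-R alternative: no stringr needed.
vec <- "this is a nice text with nice characters"
# Extract the words (runs of word characters), as with str_match_all() above.
words <- regmatches(vec, gregexpr("\\w+", vec))[[1]]
# Group the word indices by word; levels = unique(words) keeps
# first-appearance order rather than alphabetical order.
res <- split(seq_along(words), factor(words, levels = unique(words)))
res
```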
A.K.
>Hi,
>I have tried several packages in order to build an R program that takes a text file as input and produces a list of the words in that file. Each word should have a vector with all the positions at which that word occurs in the file.
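A minimal sketch for reading the words from a file, assuming plain text and the same "\w+" definition of a word used in the reply. word_positions() is a hypothetical helper name, not a function from any package. Positions are counted across the whole file, so a line break does not restart the numbering.

```r
# word_positions() is a hypothetical helper, not part of any package.
word_positions <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Extract the words of every line and concatenate them, so positions
  # run continuously across line breaks.
  words <- unlist(regmatches(lines, gregexpr("\\w+", lines)))
  # Map each distinct word (in order of first appearance) to its positions.
  split(seq_along(words), factor(words, levels = unique(words)))
}

# Usage: write the example sentence to a temporary file and read it back.
tmp <- tempfile(fileext = ".txt")
writeLines("this is a nice text with nice characters", tmp)
pos <- word_positions(tmp)
pos$nice
#[1] 4 7
```

If "Nice" and "nice" should count as the same word, wrap the extracted words in tolower() before calling split().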
>As an example, if the text file has the string:
>
>"this is a nice text with nice characters"
>
>The output should be something like:
>$this
>[1] 1
>$is
>[1] 2
>$a
>[1] 3
>$nice
>[1] 4 7
>$text
>[1] 5
>$with
>[1] 6
>$characters
>[1] 8
>A useful post I came across here was http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-td4644053.html but it doesn't include the positions of the words.
>A similar function I found in the documentation is, I guess, "str_locate"; however, I want to count words, not characters.
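On str_locate: it reports character offsets, but word positions follow from the same information, because each match of "\w+" is exactly one word. The sketch below uses base R's gregexpr(), whose match starts are the same character offsets that the "start" column of stringr's str_locate_all() reports; the row number of each match is then the word position.

```r
vec <- "this is a nice text with nice characters"
g <- gregexpr("\\w+", vec)
m <- g[[1]]
loc <- data.frame(
  word  = regmatches(vec, g)[[1]],
  start = as.integer(m),                          # first character of the word
  end   = as.integer(m) + attr(m, "match.length") - 1L  # last character
)
# Each row is one word, so the row number is the word position.
loc$word_pos <- seq_len(nrow(loc))
loc[loc$word == "nice", ]
```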
>Any guidance on which packages / techniques to use for this would be really appreciated.
>Thank you.
More information about the R-help mailing list