[R] Automating searching text for key words

LCOG1 jroll at lcog.org
Thu Apr 15 08:36:04 CEST 2010


Hello all,
    I'm doing some content analysis of news stories, and I am looking for a
way to search through different text lists for specified words and then
store the results (at this point, just the count). Here's what I have so
far:

# Load the web addresses to read -> creates raw word data

# Create the web addresses where the text data is located
WebAdds <- c("http://anitasdailyshowpage.tripod.com/transcripts/2002bushisms.htm",
             "http://anitasdailyshowpage.tripod.com/transcripts/2002wasntcorrspondent.htm")
# Create text data by accessing each website and putting all text from the page
# into a character vector where each element represents a word


# Loop through all web addresses and load the text from each
WordData_ <- list()
for (i in 1:length(WebAdds)) {
  Select.WebAdd <- as.character(WebAdds[i])
  # Remove blanks from the address so it can be read
  Select.WebAdd <- gsub("[[:blank:]]", "", Select.WebAdd)
  WordData_[[i]] <- scan(url(Select.WebAdd), what = "character")
}
# Define the words to look for
SearchWords_ <- c("Bush", "actor")

# Create lists to store the returned values
AllWordDataResults_ <- list()
WordDataResults_ <- list()

for (i in 1:length(WordData_)) {
  for (j in 1:length(SearchWords_)) {
    # Loop through all transcripts searching for each of the words in the search list
    WordData.X <- sub(paste("", SearchWords_[j], ").*", sep = ""),
                      "\\1", WordData_[[i]])
    # Keep only the elements that contain the search word
    match <- grep(SearchWords_[j], WordData.X)
    WordDataResults_[[j]] <- WordData.X[match]
    AllWordDataResults_[[i]] <- WordDataResults_[[j]]
  }
}

AllWordDataResults_

which returns:
[[1]]
character(0)

[[2]]
[1] "actor."

This result shows only that the word "actor" was found in the second web
page searched. It should also show "Bush", in a number of variations (e.g.
"Bush-isms", "Bush-ism", "Bush,", "Bush.", "Bush?", and plain "Bush"), as
well as "actor".

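A likely cause of the missing "Bush" matches: inside the inner loop, AllWordDataResults_[[i]] is reassigned on every pass over j, so each page keeps only the matches for the last search word, "actor". (The sub() call also appears unnecessary for a plain match; in the output above it leaves the words unchanged.) Below is a minimal sketch that keeps one result per page and per word in a nested list. It uses a few made-up tokens in place of the scanned pages so it runs offline; in the real script you would keep WordData_ from scan() and drop the sample definition.

```r
# Made-up tokens standing in for the scanned web pages (not real transcript data)
WordData_ <- list(
  c("Bush-isms", "from", "Bush,", "the", "Bush?"),
  c("the", "actor.", "and", "Bush")
)
SearchWords_ <- c("Bush", "actor")

# One inner list per page with one element per search word, so matches for
# earlier words are no longer overwritten by later ones
AllWordDataResults_ <- vector("list", length(WordData_))
for (i in seq_along(WordData_)) {
  WordDataResults_ <- list()
  for (j in seq_along(SearchWords_)) {
    # value = TRUE returns the matching tokens rather than their positions
    WordDataResults_[[SearchWords_[j]]] <-
      grep(SearchWords_[j], WordData_[[i]], value = TRUE)
  }
  AllWordDataResults_[[i]] <- WordDataResults_
}

AllWordDataResults_[[1]]$Bush              # every "Bush" variant on page 1
sapply(AllWordDataResults_[[2]], length)   # number of matches per word on page 2
```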
So what happens above is that I load two web pages as sample content to
search through, and then each search word is compared against each of the
web pages. Any insight into how to improve the basic approach above would be
appreciated as well, but this is the best I could come up with at this
point. Thanks for any help.
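On the basic structure: the two loops can be collapsed with sapply(), which returns a page-by-word matrix of counts in one step. This is a sketch that assumes you want the number of tokens containing each search word (so "Bush-isms" and "Bush," both count toward "Bush"), matched without regard to case so that "Actor" and "actor" are treated alike. The helper count_words and the sample pages are hypothetical.

```r
# Hypothetical helper: count, for each page (a character vector of tokens)
# and each search word, the tokens that contain that word, ignoring case
count_words <- function(pages, words) {
  counts <- sapply(words, function(w) {
    sapply(pages, function(tokens) sum(grepl(w, tokens, ignore.case = TRUE)))
  })
  # sapply() returns a plain vector when there is only one page, so rebuild
  # a pages x words matrix explicitly
  matrix(counts, nrow = length(pages),
         dimnames = list(names(pages), words))
}

# Made-up tokens standing in for the scanned pages
pages <- list(
  page1 = c("Bush-isms", "from", "Bush,", "the", "Bush?"),
  page2 = c("The", "Actor", "and", "actor.", "Bush")
)
count_words(pages, c("Bush", "actor"))
```

Because this is a substring match, a word such as "ambush" would also count toward "Bush". If that matters, a word-boundary pattern such as "\\bBush" is one option.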

Cheers, 
JR

-- 
View this message in context: http://n4.nabble.com/Automating-searching-text-for-key-words-tp1856444p1856444.html
Sent from the R help mailing list archive at Nabble.com.


