[R] reading and analyzing a word document

cls59 chuck at sharpsteen.net
Thu Oct 1 06:18:16 CEST 2009

PDXRugger wrote:
> Considering your instructions:
> #Define words to find
> to.find <- c( 'the', 'is', 'are' ,'dr') 
> #Read in the file... 
> file.text <- readLines( 'data/letter.txt' ) 
> #Count number of occurrences of defined words in text
> line.matches <- unlist( lapply( to.find, grep, x = unlist(file.text[2]) )
> ) 
> Result:
>> line.matches 
> [1] 1 1 1
> This is not right of course as there are actually four words and secondly
> because the searched words appear multiple times.  

The example I gave was only meant to identify the lines on which matches
occurred. Using x = unlist(file.text[2]) feeds only one line of the file
into the matching routine, so the result reports every match as being on
line 1, the only line present for searching. A word with no match returns
nothing at all, which is why only three values come back for four words.
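
To illustrate the difference, here is a small sketch. The three lines of
text are made up for the example (I don't have your letter.txt), so treat
the contents of file.text as an assumption; only the grep calls mirror
what was discussed above.

```r
# Hypothetical sample text standing in for readLines('data/letter.txt')
file.text <- c("Dear dr Smith,",
               "the report is late and the figures are missing.",
               "Regards")

to.find <- c('the', 'is', 'are', 'dr')

# Searching a single line: every word that matches is reported as index 1,
# and a word with no match ('dr' here) contributes nothing.
one.line <- unlist(lapply(to.find, grep, x = file.text[2]))

# Searching the whole vector: the indices are now real line numbers.
all.lines <- lapply(to.find, grep, x = file.text)
names(all.lines) <- to.find

print(one.line)
print(all.lines)
```

Passing the whole character vector as x is what makes the returned
indices meaningful as line numbers.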

If you want to count the individual occurrences of the words on each line,
you may need to look at a function such as gregexpr. grep only reports
whether a line contains a match; gregexpr reports the position of every
match within the line, so the number of positions gives you the count.
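
A minimal sketch of counting with gregexpr follows. The sample text and
the helper name count.matches are my own inventions for illustration, not
part of base R. Note that this counts substrings, so 'is' is also found
inside 'missing'; matching whole words only would need a regular
expression with word boundaries instead of fixed = TRUE.

```r
# Hypothetical sample text standing in for readLines('data/letter.txt')
file.text <- c("Dear dr Smith,",
               "the report is late and the figures are missing.",
               "Regards")

to.find <- c('the', 'is', 'are', 'dr')

# count.matches is a made-up helper: it returns, for each line of text,
# how many times the literal string `pattern` occurs.
count.matches <- function(pattern, text) {
  hits <- gregexpr(pattern, text, fixed = TRUE)
  # gregexpr gives -1 for a line without a match, so count only
  # the positive start positions.
  sapply(hits, function(h) sum(h > 0))
}

# One row per line of the file, one column per search word
counts <- sapply(to.find, count.matches, text = file.text)

print(counts)
print(colSums(counts))  # total occurrences of each word in the file
```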

However, you may be getting to the point where R is no longer the most
appropriate tool for this job. R is amazingly flexible, so it is possible
that it can give you what you want, but it was not designed primarily for
text processing. Perl comes to mind as a language that was explicitly
designed to perform these sorts of operations.


Charlie Sharpsteen
Environmental Resources Engineering
Humboldt State University