[R] reading and analyzing a word document
chuck at sharpsteen.net
Thu Oct 1 06:18:16 CEST 2009
> Considering your instructions:
> #Define words to find
> to.find <- c( 'the', 'is', 'are' ,'dr')
> #Read in the file...
> file.text <- readLines( 'data/letter.txt' )
> #Count number of occurrences of defined words in text
> line.matches <- unlist( lapply( to.find, grep, x = unlist(file.text) ) )
>  1 1 1
> This is not right, of course: first, there are actually four words, and
> second, the searched words appear multiple times.
The example I gave was only meant to identify the lines on which matches
occurred. grep returns the index of each matching line, not a count of
matches. Using x = unlist(file.text) only feeds one line of the file
into the matching routine, so the result indicates that all the matches were
on line 1, the only line present for searching.
If you want to count the individual occurrences of the words on each line,
you may need to look at using a function such as gregexpr. grep only
indicates whether a line of text contains at least one match; gregexpr
reports the positions at which those matches occur in the line.
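
As a rough sketch of the gregexpr approach, something along these lines
might work. The sample text and the helper count.matches are made up for
illustration (they stand in for the contents of data/letter.txt), and the
whole-word, case-insensitive matching is an assumption about what is wanted:

```r
# Words to look for (from the quoted example)
to.find <- c('the', 'is', 'are', 'dr')

# Hypothetical stand-in for readLines('data/letter.txt')
file.text <- c('The doctor is in and the nurses are out.',
               'Dr Smith is not here.')

# Hypothetical helper: count occurrences of one word on each line
count.matches <- function(word, text) {
  # \\b restricts matches to whole words (an assumption about the goal)
  pattern <- paste0('\\b', word, '\\b')
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  # gregexpr returns -1 for a line without a match, so only
  # positive start positions are counted
  sapply(hits, function(h) sum(h > 0))
}

# Per-line counts for each word, then totals over the whole file
per.line <- lapply(to.find, count.matches, text = file.text)
names(per.line) <- to.find
totals <- sapply(per.line, sum)
```

Here per.line holds one vector of counts per word (one entry per line of
text) and totals holds the overall count for each word.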
However, you may be getting to the point where R is no longer an
appropriate tool for this job. R is amazingly flexible, so it is possible
that it can give you what you want. However, R was not designed to perform
text processing; Perl comes to mind as a language that was explicitly
designed to perform these sorts of operations.
Environmental Resources Engineering
Humboldt State University