[R] Trouble with Functions
dougmcintosh
doug.mcintosh at gmail.com
Wed Jun 6 01:52:35 CEST 2012
Hi guys,
I'm new to R and following along with the tutorials in this book:
http://www.amazon.com/Practical-Statistical-Analysis-Non-structured-Applications/dp/012386979X
In one of them, they use the twitteR package and describe the following
function (see below). From what I can tell from the R documentation,
there's a way to enter it directly in an interactive session. The way it's
presented in the book, however, it appears to be formatted for a text file
that (I'm assuming) you load using source(). At any rate, when I read it
into the session using source() I get errors, and I can't find documentation
on what's wrong (or tell from the error messages). Hoping you can help!
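For reference, here is a minimal sketch of the source() workflow described above. The file name and the add.one() function are hypothetical and only illustrate the mechanics. The second half shows one thing that may be worth checking when code copied from a book or PDF fails to parse: the R parser accepts only plain ASCII quotes (' and ") as string delimiters, so typographic quotes produce a parse error.

```r
# Write a tiny function definition to a file, then load it with source().
# "add_one.R" and add.one() are hypothetical names used for illustration.
script <- file.path(tempdir(), "add_one.R")
writeLines("add.one = function(x) { x + 1 }", script)
source(script)
result <- add.one(41)

# A line that uses typographic quotes instead of ' or " does not parse.
# The \u escapes stand for the curly quote characters.
bad <- "x = gsub(\u2018a\u2019, \u201d, \u2018banana\u2019)"
parse.ok <- tryCatch({
  parse(text = bad)
  TRUE
}, error = function(e) FALSE)

print(result)
print(parse.ok)
```

Pasting the same definition at the interactive prompt has the same effect as source(); the file route is just easier to edit and rerun.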
/Implementing Our Sentiment Scoring Algorithm
To score each tweet, our score.sentiment() function uses laply() to iterate
through the input text. It strips punctuation and control characters from
each line using R’s regular expression-powered substitution function, gsub()
and uses match() against each word list to find matches:/
score.sentiment = function(sentences, pos.words, neg.words,
                           .progress='none')
{
  require(plyr)
  require(stringr)
  # we got a vector of sentences. plyr will handle a list
  # or a vector as an "l" for us
  # we want a simple array of scores back, so we use
  # "l" + "a" + "ply" = "laply":
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  }, pos.words, neg.words, .progress=.progress)
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}
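To make the steps inside the function easier to follow, here is a minimal sketch that traces the same cleaning and matching logic on a single sentence using base R only (strsplit() stands in for stringr's str_split(), and there is no plyr loop). The word lists and the sample sentence are made up for illustration and are not from the book.

```r
# Hypothetical word lists and input, for illustration only.
pos.words <- c("good", "great", "love")
neg.words <- c("bad", "awful", "hate")
sentence  <- "I LOVE this phone, but the battery is awful!! 10/10 great."

# Strip punctuation, control characters and digits, then lower-case.
sentence <- gsub("[[:punct:]]", "", sentence)
sentence <- gsub("[[:cntrl:]]", "", sentence)
sentence <- gsub("\\d+", "", sentence)
sentence <- tolower(sentence)

# Split on whitespace; strsplit() returns a list, so flatten it.
words <- unlist(strsplit(sentence, "\\s+"))

# match() gives a position or NA; !is.na() turns that into TRUE/FALSE.
pos.hits <- !is.na(match(words, pos.words))
neg.hits <- !is.na(match(words, neg.words))

# TRUE/FALSE count as 1/0 in sum().
score <- sum(pos.hits) - sum(neg.hits)
print(words)
print(score)
```

Here "love" and "great" count as positive and "awful" as negative, so the score is 2 - 1 = 1.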