[R] twitteR and wordcloud()
Doran, Harold
HDoran at air.org
Thu Mar 12 00:47:19 CET 2015
I am trying to replicate the twitteR and word cloud example found here:
https://sites.google.com/site/miningtwitter/questions/talking-about/wordclouds/wordcloud1
When I run the example verbatim, I replicate the results and everything works. But when I make a slight modification to the code, it fails when creating the term-document matrix (tdm). I found only one other question on this topic, on Stack Overflow, and it has no answer that leads to a solution.
Here is my code as a reproducible example, though you would need your own twitteR tokens etc. to run it.
Any idea why the tdm step fails?
library(twitteR)
library(tm)
library(wordcloud)
library(RColorBrewer)
mach_tweets = searchTwitter("#machine", n=50, lang="en")
mach_text = sapply(mach_tweets, function(x) x$getText())
mach_corpus = Corpus(VectorSource(mach_text))
# create document term matrix applying some transformations
tdm = TermDocumentMatrix(mach_corpus,
    control = list(removePunctuation = TRUE,
                   #stopwords = c(stopwords()),
                   removeNumbers = TRUE, tolower = TRUE))
# define tdm as matrix
m = as.matrix(tdm)
# get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing=TRUE)
# create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2"))
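To check whether the failure comes from the tweet text or from the tdm call itself, the tdm step can be run on a hard-coded character vector in place of the searchTwitter() results. This is only a sketch: the sample_text strings are invented for illustration, and it assumes the tm package is installed.

```r
library(tm)

# Hypothetical stand-in for the tweet text; not real Twitter data
sample_text <- c("Machine learning is fun",
                 "The machine stopped at 3 pm",
                 "Learning never stops!")

sample_corpus <- Corpus(VectorSource(sample_text))

# Same control options as the failing call above
tdm <- TermDocumentMatrix(sample_corpus,
    control = list(removePunctuation = TRUE,
                   removeNumbers = TRUE, tolower = TRUE))

# Convert to a matrix and count each term across documents
m <- as.matrix(tdm)
word_freqs <- sort(rowSums(m), decreasing = TRUE)
print(word_freqs)
```

If this runs cleanly but the Twitter version does not, the contents of the downloaded tweets are the more likely place to look.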
In fact, earlier today, on a different computer from the one I am working on now, I wrote the following function and it works perfectly:
tweets <- function(string, n, min){
    tweets <- searchTwitter(as.character(string), n=n)
    tweets_text <- sapply(tweets, function(x) x$getText())
    tweets_text_corpus <- Corpus(VectorSource(tweets_text))
    tweets_text_corpus <- tm_map(tweets_text_corpus, removePunctuation)
    tweets_text_corpus <- tm_map(tweets_text_corpus, function(x) removeWords(x, stopwords()))
    #wordcloud(tweets_text_corpus)
    myDtm <- TermDocumentMatrix(tweets_text_corpus, control = list(minWordLength = 1))
    m <- as.matrix(myDtm)
    v <- sort(rowSums(m), decreasing=TRUE)
    #wordcloud(names(v), v, scale = c(4,2), min.freq= min )
    v
}
v <- tweets('#beer', n= 20)
But when I run it on my Mac at home, it also fails at the tdm step.
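Because the same code behaves differently on different machines, it may help to record the R version and package versions on each one and compare them. The sketch below assumes that a version difference is relevant, which nothing above confirms; it reports NA for any package that is not installed.

```r
# Packages used in the code above
pkgs <- c("twitteR", "tm", "wordcloud", "RColorBrewer")

# Look up the installed version of each package, or NA if it is missing
versions <- sapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
        as.character(packageVersion(p))
    } else {
        NA_character_
    }
})

print(R.version.string)
print(versions)
```

Running this on both the working and the failing machine gives two short listings that can be compared side by side or pasted into a follow-up message.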