[R] My code needs improvement

Vassiliki Marinou vassiliki.marinou at gmail.com
Mon Dec 8 20:09:14 CET 2014


Hello!

I am performing a sentiment analysis of 2,000 negative and positive reviews.
I think my code needs improvement: the accuracy is only 68%, and the code
takes 20 minutes to run. Please find part of the code below.

"
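library(tm)     # Corpus, DirSource, readPlain, DocumentTermMatrix, Terms
library(e1071)  # naiveBayes
# Read data from their directories
pos <- Corpus(DirSource(pos_dir),
              readerControl = list(language = "english", reader = readPlain))
neg <- Corpus(DirSource(neg_dir),
              readerControl = list(language = "english", reader = readPlain))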
# Create training and testing corpuses
print("Creating training and testing corpuses...")
split.percentage      <- 0.75
split.pos.size        <- length(pos)
split.neg.size        <- length(neg)
split.pos.train.size  <- floor(split.pos.size * split.percentage)
split.neg.train.size  <- floor(split.neg.size * split.percentage)
split.pos.test.size   <- split.pos.size - split.pos.train.size
split.neg.test.size   <- split.neg.size - split.neg.train.size
# Note: this takes the first 75% of each class in file order, not a random sample
corpus.train          <- c(pos[1:split.pos.train.size],
                           neg[1:split.neg.train.size])
corpus.test           <- c(pos[(split.pos.train.size + 1):split.pos.size],
                           neg[(split.neg.train.size + 1):split.neg.size])
# Perform preprocessing (preProcess() is a custom function, not shown here)
print("Pre-processing corpuses...")
corpus.train <- preProcess(corpus.train)
corpus.test  <- preProcess(corpus.test)
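# Create the document-term matrices. The test matrix reuses the training
# vocabulary so both matrices have the same terms; wordLengths replaces the
# older minWordLength option in current versions of tm.
print("Creating document term matrices...")
corpus.train.dtm <- DocumentTermMatrix(corpus.train,
                                       control = list(wordLengths = c(2, Inf)))
corpus.test.dtm  <- DocumentTermMatrix(corpus.test,
                                       control = list(wordLengths = c(2, Inf),
                                                      dictionary = Terms(corpus.train.dtm)))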
# Convert to dense matrices (as.matrix() returns a matrix, not a data frame)
print("Creating data matrices...")
corpus.train.df <- as.matrix(corpus.train.dtm)
corpus.test.df  <- as.matrix(corpus.test.dtm)
# Generate vector with class values
print("Creating class information...")
class.train <- c(rep("pos", split.pos.train.size),
                 rep("neg", split.neg.train.size))
class.test  <- c(rep("pos", split.pos.test.size),
                 rep("neg", split.neg.test.size))
# Train classifier
print("Training classifier...")
classifier <- naiveBayes(corpus.train.df, as.factor(class.train))
# Evaluate Classifier
print("Evaluating... Please be patient. This will take a while...")
corpus.predictions <- predict(classifier, corpus.test.df)
table(corpus.predictions, class.test)
"
I could use some ideas.
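If the training and test document-term matrices are built separately, each gets its own vocabulary, so their columns do not line up: words that occur only in the test reviews get columns the classifier never saw, and training words absent from the test reviews get no column at all. It is safer to build the test matrix from the training vocabulary (in tm, by adding dictionary = Terms(corpus.train.dtm) to the control list). Below is a small base-R sketch of the same idea; tokenize and build_dtm are hypothetical helpers rather than tm functions, and the simple tokenizer (lower-case, split on non-letters, keep words of two or more letters) is an assumption standing in for the real preprocessing.

```r
# Sketch with hypothetical helpers (not part of tm): learn the vocabulary from
# the training documents only, then count those same terms in the test
# documents so both matrices share identical columns.

tokenize <- function(doc, min.length = 2) {
  # assumed tokenizer: lower-case, split on non-letters, drop short tokens
  words <- unlist(strsplit(tolower(doc), "[^a-z]+"))
  words[nchar(words) >= min.length]
}

build_dtm <- function(docs, vocab) {
  # one row per document, one column per vocabulary term;
  # match() gives NA for unknown tokens and tabulate() silently ignores NAs,
  # so tokens outside the vocabulary are dropped
  m <- t(vapply(docs, function(d) {
    tabulate(match(tokenize(d), vocab), nbins = length(vocab))
  }, integer(length(vocab))))
  dimnames(m) <- list(NULL, vocab)
  m
}

train.docs <- c("a great movie", "great acting, great plot", "a dull plot")
test.docs  <- c("great fun", "dull and boring")

vocab     <- sort(unique(unlist(lapply(train.docs, tokenize))))
train.dtm <- build_dtm(train.docs, vocab)
test.dtm  <- build_dtm(test.docs, vocab)  # same columns as train.dtm
```

Because test.dtm is indexed by the training vocabulary, a word such as "fun" that appears only in the test set is simply dropped instead of creating a new column.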

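Two other things may contribute to both the 68% and the 20 minutes. naiveBayes() from e1071 treats each numeric column as normally distributed within each class, which fits sparse word counts poorly, and at prediction time it probably has to evaluate one such density per term for every test review. A common alternative for word counts is multinomial Naive Bayes, which scores all test documents with a single matrix product. The sketch below is a minimal base-R version with add-one (Laplace) smoothing; train_mnb and predict_mnb are hypothetical names, not functions from e1071 or tm, and the toy counts stand in for as.matrix() of the real document-term matrices, which must have the same columns. Dropping very rare terms first with tm's removeSparseTerms() would shrink those matrices further.

```r
# Minimal multinomial Naive Bayes in base R (hypothetical helpers, a sketch
# rather than a drop-in replacement for e1071::naiveBayes).
# x: numeric document-term count matrix (documents in rows), y: class labels.
train_mnb <- function(x, y, alpha = 1) {
  y <- factor(y)
  # log prior: share of training documents in each class
  log.prior <- log(as.vector(table(y)) / length(y))
  # summed term counts per class, one row per class level (sorted like levels(y))
  counts <- rowsum(x, y)
  # Laplace-smoothed log P(term | class); each row sums to 1 before the log
  log.lik <- log((counts + alpha) / (rowSums(counts) + alpha * ncol(x)))
  list(levels = levels(y), log.prior = log.prior, log.lik = log.lik)
}

predict_mnb <- function(model, x) {
  # score[d, k] = log prior_k + sum over terms t of x[d, t] * log P(t | k)
  scores <- x %*% t(model$log.lik)
  scores <- sweep(scores, 2, model$log.prior, "+")
  factor(model$levels[max.col(scores, ties.method = "first")],
         levels = model$levels)
}

# Toy data: 4 training reviews, 4 terms
train.x <- rbind(c(2, 0, 1, 0),
                 c(1, 0, 0, 0),
                 c(0, 2, 0, 1),
                 c(0, 1, 0, 2))
colnames(train.x) <- c("great", "dull", "fun", "boring")
train.y <- c("pos", "pos", "neg", "neg")
model <- train_mnb(train.x, train.y)

test.x <- rbind(c(1, 0, 1, 0),
                c(0, 1, 0, 1))
colnames(test.x) <- colnames(train.x)
pred  <- predict_mnb(model, test.x)
truth <- factor(c("pos", "neg"), levels = model$levels)

# diag() counts correct predictions because pred and truth share level order
conf.mat <- table(pred, truth)
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
```

For the real data this would be called as train_mnb(corpus.train.df, class.train) followed by predict_mnb(model, corpus.test.df), assuming both matrices share the same columns.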
Thank you for your time.

V.
