[R] LDA on pre-assigned training and testing data sets
Michael Conklin
michael.conklin at markettools.com
Wed Jun 25 18:37:54 CEST 2008
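I think this line
mafdiscpred <- predict(mafdisc, data = test)
needs to be
mafdiscpred <- predict(mafdisc, newdata = test)
The predict method for lda objects takes the new observations through
its newdata argument. It has no data argument, so data = test is
ignored and you get predictions for the 400 training rows instead.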
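The difference between the two calls can be checked on any data set. Below is a minimal sketch, assuming the MASS package is installed; it uses the built-in iris data as a stand-in for the original data (which is not available here), and the names train_idx, fit, pred_wrong and pred_right are illustrative only.

```r
library(MASS)  # provides lda() and its predict() method

# Stand-in data (an assumption for illustration): iris split into
# pre-assigned training and testing sets of different sizes.
train_idx <- seq(1, nrow(iris), by = 3)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = train)

# 'data' is not an argument of the lda predict method, so it is ignored
# and the predictions are made for the training rows.
pred_wrong <- predict(fit, data = test)
length(pred_wrong$class) == nrow(train)

# 'newdata' is the argument that supplies the rows to classify.
pred_right <- predict(fit, newdata = test)
length(pred_right$class) == nrow(test)

# Cross-tabulate predicted against actual classes in the test set.
table(predicted = pred_right$class, actual = test$Species)
```

Comparing length(pred$class) with nrow(test), as done in the original post, is a quick way to confirm that the predictions really are for the test rows.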
Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170
Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile:612.201.8978
Michael.Conklin at markettools.com
MarketTools(r) http://www.markettools.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Peter Flom
Sent: Wednesday, June 25, 2008 11:22 AM
To: r-help at r-project.org
Subject: [R] LDA on pre-assigned training and testing data sets
Dear r-help
I am trying to run LDA on a training data set and test it on another
data set with the same variables. I found examples using
cross-validation, and examples where the training and testing sets are
drawn with sample(), but none where they are pre-assigned.
Here is what I tried:
# FIRST SET UP A DATAFRAME WITH ALL THE DATA AND CREATE NEW VARIABLES
traintest1 <- arnaudnognod1[arnaudnognod1$DISC_USE1 == 1.01 |
                            arnaudnognod1$DISC_USE1 == 1.03 |
                            arnaudnognod1$DISC_USE1 == 1.04 |
                            arnaudnognod1$DISC_USE1 == 1.02 |
                            arnaudnognod1$DISC_USE1 == 1.05 |
                            arnaudnognod1$DISC_USE1 == 1.06, ]
traintest1$normal <- traintest1$DISC_USE1 == 1.01 |
                     traintest1$DISC_USE1 == 1.03 |
                     traintest1$DISC_USE1 == 1.04
traintest1$mafelev <- apply(traintest1[, 1:40], 1, FUN = mean)
traintest1$mafscatter <- apply(traintest1[, 1:40], 1, FUN = sd)
# NEXT CREATE TRAINING AND TESTING DATAFRAMES
train <- traintest1[traintest1$DISC_USE1 == 1.01 |
                    traintest1$DISC_USE1 == 1.02, ]
test <- traintest1[traintest1$DISC_USE1 > 1.02, ]
# NOW, TRAIN HAS 400 ROWS, TEST HAS 396 ROWS, AND TRAINTEST1 HAS 796
# ROWS, EACH HAS 615 COLUMNS, AS EXPECTED
# RUN DISCRIM ON TRAINING DATA
library(MASS)  # lda() comes from the MASS package
mafdisc <- lda(normal ~ mafelev + mafscatter, data = train)
#mafdisc$counts IS 210 AND 190, AS EXPECTED
#FINALLY, TEST IT ON THE TEST DATA
mafdiscpred <- predict(mafdisc, data = test)
#BUT mafdiscpred$class HAS LENGTH = 400, NOT THE 396 I EXPECTED.
any help appreciated
thanks
Peter
Peter L. Flom, PhD
Brainscope, Inc.
212 263 7863 (MTW)
212 845 4485 (Th)
917 488 7176 (F)