[R] Help to improve prediction from supervised mapping using kohonen package
Ben Harrison
harb at student.unimelb.edu.au
Wed Jul 24 11:05:26 CEST 2013
I would really appreciate any advice on how to improve (or fix?) the
following analysis. I hope I have provided completely runnable code; it
doesn't produce any errors for me.
The plot at the end shows pretty poor agreement with the test set
(judging visually). How can I improve the performance of the mapping
and the prediction?
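To put a number on "pretty poor" instead of judging the plot by eye, one option is to compute the root-mean-square error (RMSE) and the Pearson correlation between observed and predicted values. A minimal base-R sketch; `observed` and `predicted` are hypothetical stand-in vectors, not values from this data set:

```r
# Hypothetical observed and predicted MEAS_TC values (illustration only)
observed  <- c(2.78, 1.49, 1.49, 2.20, 2.20, 1.28)
predicted <- c(2.50, 1.70, 1.60, 2.00, 2.40, 1.10)

# RMSE: typical size of a prediction error, in MEAS_TC units
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Pearson correlation: how closely the predictions track the observations
r <- cor(observed, predicted)

cat("RMSE:", round(rmse(observed, predicted), 3), "\n")
cat("r:   ", round(r, 3), "\n")
```

Plotting predicted against observed with a 1:1 reference line (`abline(0, 1)`) is usually easier to read than two overlaid line traces.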
Here are some of the data (continuous, numerical):
> head(somdata)
   MEAS_TC        SP        LN        SN       GR     NEUT
1 2.780000 59.181090  33.74364  19.75361 66.57665 257.0368
2 1.490000 49.047750 184.14598 139.07980 54.75052 326.8001
3 1.490000 49.128902 183.58853 138.02768 55.54114 327.4739
4 2.201276 18.240331  19.20386  10.74748 62.04492 494.4161
5 2.201276 18.215522  19.18009  10.72446 61.87448 494.7409
6 1.276476  9.337769  14.16061  19.06902 14.99612 363.0020
The complete data set is at the following link, if you fancy it:
https://gist.github.com/ottadini/6068259
The first variable, MEAS_TC, is the dependent variable. I want to train
a SOM on these data and then predict MEAS_TC for a new data set in which
MEAS_TC is missing. Below I simply split somdata into a training set and
a test set for evaluation purposes.
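Because `sample()` draws a different split on every run, the apparent quality of the fit can change from run to run. Fixing the random seed and passing an integer size makes the split repeatable, so different map settings can be compared on the same rows. A minimal sketch on a hypothetical toy data frame (`toy` and the seed value 42 are illustrative assumptions):

```r
set.seed(42)  # fix the RNG so the toy data and the split are repeatable

# Hypothetical toy data frame standing in for somdata
toy <- data.frame(MEAS_TC = runif(30), SP = rnorm(30), GR = rnorm(30))

n <- nrow(toy)
inTrain <- sample(n, floor(2 * n / 3))  # integer number of training rows

training <- toy[inTrain, ]
testing  <- toy[-inTrain, ]
```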
# ===== #
library(kohonen)
somdata <- read.csv("somdata.csv")
# Create test and training sets from data:
inTrain <- sample(nrow(somdata), nrow(somdata)*(2/3))
training <- somdata[inTrain, ]
testing <- somdata[-inTrain, ]
# Supervised kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens (2007),
# J Stat Soft 21(5).
# somdata[1] is the MEAS_TC variable
somX <- scale(training[-1])
somY <- training[[1]] # Needs to return a vector
# Train the map (not sure this is how it should be done):
tc.xyf <- xyf(data = somX, Y = somY, xweight = 0.5,
              grid = somgrid(6, 6, "hexagonal"), contin = TRUE)
# Prediction with test set:
tc.xyf.prediction <- predict(tc.xyf, newdata = scale(testing[-1]))
# Basic plot: observed (black) and predicted (red) MEAS_TC for each test row
x <- seq_len(nrow(testing))
plot(x, testing[, "MEAS_TC"], type = "l", col = "black", ylim = c(0, 3.5))
lines(x, tc.xyf.prediction$prediction, col = "red")
# Wow, that's terrible. Am I doing something wrong?
# ===== #
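One thing worth checking in the code above: `scale(testing[-1])` standardises the test predictors with the test set's own means and standard deviations, whereas the map was trained on predictors standardised with the training set's values. A common alternative is to reuse the training parameters, which `scale()` stores in the `scaled:center` and `scaled:scale` attributes of its result. A minimal base-R sketch on hypothetical toy matrices (`train_x` and `test_x` are illustrative, not from the data set):

```r
set.seed(1)
# Hypothetical predictor matrices; the test set is deliberately shifted upwards
train_x <- matrix(rnorm(40, mean = 10, sd = 2), ncol = 2,
                  dimnames = list(NULL, c("SP", "GR")))
test_x  <- matrix(rnorm(20, mean = 12, sd = 2), ncol = 2,
                  dimnames = list(NULL, c("SP", "GR")))

# Standardise the training data; scale() records the centre and scale it used
train_scaled <- scale(train_x)
ctr <- attr(train_scaled, "scaled:center")
scl <- attr(train_scaled, "scaled:scale")

# Apply the training parameters to the test data, so both sets share one scale
test_scaled <- scale(test_x, center = ctr, scale = scl)

# For comparison: self-scaling forces the test means to zero, hiding the shift
test_self_scaled <- scale(test_x)
```

In the posted script, the analogous change would be passing scale(testing[-1], center = attr(somX, "scaled:center"), scale = attr(somX, "scaled:scale")) as newdata to predict(). Whether this alone explains the poor agreement is untested here.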