[R] Help to improve prediction from supervised mapping using kohonen package

Wed Jul 24 13:20:28 CEST 2013

On 24 July 2013 19:25, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:
> Try rescaling your data prior to splitting it up into a training and test set. Otherwise you end up with two different ways of scaling.

Thanks, good point.
I have adjusted the code, but there is no visible improvement.
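For reference, here is a minimal R sketch of the difference Thierry points out. It uses made-up toy data in place of `somdata`; the data frame `df` and its columns are hypothetical. Scaling each subset on its own gives the test set its own centre and scale. Passing the training attributes to `scale()` instead applies one transformation to both sets, which is the approach the adjusted code takes.

```r
# Hypothetical toy data standing in for somdata (illustration only)
set.seed(1)
df <- data.frame(MEAS_TC = rnorm(30, mean = 10, sd = 2),
                 x1 = rnorm(30, mean = 5),
                 x2 = runif(30))
idx <- sample(nrow(df), floor(nrow(df) * 2/3))

# Separate scaling: each subset gets its own centre and scale
train_own <- scale(df[idx, ])
test_own  <- scale(df[-idx, ])

# Shared scaling: the test set reuses the training parameters
test_shared <- scale(df[-idx, ],
                     center = attr(train_own, "scaled:center"),
                     scale  = attr(train_own, "scaled:scale"))

# Non-zero differences: the two subsets were centred differently
print(attr(test_own, "scaled:center") - attr(train_own, "scaled:center"))

# TRUE: the shared version carries exactly the training parameters
print(all.equal(attr(test_shared, "scaled:center"),
                attr(train_own, "scaled:center")))
```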

Also, I want to be able to reverse the scaling so I can compare the
measured data with the predicted values. I tried this:
meas.tc <- testing[, "MEAS_TC"] * attr(testing, "scaled:scale") +
  attr(testing, "scaled:center")
predicted.tc <- tc.xyf.prediction$prediction *
  attr(testing, "scaled:scale") + attr(testing, "scaled:center")

but I get these warnings, and the resulting values are far off:

Warning messages:
1: In testing[, "MEAS_TC"] * attr(testing, "scaled:scale") :
  longer object length is not a multiple of shorter object length
2: In testing[, "MEAS_TC"] * attr(testing, "scaled:scale") + attr(testing,  :
  longer object length is not a multiple of shorter object length
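The warnings appear to come from vector recycling. `attr(testing, "scaled:scale")` and `attr(testing, "scaled:center")` are named vectors with one value per column of `somdata`. Multiplying the single `MEAS_TC` column by them makes R recycle the shorter vector, which mixes in the scale factors of every other column. R only warns when the lengths do not divide evenly. Otherwise the result is silently wrong. The minimal sketch below uses hypothetical toy data and picks out the `MEAS_TC` element by name before back-transforming. It assumes the predictions are in the same scaled units as `testing[, "MEAS_TC"]`, since the map was trained on the scaled `somY`.

```r
# Hypothetical toy stand-in for the scaled test set (illustration only)
set.seed(42)
raw <- data.frame(MEAS_TC = rnorm(10, mean = 50, sd = 8),
                  x1 = rnorm(10, mean = 3))
testing <- scale(raw)

# Each attribute holds one value per column, named by column
tc_scale  <- attr(testing, "scaled:scale")[["MEAS_TC"]]
tc_center <- attr(testing, "scaled:center")[["MEAS_TC"]]

# Undo the scaling for the MEAS_TC column only: x * sd + mean
meas.tc <- testing[, "MEAS_TC"] * tc_scale + tc_center

# Assumed: the same two scalars back-transform the predictions, e.g.
# predicted.tc <- tc.xyf.prediction$prediction * tc_scale + tc_center

# TRUE: the original MEAS_TC values are recovered
print(all.equal(unname(meas.tc), raw$MEAS_TC))
```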

Adjusted code:
(now also in a gist: https://gist.github.com/ottadini/6069736)

# ===== #
library(kohonen)

somdata <- read.csv("somdata.csv")

# Create SCALED test and training sets from data:
inTrain <- sample(nrow(somdata), nrow(somdata)*(2/3))
training <- scale(somdata[inTrain, ])
testing <- scale(somdata[-inTrain, ],
                 center = attr(training, "scaled:center"),
                 scale = attr(training, "scaled:scale"))

# Supervised kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens (2007),
# J Stat Soft 21(5).
# somdata[1] is the MEAS_TC variable
somX <- training[, -1]
somY <- training[, 1]
tc.xyf <- xyf(data = somX, Y = somY, xweight = 0.5,
              grid = somgrid(6, 6, "hexagonal"), contin = TRUE)

# Prediction with test set:
tc.xyf.prediction <- predict(tc.xyf, newdata=testing[, -1])

# Basic plot: measured (black) vs. predicted (red) MEAS_TC, scaled units
x <- seq(nrow(testing))
plot(x, testing[, "MEAS_TC"], type = "l", col = "black", ylim = c(-2, 2),
     xlab = "Test sample", ylab = "MEAS_TC (scaled)")
lines(x, tc.xyf.prediction$prediction, col = "red")

# Still terrible. Am I doing something wrong in the scaling?

# ===== #