[R] Fwd: Imputation with SOM using kohonen package

Ben Harrison harb at student.unimelb.edu.au
Tue Apr 16 00:42:35 CEST 2013

Trying to re-send as plain text.

I have a data set with 10 variables and about 8000 instances (or
objects/rows/samples). In addition I have one more ('class') variable,
which is known for only about 10 instances and which I wish to impute
for the rest.

I am a little confused about how to go about doing this, mostly as I'm not
well-versed in it. Do I train the SOM with a data object that contains
just the first 10 variables (exclude the 'class' variable), then
predict using an object that has all of the variables (including the
class variable)?

(I am using the kohonen package, and in general I am using the SOM
technique as a comparison to some other methods).
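
One possible reading of the workflow asked about above, as a minimal sketch rather than a definitive answer: train an unsupervised SOM on the predictors only, then transfer the class from the few labelled rows to every row that lands in the same map unit. The synthetic data, the 4 x 4 grid, and the names `X`, `grp`, `known`, `cls` and `imputed` are assumptions made for illustration; only `som()`, `somgrid()` and the `unit.classif` component come from the kohonen package.

```r
library(kohonen)

set.seed(1)
# Synthetic stand-in for the real data: 300 rows, 10 predictors, two groups.
n <- 300
grp <- rep(c("a", "b"), each = n / 2)
X <- matrix(rnorm(n * 10), ncol = 10)
X[grp == "b", ] <- X[grp == "b", ] + 3   # shift group b so the groups separate
X.sc <- scale(X)

# Pretend the class is known for only a handful of rows.
known <- sample(n, 20)
cls <- rep(NA_character_, n)
cls[known] <- grp[known]

# 1. Train an unsupervised SOM on the predictors only.
fit <- som(X.sc, grid = somgrid(4, 4, "hexagonal"))

# 2. The winning unit of every training row is stored in the fitted object.
unit <- fit$unit.classif

# 3. Majority class among the labelled rows that fall in each unit.
unit.class <- tapply(cls[known], unit[known],
                     function(z) names(which.max(table(z))))

# 4. Impute by unit; rows whose unit holds no labelled row stay NA.
imputed <- as.vector(unit.class)[match(unit, as.integer(names(unit.class)))]
imputed[known] <- cls[known]   # keep the observed labels as they are

table(imputed, grp, useNA = "ifany")
```

With only about 10 labelled rows and a 25 x 20 grid, at most 10 of the 500 units can contain a labelled row, so most rows would stay NA under this scheme. A smaller grid, or borrowing labels from neighbouring units, would be needed to cover the whole map.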

I don't know whether providing some or all of the data would be useful;
please let me know if you think it would.

library(kohonen)  # som(), somgrid()
library(zoo)      # na.approx()

# get the data
bw <- read.csv("bw.csv")
# some missing values in data
bwm <- data.frame(na.approx(bw, na.rm=FALSE, rule=2))
bw10 <- bwm[, 1:10]
bw10.sc <- scale(bw10)
# playing with diff grid sizes
bw.som <- som(bw10.sc, grid=somgrid(25,20,'hexagonal'))

# the different plots of the som at this point show some interesting
# features to me, but are quite difficult to interpret.
# there's much work needed here to understand it, but for now I want
# to see if it's possible to impute values for another variable...

# here's where I lose it, missing values, trainY, don't get it.
bw.predict <- predict(bw.som, newdata=scale(bw), trainX=???, trainY=???)
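
A side note on the `scale(bw)` argument in the call above, offered as a hedged sketch: `scale()` called on its own standardises with the mean and standard deviation of whatever it is given, and here it would also include the class column, whereas the map was trained on `bw10.sc`. Data presented to a trained map generally need the same columns and the same centring and scaling as the training data. The matrices `train` and `new` below are made-up stand-ins; the `"scaled:center"` and `"scaled:scale"` attributes are what base R's `scale()` stores on its result.

```r
set.seed(1)
# Made-up stand-ins for the training predictors and for new rows.
train <- matrix(rnorm(50, mean = 5, sd = 2), ncol = 5)
new   <- matrix(rnorm(20, mean = 5, sd = 2), ncol = 5)

# Standardise the training data; scale() keeps the centre and scale it used.
train.sc <- scale(train)

# Reuse those values so both sets share one coordinate system.
new.sc <- scale(new,
                center = attr(train.sc, "scaled:center"),
                scale  = attr(train.sc, "scaled:scale"))

# Check: undoing the transformation recovers the original new data.
back <- sweep(sweep(new.sc, 2, attr(train.sc, "scaled:scale"), "*"),
              2, attr(train.sc, "scaled:center"), "+")
all.equal(as.vector(back), as.vector(new))
```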

