[R] Rare Cases and SOM

Gabor Grothendieck ggrothendieck at myway.com
Sat Feb 5 14:20:26 CET 2005


Manuel Gutierrez <manuel_gutierrez_lopez <at> yahoo.es> writes:

: 
: I am trying to understand how the SOM algorithm works
: using library(class) SOM function.
: I have a 1000*10 matrix and I want to be able to
: summarize the different types of 10-element vectors.
: In my real-world case it is likely that most of the
: 1000 vectors are of one kind and the rest of another
: (this is an oversimplification).
: Say for example:
: 
: InputA<-matrix(cos(1:10),nrow=900,ncol=10,byrow=TRUE)
: InputB<-matrix(sin(5:14),nrow=100,ncol=10,byrow=TRUE)
: Input<-rbind(InputA,InputB)
: 
: I thought that a small grid of 3*3 would be enough to
: extract the patterns in such a simple matrix:
: GridWidth<-3
: GridLength<-3
: gr <- somgrid(xdim=GridWidth,ydim=GridLength,topo="hexagonal")
: test.som <- SOM(Input, gr)
: par(mfrow=c(GridLength,GridWidth))
: for(i in 1:(GridWidth*GridLength))
:   plot(test.som$codes[i,],type="l")
: 
: Only when I use a larger grid (say, for example, 7*3) do I
: get some representatives for the sin pattern.
: This must have something to do with the initialization
: of the grid: as the sin pattern is so rare, it is unlikely
: that I get it as a reference vector. Afterwards, because
: the selection for the training is also random, it is
: also unlikely that the sin rows are picked.
: I've been trying to modify some of the other
: parameters for the SOM as well, but I would appreciate
: some input to keep me going until I receive the
: reference books from my bookstore.
: 
: Are my suspicions right?
: Should I be using the SOM for my study or should I
: look somewhere else?
: NOTE: I have no prior knowledge of whether the
: datasets I want to analyse will have rare cases or not
: or where they will be located.

I don't have a direct answer to your question, as I have not
used that package, but I have used randomForest, and it does have
stratified sampling facilities so that you can be sure that a rare
case is represented.  Check out the sampsize= argument.  Also, there
is an article in R News on randomForest, and a search of this list
will turn up some relevant comments by the author of randomForest.
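
As a rough illustration of the stratified sampling mentioned above, here
is a minimal sketch using the example data from the question.  It makes
assumptions that are not in the original post: the factor `kind` is a
hypothetical label (the real data has no known labels), a little noise
is added so that rows within a group are not identical, and the
per-stratum sample size of 50 and ntree = 200 are arbitrary choices.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(1)  # arbitrary seed, for reproducibility only

# The two patterns from the question, with a little noise added
# (an assumption) so that rows within a group are not identical.
InputA <- matrix(cos(1:10), nrow = 900, ncol = 10, byrow = TRUE)
InputB <- matrix(sin(5:14), nrow = 100, ncol = 10, byrow = TRUE)
Input  <- rbind(InputA, InputB) + rnorm(1000 * 10, sd = 0.1)

# Hypothetical labels: the original question has none.
kind <- factor(rep(c("common", "rare"), times = c(900, 100)))

# Draw 50 cases from each stratum for every tree, so that the rare
# group is represented in every tree's sample.
fit <- randomForest(x = Input, y = kind,
                    strata = kind, sampsize = c(50, 50),
                    ntree = 200)

print(fit$confusion)
```

Since this needs labels to stratify on, it only helps directly when the
rare group can be identified beforehand, which the question says is not
the case; it is meant to show what strata= and sampsize= do, not to
replace the SOM.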



