[BioC] pamr Error: each class must have >1 sample

Thu Jul 29 02:06:35 CEST 2004

Hi Kasper,

Thank you so much for your help.  Your explanation has cleared up the cloud of fog I was sitting in.

Cheers,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D.	University of Washington
Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
			Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
*******************************************************************************

On Thu, 29 Jul 2004, Kasper Daniel Hansen wrote:

> Dick Beyer <dbeyer at u.washington.edu> writes:
> 
> > Hi Kasper,
> >
> > Thanks for pointing out my problem with pamr.train.  On closer examination, my problem seems slightly different than what I asked about earlier as it is occurring in pamr.cv. 
> >
> > Every class has 3 samples, so pamr.train is ok, but not pamr.cv:
> >  
> >>table(z)
> > z
> > 1 2 3 4 5 6 7 8 
> > 3 3 3 3 3 3 3 3 
> >>my.data  <- list(x=dendmat,y=factor(z))
> >>my.train <- pamr.train(my.data)
> > 123456789101112131415161718192021222324252627282930
> >> my.cv    <- pamr.cv(my.train, my.data)
> > Fold 1 :Error in nsc(x[, -folds[[ii]]], y = argy[-folds[[ii]]], x[, folds[[ii]],  : 
> >         Error: each class must have >1 sample
> >
> > Has anyone seen this in pamr.cv before?
> 
> Probably still the same problem. Even though your original sample was
> ok, when you do CV, each of the CV-train sets must have at least two
> samples in every category.
> 
> Eg. take a y-vector like
> 1,1,2,2,2,2
> 
> If you do 3 fold CV you must divide your set into 3 test-sets, eg. (if
> you do not do randomization)
> 1,1
> 2,2
> 2,2
> The corresponding training sets would be
> 2,2,2,2
> 1,1,2,2
> 1,1,2,2
> so in this case you have a problem with the first training set, as it
> contains no samples of class 1. In principle this is only a problem with
> small sample sizes, but if you have one or more categories
> containing only a few samples you might run into this.
> 
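
The toy example above can be checked with a short base-R sketch. It assumes a non-randomized split into consecutive pairs, as in the example; the names folds, train_counts and fold_ok are made up for illustration and are not part of pamr.

```r
# Toy labels from the example above.
y <- c(1, 1, 2, 2, 2, 2)

# Non-randomized 3-fold split: consecutive pairs form the test sets.
folds <- split(seq_along(y), rep(1:3, each = 2))

# Class counts in each training set (everything outside the test fold).
# The factor levels are fixed so that a missing class shows up as a count of 0.
train_counts <- lapply(folds, function(test_idx) {
  table(factor(y[-test_idx], levels = sort(unique(y))))
})

# A training set is usable only if every class has more than one sample.
fold_ok <- vapply(train_counts, function(cnt) all(cnt > 1), logical(1))

print(train_counts)
print(fold_ok)
```

Only the first fold fails the check, because its training set has no class-1 samples left.
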
> As far as I can ascertain, in your case it is doing 3-fold cv. This
> means that each test set is a sample of size 8 from your z
> vector. Unless you sample exactly one of each of the 8 categories,
> you will have the error. So you have way too few samples of each
> category... Leave-one-out cv would work though. But is it really possible to
> make good class predictions based on 3 samples of each class?
> 
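
A rough base-R sketch of the argument above, for 8 classes with 3 samples each. It does not call pamr and does not claim to reproduce how pamr.cv builds its folds; train_sets_ok, random_folds and stratified_folds are hypothetical names used only here. It compares a purely random 3-fold split with a stratified one in which each fold receives exactly one sample per class.

```r
# Labels matching table(z) in the thread: 8 classes, 3 samples each.
z <- rep(1:8, each = 3)

# TRUE if every training set (all samples outside one fold)
# still has more than one sample in every class.
train_sets_ok <- function(y, folds) {
  y <- factor(y)
  all(vapply(folds,
             function(test_idx) all(table(y[-test_idx]) > 1),
             logical(1)))
}

# Chance that one random test set of size 8 holds exactly one sample per class.
p_one_fold <- 3^8 / choose(24, 8)

# Purely random 3-fold assignment (an assumption, for comparison only).
set.seed(1)
random_folds <- split(sample(seq_along(z)), rep(1:3, each = 8))

# Stratified assignment: within each class, one sample goes to each fold.
fold_id <- ave(seq_along(z), z, FUN = function(i) sample(3))
stratified_folds <- split(seq_along(z), fold_id)

print(p_one_fold)
print(train_sets_ok(z, random_folds))
print(train_sets_ok(z, stratified_folds))
```

If the installed version of pamr.cv accepts a user-supplied list of folds (check ?pamr.cv; this is an assumption), a list like stratified_folds is the kind of input that would keep two samples per class in every training set.
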
> /Kasper
> 
> 
> 
> >
> > On Wed, 28 Jul 2004, Kasper Daniel Hansen wrote:
> >
> >> Dick Beyer <dbeyer at u.washington.edu> writes:
> >> 
> >> > I am having trouble with pamr.train and subsequently pamr.cv.
> >> >
> >> > In the pamr documentation, the following works:
> >> >
> >> >      set.seed(120)
> >> >      x <- matrix(rnorm(1000*20),ncol=20)
> >> >      y <- sample(c(1:4),size=20,replace=TRUE)
> >> >      mydata <- list(x=x,y=y)
> >> >      mytrain <-   pamr.train(mydata)
> >> >      mycv <- pamr.cv(mytrain,mydata)
> >> >
> >> > But if you change the seed, it doesn't:
> >> >
> >> >      set.seed(1123)
> >> >      x <- matrix(rnorm(1000*20),ncol=20)
> >> >      y <- sample(c(1:4),size=20,replace=TRUE)
> >> >      mydata <- list(x=x,y=y)
> >> >      mytrain <-   pamr.train(mydata)
> >> > Error in nsc(data$x[gene.subset, sample.subset], y = y, proby = proby,  : 
> >> >         Error: each class must have >1 sample
> >> >
> >> > There is discussion in the documents (http://www-stat.stanford.edu/~tibs/PAM/Rdist/doc/readme.html) about "fragile" functions, but I have not been able to understand how to make this error go away.  If anyone has had this problem or has some advice, I would be eternally grateful.
> >> 
> >> If you look at the y-vector you will notice it looks like this
> >> > table(y)
> >> y
> >> 1 2 3 4
> >> 1 6 5 8
> >> 
> >> Hence there is only 1 sample with a class of "1". Of course this
> >> happens when you sample 20 times from a set of 4 values. From the error
> >> message it seems that the method requires at least two samples from
> >> every class. 
> >> 
> >> Possible solutions (quick solutions, I am not too familiar with pamr):
> >> - increase the size, so that a class with only one sample is very
> >> unlikely.
> >> - fit the data, disregarding the single sample and using only 3
> >> classes
> >> 
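
The second suggestion above (dropping the single-sample class) might look roughly like this in base R. The labels are built by hand to have the class counts 1, 6, 5, 8 shown in table(y); they are not the actual draw from set.seed(1123), and the variable names counts and keep are illustrative.

```r
# Hypothetical labels with the class counts from table(y) above: 1, 6, 5, 8.
y <- rep(1:4, times = c(1, 6, 5, 8))
set.seed(42)
x <- matrix(rnorm(1000 * length(y)), ncol = length(y))  # genes in rows, samples in columns

# Keep only samples whose class has more than one member.
counts <- table(y)
keep <- y %in% names(counts)[counts > 1]

# droplevels() removes the now-empty class from the factor.
mydata <- list(x = x[, keep, drop = FALSE],
               y = droplevels(factor(y[keep])))

print(table(mydata$y))
```

The filtered mydata list has the same x/y layout as in the pamr documentation example, so it could then be handed to pamr.train(mydata).
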
> >> /Kasper
> >> 
> >> -- 
> >> Kasper Daniel Hansen, Research Assistant
> >> Department of Biostatistics, University of Copenhagen
> >> 
> >
> 