[BioC] multtest different seed

Fri Sep 19 10:29:25 CEST 2008

Hi Vincent,

Many thanks for your explanation. You have convinced me, although I must say that I really did observe different behaviour of the code between the two versions that I reported in my previous message.

I also (erroneously) thought that setting 'standardize=FALSE' in the MTP function would avoid the calculations that fail when the variance is zero.

However, in the future I will try to adopt the 'dominant strategy' of keeping my software up to date. Believe me, in my work I really do take into consideration what John von Neumann wrote :-)

Best,

Stefano

________________________________

From: Vincent Carey 525-2265 [mailto:stvjc at channing.harvard.edu]
Sent: Thu 18/09/2008 17:12
To: Stefano Moretti
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] multtest different seed

> Hi,
>
> I have a problem with the multtest package.
> Using the following code, for a very simple example, on R version 2.4.0 with multtest version 1.12.0, I had no problems.
>
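> library(multtest)
> # 2 variables (rows) x 6 samples (columns)
> R <- matrix(0, 2, 6)
> R[1, ] <- c(0, 0, 0.3, 0.2, 0, 0.16)
> R[2, ] <- c(0, 0.2, 0, 0.2, 0.3, 0)
> classes <- c(1, 1, 1, 2, 2, 2)  # two groups of three samples each
> OUTPUT <- MTP(X = R, Y = classes, standardize = FALSE, B = 1000)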
>
> But now, with R version 2.7.2 and multtest version 1.21.1, I get the following error message:
> running bootstrap...
> iteration = Error in function (x, w = NULL, samp = Samp)  :
>   Only one unique value in bootstrap sample for second group. Cannot calculate variance. This problem may be resolved if you try again with a different seed.
>
>
> Can anyone help me understand what I am doing wrong?

This should have nothing to do with the version of R or the package.  Bootstrapping is a procedure
that involves computing test statistics repetitively over samples taken with replacement from the
original data.  If a given sample contains enough copies of a given observation (owing to the
sampling with replacement), calculations that require distinct values (such as a variance) will fail.

It is true that changing the seed will change the collection of samples that your procedure
encounters, and _may_ thus avoid a sample with "too many copies", so that calculations requiring
uniqueness (or dispersion) of values succeed.

This problem is occurring in this case because your base sample size is so small, so that the
probability of an underdispersed sample is relatively high, and because some computation fails
when variance is zero.  Avoiding the problem by changing the seed seems to me to be an unwise
approach, but as von Neumann wrote, "Any one who considers arithmetical methods of producing
random digits is, of course, in a state of sin".  (See Knuth TAOCP v2.)
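
To make the size of that probability concrete, here is a minimal sketch in R.  It assumes (the
thread does not confirm this) that each bootstrap sample is drawn with replacement within a group,
at the group's original size.  p_degenerate is a hypothetical helper written for this illustration,
not a multtest function.

```r
# Hypothetical helper: probability that a within-group bootstrap resample
# (same size as the group, drawn with replacement) has only one unique value.
p_degenerate <- function(x) {
  n <- length(x)
  counts <- as.numeric(table(x))  # occurrences of each distinct value
  sum((counts / n)^n)             # all n draws land on the same value
}

# Data from the original post
R <- rbind(c(0, 0, 0.3, 0.2, 0, 0.16),
           c(0, 0.2, 0, 0.2, 0.3, 0))
classes <- c(1, 1, 1, 2, 2, 2)

# Row 1, second group: 0.2, 0, 0.16 are all distinct, so p = 3 * (1/3)^3
p <- p_degenerate(R[1, classes == 2])

# Chance that none of B = 1000 resamples of that group is degenerate
p_none <- (1 - p)^1000
print(c(p = p, p_none = p_none))
```

Under that assumption, a three-value group with distinct values gives a degenerate resample with
probability 1/9, so a run with B = 1000 would almost surely meet at least one such resample.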

>
> Many thanks for your attention.
> Best,
> Stefano