[BioC] VSN: minimum number of controls?
Eric E. Snyder
esnyder at vbi.vt.edu
Fri Apr 2 23:41:44 CEST 2010
Hello,
In my first project with R and BioConductor, I am analyzing some small
microarrays, starting with variance normalization with vsn. Using
Wolfgang Huber's VSN.pdf tutorial, I was able to do the exercise with the
"kidney" dataset without trouble. However, when I ran vsn2 on my own
data, I got the following error:
> fit = vsn2( noDNAcontrols )
Error in .local(x, reference, strata, ...) :
One or more of the strata contain less than 42 elements.
Please reduce the number of strata so that there is enough in each stratum.
I finally got around the
error by simulating a dataset containing 50 controls (my original data
had only 6). Surprisingly, even 42 controls were insufficient.
A collaborator, using the same dataset, was able to run vsn successfully
using an earlier version of R (2.9.0) and Bioconductor (version unknown).
Is anyone familiar with this problem?
I see two ways forward:
1. Find the appropriate (old) version of Bioconductor and analyze with
the original controls.
2. Use the current R/Bioconductor releases and either find a software
patch or a work-around.
As for #2, maybe it is not unreasonable to use >42 controls on most
microarrays. However, this particular dataset is from a series of small
protein arrays (each probed with patient serum then visualized with
labeled anti-IgG) that contain only 214 antigens and 6 no-DNA (meaning
"no protein") controls per patient (with a total of 853 patients in the
dataset). Consequently, it is not possible to run a huge number of
controls, given the number of experimental cells per slide.
On a related note, in my effort to inflate the controls that I did have
into a sufficiently large number, I used "rnorm" to simulate/synthesize
the data. Here "noDNAstats" is a 2 x 853 matrix consisting of the mean
and standard deviation from the patients' noDNAcontrols in the first and
second rows, respectively.
# Simulate 50 control values per patient; column i uses that patient's
# mean (row 1) and SD (row 2) from noDNAstats.
noDNAsim50 = sapply(seq_len(ncol(noDNAstats)), function(i)
    rnorm(50, mean = noDNAstats[1, i], sd = noDNAstats[2, i]))
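For context, a matrix like "noDNAstats" could be built roughly as follows. This is only a sketch: it assumes the controls sit in a 6 x 853 numeric matrix (controls in rows, patients in columns), and it uses made-up stand-in values rather than the real data.

```r
# Assumption: noDNAcontrols is a 6 x 853 numeric matrix (6 controls per
# patient, one column per patient). Stand-in values are simulated here.
set.seed(1)
noDNAcontrols = matrix(rnorm(6 * 853, mean = 200, sd = 30), nrow = 6)

# Row 1: per-patient mean; row 2: per-patient standard deviation
noDNAstats = rbind(mean = colMeans(noDNAcontrols),
                   sd   = apply(noDNAcontrols, 2, sd))

dim(noDNAstats)   # a 2 x 853 matrix
```

With only 6 controls per patient, each of these means and SDs is itself a fairly noisy estimate.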
My understanding was that rnorm would create a dataset of the requested
size with the requested mean and SD. The numbers I get are in the same
ballpark but the means and SD are not the same. Am I missing something?
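To illustrate the discrepancy: rnorm draws a random sample from a normal distribution with the given parameters, so the sample mean and SD will only approximate those parameters, and more loosely so for small samples. A minimal sketch, with an arbitrary seed and made-up target values (100 and 15 are not from the dataset):

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
target_mean = 100                 # hypothetical values, not from the dataset
target_sd   = 15

x = rnorm(50, mean = target_mean, sd = target_sd)
c(mean(x), sd(x))                 # near, but not equal to, 100 and 15

# Standardize the sample, then rescale so the sample moments match exactly
x_exact = target_mean + target_sd * as.vector(scale(x))
c(mean(x_exact), sd(x_exact))     # 100 and 15, up to floating-point error
```

Rescaling like this only forces the sample statistics to match; it does not make the simulated controls any more realistic than the raw rnorm draws.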
Thanks!
eesnyder
--
Eric E. Snyder, Ph.D.
Virginia Bioinformatics Institute
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061-0447
USA
Email: eesnyder at vbi.vt.edu
Phone: (540) 231-5428
JDAM: N 37 13.248', W 80 25.551'