[BioC] How do I remove bad samples/probes before normalization and SWAN?

Tue Dec 10 20:45:07 CET 2013

Hi there,

I am learning minfi using a dataset containing 24 samples. I know there are 2 QC samples, 2 duplicated samples, and one bad sample that I identified with minfi. My question is: what is the proper procedure for removing these samples from the data? Should I remove their file names from the sample sheet and rebuild the RGSet? The same question applies to probes with detection p-values higher than 0.01, and to CpGs on chromosomes X and Y. I think these CpGs should be excluded before normalization and SWAN, but I really don't know how. One thing I have tried is to remove those probes (and the 5 samples I want to drop) from MSet.raw, and then use the resulting MSet.raw.reduced for SWAN:

MSet.swan <- preprocessSWAN(RGSet, mSet = MSet.raw.reduced)

Here RGSet is still the original one with 24 samples and all 485512 probes, but MSet.raw.reduced has only 19 samples and about 470K CpGs. The MSet.swan I got has the same dimensions as MSet.raw.reduced, but I don't know whether this method is valid. I do know that it cannot be used to get MSet.norm. If this is not a valid method, what is the correct way to do it?
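
For concreteness, the filtering step described above can be sketched in base R roughly as follows. This is only an illustration of the subsetting logic, not a statement about whether the approach is valid for SWAN. The toy matrix detP, the chr vector, and the names dropSamples, keepSamples and keepProbes are all made up for the example. With real data the p-value matrix would presumably come from detectionP(RGSet) and the chromosome labels from the array annotation.

```r
# Toy stand-in for a detection p-value matrix (probes x samples).
# Assumption: with real data this would be detectionP(RGSet) from minfi.
set.seed(1)
detP <- matrix(runif(6 * 4, 0, 0.005), nrow = 6,
               dimnames = list(paste0("cg", 1:6), paste0("S", 1:4)))
detP["cg3", "S2"] <- 0.2  # simulate one probe failing in one sample

# Hypothetical chromosome label for each probe (normally from the annotation).
chr <- c(cg1 = "chr1", cg2 = "chrX", cg3 = "chr2",
         cg4 = "chr7", cg5 = "chrY", cg6 = "chr1")

# Hypothetical samples to drop (QC, duplicate, or bad samples).
dropSamples <- c("S4")
keepSamples <- setdiff(colnames(detP), dropSamples)

# Flag probes with detection p > 0.01 in any retained sample.
failed <- rowSums(detP[, keepSamples, drop = FALSE] > 0.01) > 0

# Flag probes on the sex chromosomes.
sexChr <- chr[rownames(detP)] %in% c("chrX", "chrY")

keepProbes <- rownames(detP)[!failed & !sexChr]

# Assumption: the same index vectors would then subset the minfi objects, e.g.
#   MSet.raw.reduced <- MSet.raw[keepProbes, keepSamples]
```

Here the detection p-values are only checked in the retained samples, so a probe that fails solely in a dropped sample is not discarded. That is a design choice made for this sketch and can be changed by computing failed on all columns.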

I really appreciate your help and wish you a happy holiday season!

Qin

 -- output of sessionInfo(): 

R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Sent via the guest posting facility at bioconductor.org.