[BioC] How do I remove bad samples/probes before normalization and SWAN?
Qin [guest]
guest at bioconductor.org
Tue Dec 10 20:45:07 CET 2013
Hi there,
I am learning minfi using a dataset of 24 samples. Among them are 2 QC samples, 2 duplicated samples, and one bad sample that I identified with minfi. My question is: what is the proper procedure for removing these samples from the data? Should I remove their file names from the sample sheet and rebuild the RGSet? The same question applies to probes with detection p-values higher than 0.01 and to CpGs on chromosomes X and Y. I think these CpGs should be excluded before normalization and SWAN, but I really don't know how. One thing I have tried is to remove those probes (and the 5 samples I want to drop) from MSet.raw, and then pass the reduced object, MSet.raw.reduced, to SWAN:
MSet.swan <- preprocessSWAN(RGSet, mSet = MSet.raw.reduced)
Here RGSet is still the original one with 24 samples and all 485512 probes, but MSet.raw.reduced has only 19 samples and about 470K CpGs. The MSet.swan I get has the same dimensions as MSet.raw.reduced, but I don't know whether this method is valid. I do know it cannot be used to get MSet.norm. If it is not valid, what is the correct way to do it?
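To make the subsetting described above concrete, here is a minimal sketch on toy data in base R. It only illustrates the index logic and says nothing about whether the approach is statistically valid. Everything in it is hypothetical: the matrix detP stands in for what minfi's detectionP(RGSet) would return, the vector chr stands in for the chromosome column of the array annotation, and the probe and sample names are made up. It also assumes that a probe counts as failed when its detection p-value exceeds 0.01 in any retained sample, which the description above does not specify.

```r
# Toy stand-in for a detection p-value matrix (probes x samples).
# Hypothetical data; in a real analysis this would come from detectionP(RGSet).
probes  <- paste0("cg", 1:6)
samples <- paste0("S", 1:4)
detP <- matrix(c(0.001, 0.002, 0.000, 0.003,   # cg1: passes everywhere
                 0.500, 0.001, 0.002, 0.001,   # cg2: fails in S1
                 0.001, 0.001, 0.001, 0.001,   # cg3: passes, but on chrX
                 0.001, 0.002, 0.001, 0.200,   # cg4: fails only in S4
                 0.000, 0.000, 0.000, 0.000,   # cg5: passes, but on chrY
                 0.003, 0.004, 0.002, 0.001),  # cg6: passes everywhere
               nrow = 6, byrow = TRUE,
               dimnames = list(probes, samples))

# Toy stand-in for the chromosome of each probe (hypothetical annotation).
chr <- c("chr1", "chr2", "chrX", "chr3", "chrY", "chr4")
names(chr) <- probes

# Samples to drop (QC, duplicate, or bad samples); hypothetical choice.
drop_samples <- c("S4")
keep_samples <- setdiff(colnames(detP), drop_samples)

# Assumption: a probe fails if p > 0.01 in ANY retained sample.
failed <- rowSums(detP[, keep_samples, drop = FALSE] > 0.01) > 0

# Probes on the sex chromosomes.
on_sex_chr <- chr[rownames(detP)] %in% c("chrX", "chrY")

keep_probes <- rownames(detP)[!failed & !on_sex_chr]

print(keep_samples)
print(keep_probes)
```

With real minfi objects the same two vectors would be used as row and column indices, along the lines of MSet.raw[keep_probes, keep_samples], assuming the object's row names are the CpG identifiers. Note that in this sketch the failed-probe filter is computed after the bad samples are dropped, so a probe that fails only in a removed sample is retained.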
I really appreciate your help and wish you a happy holiday season!
Qin
-- output of sessionInfo():
R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Sent via the guest posting facility at bioconductor.org.