[BioC] batch effects 450K
Brent Pedersen
bpederse at gmail.com
Fri Jun 8 19:58:19 CEST 2012
On Fri, Jun 8, 2012 at 8:44 AM, Femke [guest] <guest at bioconductor.org> wrote:
>
> Dear All,
>
> I have Infinium 450K data for 56 breast cancer tumors. As a first analysis I wanted to do a clustering and see the distribution of the samples. For this I used the minfi package. Unfortunately, the assays were done in 2 batches and there is a clear batch effect. I looked into ComBat and SVA to remove the batch effect. As far as I understand, to use these approaches I need to have a phenotype/variable of interest. In the tutorial ("The SVA package for removing batch effects and other unwanted variation in high-throughput experiments … Modified: October 24, 2011 Compiled: April 25, 2012") the variable of interest is cancer status. However, I do not have normals. Does anyone have suggestions on how I should tackle these batch effects?
>
> Many thanks in advance and all the best!
>
> Femke
>
>
> -- output of sessionInfo():
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] bladderbatch_1.0.3
> [2] sva_3.2.1
> [3] mgcv_1.7-17
> [4] corpcor_1.6.3
> [5] IlluminaHumanMethylation450kmanifest_0.2.1
> [6] gplots_2.10.1
> [7] KernSmooth_2.23-7
> [8] caTools_1.13
> [9] bitops_1.0-4.1
> [10] gdata_2.8.2
> [11] gtools_2.6.2
> [12] minfi_1.2.0
> [13] GenomicRanges_1.8.6
> [14] IRanges_1.14.3
> [15] reshape_0.8.4
> [16] plyr_1.7.1
> [17] lattice_0.20-6
> [18] Biobase_2.16.0
> [19] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.18.1 BiocInstaller_1.4.4 Biostrings_2.24.1
> [4] DBI_0.2-5 MASS_7.3-18 Matrix_1.0-6
> [7] R.methodsS3_1.2.2 RColorBrewer_1.0-5 RSQLite_0.11.1
> [10] affyio_1.24.0 annotate_1.34.0 beanplot_1.1
> [13] bit_1.1-8 codetools_0.2-8 crlmm_1.14.0
> [16] ellipse_0.3-7 ff_2.2-7 foreach_1.4.0
> [19] genefilter_1.38.0 iterators_1.0.6 limma_3.12.0
> [22] matrixStats_0.5.0 mclust_3.4.11 multtest_2.12.0
> [25] mvtnorm_0.9-9992 nlme_3.1-104 nor1mix_1.1-3
> [28] oligoClasses_1.18.0 preprocessCore_1.18.0 siggenes_1.30.0
> [31] splines_2.15.0 stats4_2.15.0 survival_2.36-14
> [34] xtable_1.7-0 zlibbioc_1.2.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
Since the batch is known, why not just include it as a covariate in
your model and fit it with limma or lm()?
But what is your study design if you don't have controls?
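
A minimal sketch of what "include it in your model" could look like with
lm(), for a single probe. Everything here is an assumption made for
illustration: the M-values are simulated, and `subtype` is a hypothetical
variable of interest (the original post does not name one). Only the 56
samples and the 2 batches come from the question.

```r
set.seed(1)

# Simulated stand-in for one probe's M-values across 56 tumors (not real data).
n <- 56
batch <- factor(rep(c("b1", "b2"), each = n / 2))

# Hypothetical variable of interest; the original post does not name one.
subtype <- factor(rep(c("A", "B"), times = n / 2))

# Simulated signal: a batch shift of 0.5 and a subtype shift of 1.0, plus noise.
m_value <- 0.5 * (batch == "b2") + 1.0 * (subtype == "B") + rnorm(n, sd = 0.3)

# Batch enters the model as an ordinary covariate, so the subtype
# coefficient is estimated after adjusting for batch.
fit <- lm(m_value ~ batch + subtype)
coefs <- coef(summary(fit))
print(coefs)

# For clustering only, one option is to take residuals from a batch-only
# model, which centers each batch at its own mean for this probe.
adjusted <- residuals(lm(m_value ~ batch))
```

For all probes at once, the same design (model.matrix(~ batch + subtype))
could be passed to limma's lmFit() followed by eBayes(). Note that this only
separates batch from the variable of interest if the two are not confounded,
i.e. each batch contains samples from more than one group.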