[BioC] batch effects 450K

Fri Jun 8 19:58:19 CEST 2012

On Fri, Jun 8, 2012 at 8:44 AM, Femke [guest] <guest at bioconductor.org> wrote:
>
> Dear All,
>
> I have Infinium 450K data for 56 breast cancer tumors. As a first analysis I wanted to do a clustering and see the distribution of the samples. For this I used the minfi package. Unfortunately, the assays were done in 2 batches and there is a clear batch effect. I looked into ComBat and SVA to remove the batch effect. As far as I understand, to use these approaches I need to have a phenotype/variable of interest. In the tutorial ("The SVA package for removing batch effects and other unwanted variation in high-throughput experiments ... Modified: October 24, 2011 Compiled: April 25, 2012") the variable of interest is cancer status. However, I do not have normals. Does anyone have suggestions on how I should tackle these batch effects?
>
> Many thanks in advance and all the best!
>
> Femke
>
>
>  -- output of sessionInfo():
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] bladderbatch_1.0.3
>  [2] sva_3.2.1
>  [3] mgcv_1.7-17
>  [4] corpcor_1.6.3
>  [5] IlluminaHumanMethylation450kmanifest_0.2.1
>  [6] gplots_2.10.1
>  [7] KernSmooth_2.23-7
>  [8] caTools_1.13
>  [9] bitops_1.0-4.1
> [10] gdata_2.8.2
> [11] gtools_2.6.2
> [12] minfi_1.2.0
> [13] GenomicRanges_1.8.6
> [14] IRanges_1.14.3
> [15] reshape_0.8.4
> [16] plyr_1.7.1
> [17] lattice_0.20-6
> [18] Biobase_2.16.0
> [19] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.18.1  BiocInstaller_1.4.4   Biostrings_2.24.1
>  [4] DBI_0.2-5             MASS_7.3-18           Matrix_1.0-6
>  [7] R.methodsS3_1.2.2     RColorBrewer_1.0-5    RSQLite_0.11.1
> [10] affyio_1.24.0         annotate_1.34.0       beanplot_1.1
> [13] bit_1.1-8             codetools_0.2-8       crlmm_1.14.0
> [16] ellipse_0.3-7         ff_2.2-7              foreach_1.4.0
> [19] genefilter_1.38.0     iterators_1.0.6       limma_3.12.0
> [22] matrixStats_0.5.0     mclust_3.4.11         multtest_2.12.0
> [25] mvtnorm_0.9-9992      nlme_3.1-104          nor1mix_1.1-3
> [28] oligoClasses_1.18.0   preprocessCore_1.18.0 siggenes_1.30.0
> [31] splines_2.15.0        stats4_2.15.0         survival_2.36-14
> [34] xtable_1.7-0          zlibbioc_1.2.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Since the batch is known, why not just include it as a covariate in your
model and fit with limma or lm()?
But what's your study design if you don't have controls?
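
To illustrate what "include it in your model" could look like, here is a minimal sketch in base R using lm(). Everything in it is assumed for illustration: the M-value matrix is simulated rather than real 450K data, and `group` is a hypothetical covariate of interest that the original post does not mention. With limma the analogous design would be model.matrix(~ group + batch) passed to lmFit(); limma also has removeBatchEffect() for exploratory plots.

```r
# Simulated stand-in for a probes x samples matrix of M-values (NOT real 450K data).
set.seed(1)
n_probes  <- 200
n_samples <- 56

batch <- factor(rep(c("A", "B"), each = n_samples / 2))     # known batch labels
group <- factor(rep(c("g1", "g2"), times = n_samples / 2))  # hypothetical covariate of interest

M <- matrix(rnorm(n_probes * n_samples), nrow = n_probes)
M[, batch == "B"] <- M[, batch == "B"] + 1.5                # simulated additive batch shift

# Probe-wise linear models with batch as a covariate.
# lm() accepts a matrix response: rows are samples, columns are probes.
fit   <- lm(t(M) ~ group + batch)
coefs <- coef(fit)   # rows: (Intercept), groupg2, batchB; one column per probe

# For exploratory clustering only: subtract the estimated batch term,
# leaving the group term untouched.
batch_effect <- outer(as.numeric(batch == "B"), coefs["batchB", ])  # samples x probes
M_adj <- M - t(batch_effect)

# Average between-batch difference in probe means, before and after adjustment.
mean_shift_before <- mean(rowMeans(M[, batch == "B"]) - rowMeans(M[, batch == "A"]))
mean_shift_after  <- mean(rowMeans(M_adj[, batch == "B"]) - rowMeans(M_adj[, batch == "A"]))
```

For differential testing, the usual choice is to keep batch in the design and test the `group` coefficient directly, and to use adjusted values such as `M_adj` only for clustering or plots.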