[BioC] Adjusting removeBatchEffect from limma
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Feb 5 02:18:52 CET 2012
Dear Djie,
I have generalized the removeBatchEffect() function so that it will now
accept continuous covariates as well as batch factors. Just enter BSCE as
the 'covariates' argument. The updated function will be on the devel
version of limma, and I have attached it.
Best wishes
Gordon
> Date: Tue, 31 Jan 2012 13:08:09 +0100
> From: Djie Tjwan Thung <djie.thung at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Adjusting removeBatchEffect from limma
>
> Dear Gordon Smyth, bioconductor list,
>
> In the limma package we found a function called, removeBatchEffect, which
> removes the effects of batch effects or other technical variables on a gene
> expression matrix. The code of this removeBatchEffect is as follows:
>
> function (x, batch, batch2 = NULL, design = matrix(1, ncol(x), 1)) {
> x <- as.matrix(x)
> batch <- as.factor(batch)
> contrasts(batch) <- contr.sum(levels(batch))
> if (is.null(batch2)) {
> X <- model.matrix(~batch)[, -1, drop = FALSE]
> }
> else {
> batch2 <- as.factor(batch2)
> contrasts(batch2) <- contr.sum(levels(batch2))
> X <- model.matrix(~batch + batch2)[, -1, drop = FALSE]
> }
> X <- qr.resid(qr(design), X)
> t(qr.resid(qr(X), t(x)))
> }
>
> Now we have an expression matrix coming from a methylation array. As
> bisulfite conversion efficiency (BSCE) is a known confounder for this type
> of array, wed like to adjust the matrix for the confounding effects of
> BSCE. BSCE is a numerical variable, however the removeBatchEffect expects a
> categorical variable as it handles the batch variable as a factor.
>
> We were wondering if its valid to remove the following lines of code in
> this function regarding, batch and batch2, so the function treats these
> variables as numerical variables:
>
> batch <- as.factor(batch)
>
> contrasts(batch) <- contr.sum(levels(batch))
>
>
>
> If someone is willing, you can test out this altered version by loading the
> .RData in the following link:
> http://good.net/dl/au/fa43/contTechnicalVar.RData and using the following
> demo script:
>
> ###
>
> #removeNumericBatchEffect(): altered version of removeBatchEffect(),
> with the as.factor() and contrasts() lines commented out
>
> #mod: model matrix containing variables of interest (intercept, age and gender)
>
> # head(mod) will give:
>
> # (Intercept) as.factor(pheno$gender)v pheno$age
>
> #1 1 1 55.9
>
> #2 1 1 32.0
>
> #create random intensity values for bisulfite conversion
>
> bisulfite.conversion <- sample(6000:10000, 95)
>
> #Run adjusted removeBatchEffect function
>
> removeNumericBatchEffect(expr, batch = bisulfite.conversion, design = mod)
>
>
>
> Wed appreciate any input. Weve also considered ComBat from the sva
> package as an alternative, but as this function also only accepts
> categorical variables, this limma function seems like a better option to
> modify its code for our purpose.
>
>
>
> Regards,
>
>
>
> Djie Thung
>
> Bioinformatics intern at UMC Utrecht
>
> Dept. of Medical Genetics
______________________________________________________________________
The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________
-------------- next part --------------
# removeBatchEffect.R
# A refinement would be to empirical Bayes shrink
# the batch effects before subtracting them.
removeBatchEffect <- function(x,batch=NULL,batch2=NULL,covariates=NULL,design=matrix(1,ncol(x),1))
# Remove batch effects from matrix of expression data
# Gordon Smyth and Carolyn de Graaf
# Created 1 Aug 2008. Last revised 4 Feb 2012.
{
x <- as.matrix(x)
if(is.null(batch) && is.null(batch2) && is.null(covariates)) return(x)
if(!is.null(batch)) {
batch <- as.factor(batch)
contrasts(batch) <- contr.sum(levels(batch))
batch <- model.matrix(~batch)[,-1,drop=FALSE]
}
if(!is.null(batch2)) {
batch2 <- as.factor(batch2)
contrasts(batch2) <- contr.sum(levels(batch2))
batch2 <- model.matrix(~batch2)[,-1,drop=FALSE]
}
if(!is.null(covariates)) covariates <- as.matrix(covariates)
X <- cbind(batch,batch2,covariates)
X <- qr.resid(qr(design),X)
t(qr.resid(qr(X),t(x)))
}
More information about the Bioconductor
mailing list