Dear Dr. Smyth,

Thanks for your reply. Here is the code sequence I used to remove the batch
effect and to make the PCA plot.

First,  getting raw counts for each feature per sample using htseq-count
(~64K features by using Ensemble.gtf)
Then, getting a count data.matrix by combing all samples together (64K
rows, and 14 columns). 8 samples from batch1, and 6 samples from batch2.

>count = cbind(count.s1, count.s2, ...., count.s14)

#remove the batch effect
library(edgeR)
batch = as.factor(c(rep("b1", 8), rep("b2", 6)))

logCPM <- cpm(count,log=TRUE,prior.count=5)
logCPM <- removeBatchEffect(logCPM, batch=batch)

#PCA
B.res = prcomp(logCPM, scale = TRUE)
pc.s = summary(.Bres)$importance[2,1:2]
pc1.var = round(pc.s[["PC1"]],2)
pc2.var = round(pc.s[["PC2"]],2)

pdf(file = "all.count.logCPM.rmBatch.pca")
plot(B.res$rotation[,1:2], main = maintitle,  xlab = paste("PC1 (variance:
",pc1.var*100, "%)", sep =""), ylab = paste("PC2 (variance: ",pc2.var*100,
"%)", sep =""))
dev.off()

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.4.0  limma_3.18.7


Many thanks.
Shirley




On Mon, Apr 28, 2014 at 6:31 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote:

>
>  Date: Sun, 27 Apr 2014 20:32:55 -0400
>> From: shirley zhang <shirley0818@gmail.com>
>> To: Gordon K Smyth <smyth@wehi.edu.au>
>> Cc: Bioconductor mailing list <bioconductor@r-project.org>
>> Subject: Re: [BioC] Remove batch effect in small RNASeq study (SVA or
>>         others?)
>>
>> Dear Dr. Smyth,
>>
>> Thank you very much for your quick reply. I did as you suggested by first
>> getting log CPM value, then call removeBatchEffect(). I found the PCA
>> figure looks better than before, but there is still a batch effect.
>>
>
> I don't see how there could still be a batch effect.
>
> Please give the code sequence you used to remove the batch effect and to
> make the PCA plot.
>
>  I attached two PCA figures. One is based on log10(raw count) which is
>> before calling cpm() and removeBatchEffect(). Another one is after.
>>
>> Could you look at them and give me more suggestions. Will a quantile
>> normalization across samples be a good idea since CPM() is still a
>> normalization only within each sample??
>>
>
> You should normalize the data before using removeBatchEffect(), and
> quantile is one possibility.
>
> Gordon
>
>  Thanks again for your help,
>> Shirley
>>
>>
>> On Sun, Apr 27, 2014 at 6:54 PM, Gordon K Smyth <smyth@wehi.edu.au>
>> wrote:
>>
>>  Dear Shirley,
>>>
>>> I would probably do it like this:
>>>
>>>   library(edgeR)
>>>   logCPM <- cpm(y,log=TRUE,prior.count=5)
>>>   logCPM <- removeBatchEffect(logCPM, batch=batch)
>>>
>>> Best wishes
>>> Gordon
>>>
>>>  Date: Sat, 26 Apr 2014 10:51:23 -0400
>>>
>>>> From: shirley zhang <shirley0818@gmail.com>
>>>> To: Bioconductor Mailing List <bioconductor@stat.math.ethz.ch>
>>>> Subject: [BioC] Remove batch effect in small RNASeq study (SVA or
>>>>         others?)
>>>>
>>>> I have a RNASeq paired-end data from two batches (8 samples from batch1,
>>>> and 7 samples from batch2). After alignment using TopHat2, then I got
>>>> count
>>>> using HTseq-count, and FPKM value via Cufflinks. A big batch effect can
>>>> be
>>>> viewed in PCA using both log10(raw count) and log10(FPKM) value.
>>>>
>>>> I can NOT use the block factor in edgeR to remove batch effect since I
>>>> need
>>>> to first obtain residuals after adjusting for batch effect, then test
>>>> the
>>>> residuals for hundreds of thousands of SNPs (eQTL analysis).
>>>>
>>>> My question is how to remove batch effect without using edgeR:
>>>>
>>>> 1. is SVA ok for such a small sample size (N=15)?
>>>> 2. If SVA does not work, any other suggestions?
>>>>
>>>> Many thanks,
>>>> Shirley
>>>>
>>>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:10}}

