[BioC] use Combat to adjust for hidden variables without knowing batch effect

shirley zhang shirley0818 at gmail.com
Thu Jul 18 21:40:54 CEST 2013


Dear Dr. Johnson,

Many thanks for your suggestions. Yes, my data is rtPCR data (200
genes x 2,000 samples). Besides the number of genes, my data differs
from microarray data in two other ways.

1. In my qPCR data, if the cycle threshold (CT) of a gene in a sample
is > 30, the value is set to NA. As a result, my expression data matrix
(200 genes x 2,000 samples) contains many NA values. For example, 10%
of genes have NA values in 50% of samples. Different genes might have
NA values in different samples.

2. My qPCR data contains negative values. The expression level of each
gene is first adjusted by a housekeeping gene, which is run repeatedly
in each plate/run. If the expression level of the gene is higher than
that of the housekeeping gene, the value in my data matrix is negative.
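
To make the two points above concrete, here is a minimal sketch in plain
Python. It rests on assumptions that are not stated in this message: that
the "> 30" cutoff is applied to the raw CT value, and that the adjustment
is a simple subtraction of the housekeeping gene's CT (a delta-CT). The
function names and the CT values are hypothetical and for illustration
only.

```python
import math

CT_CUTOFF = 30.0  # assumed: raw CT values above this are treated as missing


def delta_ct(ct_gene, ct_housekeeping, cutoff=CT_CUTOFF):
    """Return CT(gene) - CT(housekeeping), or NaN if the gene's CT exceeds the cutoff.

    A lower CT means higher expression, so a gene expressed more strongly
    than the housekeeping gene yields a negative value.
    """
    if ct_gene > cutoff:
        return math.nan
    return ct_gene - ct_housekeeping


def missing_fraction(values):
    """Fraction of samples in which a gene's value is missing (NaN)."""
    return sum(math.isnan(v) for v in values) / len(values)


# Hypothetical raw CT values for one gene and the housekeeping gene
# across four samples.
raw_ct = [22.0, 31.5, 27.0, 35.0]
housekeeping_ct = [24.0, 24.5, 23.0, 24.0]

normalized = [delta_ct(g, h) for g, h in zip(raw_ct, housekeeping_ct)]
print(normalized)                    # one negative value, one positive, two NaN
print(missing_fraction(normalized))  # share of samples with NA for this gene
```

Computing the missing fraction per gene in this way is one option for
deciding which genes have too many NA values to keep before any batch
adjustment is attempted.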

I really appreciate your input.

Shirley

On Thu, Jul 18, 2013 at 2:47 PM, Johnson, William Evan <wej at bu.edu> wrote:
> Also, Tim Triche (at USC) just pointed out that your data may be PCR data. Is this correct? For PCR data, ComBat should work fine if you have a few hundred genes and there aren't any egregious outliers. However, I think that SVA requires a large number of genes (>1,000 or so)--but I'll let Jeff Leek confirm or refute this!
>
> Thanks!
>
> Evan
>
>
> On Jul 18, 2013, at 11:19 AM, W. Evan Johnson wrote:
>
>> Shirley,
>>
>> Michael has given you some good advice--definitely do these things.
>>
>> Also, one other thing to try is to apply SVA, and see if any of the surrogate variables seem to be correlated with your missing variables (maybe you have some idea which samples were collected together or at the same time?).
>>
>> Hope this helps,
>>
>> Evan
>>
>>
>>
>> On Jul 18, 2013, at 4:00 AM, <bioconductor-request at r-project.org>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> Many thanks for your great suggestions. They are very helpful.
>>>
>>> Best,
>>> Shirley
>>>
>>> On Tue, Jul 16, 2013 at 11:56 PM, Michael Breen
>>> <breenbioinformatics at gmail.com> wrote:
>>>> Hi Shirley,
>>>>
>>>> It's often not recommended to batch correct without considerable evidence of
>>>> a batch effect (e.g. date, cohort, etc.).
>>>>
>>>> What is recommended is to proceed with various sorts of quality assessment
>>>> to visualize potential batch effects. For example, we will often produce:
>>>>
>>>> -3D PCA plots wrapping 1, 2, and 3 standard deviations around the data points
>>>> -Hierarchical clustering using Pearson's correlation
>>>> (for each of these it helps to overlay a color scheme onto the potential
>>>> batches to aid in visualizing)
>>>> -Array-to-array distance plots
>>>>
>>>> If you find no evidence of batches then skip the batch adjustment. If a
>>>> potential effect exists, correct with ComBat or SCAN and proceed with your
>>>> analysis.
>>>>
>>>> Good luck,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Mon, Jul 15, 2013 at 6:10 PM, shirley zhang <shirley0818 at gmail.com>
>>>> wrote:
>>>>>
>>>>> I know that if the batch effect is known, we can use ComBat to adjust for
>>>>> it. However, if the batch effect is unknown, could I still use ComBat or
>>>>> SVA to adjust for some hidden variables? We know that our blood samples
>>>>> were NOT drawn at the same time from individuals, and RNA was NOT
>>>>> extracted at the same time.
>>>>>
>>>>> Many thanks,
>>>>> Shirley
>>>>
>>
>