[BioC] Advice with RemoveBatchEffect and Rank Product package

Sanogo, Yibayiri O sanogo at illinois.edu
Tue Sep 11 04:51:39 CEST 2012


Dear Gordon,

Please ignore the last two columns (weight and length)  of the design for now:  I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length.

The other columns are:

Fish.Number= corresponds to fish ID, the individual fish (biological replicates)
Slide= the Agilent 4X44k slides on each the arrays were located
Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental.
 
FileName	Cy3	Cy5	Fish Number	Slide	Brain Part
1T.gpr	1	-1	1	2	T
2T.gpr	-1	1	2	1	T
3T.gpr	1	-1	3	4	T
4T.gpr	-1	1	4	3	T
5T.gpr	1	-1	5	6	T
6T.gpr	-1	1	6	5	T
1D.gpr	-1	1	1	5	D
2D.gpr	1	-1	2	4	D
3D.gpr	-1	1	3	1	D
4D.gpr	1	-1	4	6	D
5D.gpr	-1	1	5	3	D
6D.gpr	1	-1	6	2	D
1C.gpr	1	-1	1	4	C
2C.gpr	-1	1	2	3	C
3C.gpr	1	-1	3	6	C
4C.gpr	-1	1	4	5	C
5C.gpr	1	-1	5	2	C
6C.gpr	-1	1	6	1	C
1BS.gpr	-1	1	1	1	BS
2BS.gpr	1	-1	2	2	BS
3BS.gpr	-1	1	3	3	BS
4BS.gpr	1	-1	4	4	BS
6BS.gpr	1	-1	6	6	BS

Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us.

Thank you so much.

Osee

________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: Monday, September 10, 2012 6:34 PM
To: Osee Sanogo
Cc: Sanogo, Yibayiri O; Bioconductor mailing list
Subject: Re: Advice with RemoveBatchEffect and Rank Product package

Dear Osee,

Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight
and Length mean in your experiment?  If the first six arrays are from
biologically independent samples, why do they have the same weight and
length?

Best wishes
Gordon


On Mon, 10 Sep 2012, Osee Sanogo wrote:

> Dear Gordon,
>
> From what you said, it seems that I am oversimplifying my experiment by
> attempting to analyze it with RankProd, which doesn't offer the option for
> complex modeling.
>
> Could you please explain to me how I could analyze the experiment using
> Limma?
> Please let me know if you'd like me to provide further details of the
> experiment.
>
> Thank you so much.
>
> Osee
>
>
> On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:
>
>> Dear Osee,
>>
>> No, you can't use removeBatchEffect to control for dye bias.
>>
>> Can you ignore the dye effect?  Not in general, but who knows?
>>
>> Your experiment seems too complex to be properly analysed using RankProd.
>> For one thing, it seems clear that you have obtained multiple parts of the
>> brain from the same biological replicates, meaning that your samples are
>> paired by fish number.
>>
>> I could explain how to analyse this experiment using limma.  However, if
>> you are determined that you will use RankProd, it might be best to email
>> the authors of that package for advice.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>> On Sun, 9 Sep 2012, Osee Sanogo wrote:
>>
>>> Dear Gordon,
>>>
>>> Thank you for getting back to me about my questions.
>>>
>>> My experiment is trying to identify differentially expressed genes in four
>>> regions of the brain in response to a stressor. I have 6 biol. replicates in
>>> each brain region for the control and experimental groups in each region,
>>> and the comparison is being done within brain region (i.e., T control vs T
>>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were
>>> run in two-color Agilent Array.
>>>
>>> You're right that the design I sent was from the separate channel analysis,
>>> in which I attempted to account for array and dye effect, and then run the
>>> data in RankProd. Now I know that this is not right. Ok, I will use the
>>> single channel analysis in Limma.
>>>
>>> I still would like to run the two-channel data (ratios) in RankProd, as my
>>> previous experience found this useful for my dara (low replicate numbers).
>>>
>>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias
>>> before running the two-channel data in RankProd? If yes, how should I do
>>> this using the RemoveBatch Effect function?
>>>            2) I found that about 3% of my probes have dye effect. Can I
>>> omit controlling for dye effect without compromising the result of my
>>> analysis?
>>>
>>> The data were loess/scale normalized into an expression set (Data_RP).
>>>
>>> Here is the design of the experiment
>>>
>>> FileName     Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length
>>> 1    1T.gpr   1  -1           1     2          T     39   0.63
>>> 2    2T.gpr  -1   1           2     1          T     39   0.63
>>> 3    3T.gpr   1  -1           3     4          T     39   0.63
>>> 4    4T.gpr  -1   1           4     3          T     39   0.63
>>> 5    5T.gpr   1  -1           5     6          T     39   0.63
>>> 6    6T.gpr  -1   1           6     5          T     NA     NA
>>> 7    1D.gpr  -1   1           1     5          D     47   1.21
>>> 8    2D.gpr   1  -1           2     4          D     47   1.21
>>> 9    3D.gpr  -1   1           3     1          D     47   1.21
>>> 10   4D.gpr   1  -1           4     6          D     47   1.21
>>> 11   5D.gpr  -1   1           5     3          D     47   1.21
>>> 12   6D.gpr   1  -1           6     2          D     NA     NA
>>> 13   1C.gpr   1  -1           1     4          C     47   1.31
>>> 14   2C.gpr  -1   1           2     3          C     47   1.31
>>> 15   3C.gpr   1  -1           3     6          C     47   1.31
>>> 16   4C.gpr  -1   1           4     5          C     47   1.31
>>> 17   5C.gpr   1  -1           5     2          C     47   1.31
>>> 18   6C.gpr  -1   1           6     1          C     NA     NA
>>> 19  1BS.gpr  -1   1           1     1         BS     89   1.44
>>> 20  2BS.gpr   1  -1           2     2         BS     89   1.44
>>> 21  3BS.gpr  -1   1           3     3         BS     89   1.44
>>> 22  4BS.gpr   1  -1           4     4         BS     NA     NA
>>> 23  5BS.gpr  -1   1           5     5         BS     NA     NA
>>> 24  6BS.gpr   1  -1           6     6         BS     NA     NA
>>>
>>> Thank you for your help and please let me know if you need further
>>> explanation of the experiment.
>>>
>>> Best regards,
>>>
>>> Osee
>>>
>>>>
>>>
>>>
>>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:
>>>
>>>> Dear Osee,
>>>>
>>>> You are attempting to do a number of things that don't seem correct to me.
>>>>
>>>> First, you seem to attempting a separate channel analysis of two color
>>>> microarray data, but ignoring the pairing of the red and green channels.
>>>> It isn't correct to do this.  I don't see any way to use RankProd, or any
>>>> other package designed for independent samples, correctly in this context.
>>>> If you must do a separate channel analysis, you would be better off using
>>>> the separate channel analysis facilities of the limma package.
>>>>
>>>> Second, when you set batch=rep(1,24), you are defining a batch that
>>>> consists of every array in your data set.  Obviously it doesn't make sense
>>>> to remove batch effects unless there are at least two batches.
>>>>
>>>> Third, I don't follow your design matrix.
>>>>
>>>> It would be better to go back to the start, and describe in more basic
>>>> terms what is the nature of your data and what comparison you are trying
>>>> to make.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000
>>>>> From: "Sanogo, Yibayiri O" <sanogo at illinois.edu>
>>>>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package
>>>>>
>>>>> Dear Members of the list,
>>>>>
>>>>> (I apologize for posting this again -I sent it earlier to the list but
>>>>> from another account and I was listed me as non-Member -and Member I am
>>>>> since 2008:-)).
>>>>>
>>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I
>>>>> would like to remove the dye effect prior to analysis. I read on the forum
>>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of
>>>>> modeling that is not in Rank Product.
>>>>>
>>>>> I have two questions:
>>>>>
>>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye
>>>>> effect prior to running the expression data using Rank Prod?
>>>>>
>>>>> 2) a) If no, what other function could I use to do this?
>>>>>   b) If yes, I would like a help with the correct design and how to
>>>>> properly indicate the batch.
>>>>>
>>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are
>>>>> different areas of the brain):
>>>>>
>>>>> design1
>>>>>   BS  C  D  T
>>>>> 1   0  0  0  1
>>>>> 2   0  0  0 -1
>>>>> 3   0  0  0  1
>>>>> 4   0  0  0 -1
>>>>> 5   0  0  0  1
>>>>> 6   0  0  0 -1
>>>>> 7   0  0 -1  0
>>>>> 8   0  0  1  0
>>>>> 9   0  0 -1  0
>>>>> 10  0  0  1  0
>>>>> 11  0  0 -1  0
>>>>> 12  0  0  1  0
>>>>> 13  0  1  0  0
>>>>> 14  0 -1  0  0
>>>>> 15  0  1  0  0
>>>>> 16  0 -1  0  0
>>>>> 17  0  1  0  0
>>>>> 18  0 -1  0  0
>>>>> 19 -1  0  0  0
>>>>> 20  1  0  0  0
>>>>> 21 -1  0  0  0
>>>>> 22  1  0  0  0
>>>>> 23 -1  0  0  0
>>>>> 24  1  0  0  0
>>>>>
>>>>> attr(,"assign")
>>>>> [1] 1 1 1 1
>>>>>
>>>>> I've tried this (Data_RP are my data, the M values of the expression set):
>>>>>
>>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL,
>>>>> design=design1)
>>>>>
>>>>> but it is returning an error message
>>>>> " Error in contr.sum(levels(batch)) :
>>>>>  not enough degrees of freedom to define contrasts"
>>>>>
>>>>> Please help me correct this code.
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> Osee
>>>>>
>>>>> -- -- --
>>>>> Y. Osee Sanogo
>>>>> Integrative Biology
>>>>> Institute for Genomic Biology
>>>>> University of Illinois at Urbana
>>>>> 505 S. Goodwin Ave
>>>>> Urbana, IL-61801
>>>>>
>>>>> Tel: 217-333 2308 (Office)
>>>>>     217-417 9593 (Cell)
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of the
>> sender.
>> ______________________________________________________________________
>
>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list