[BioC] limma separate channel analysis of your data

Sanogo, Yibayiri O sanogo at illinois.edu
Wed Sep 12 06:22:24 CEST 2012


Dear Gordon,

Many, many thanks for your advice on how to properly analyze my data. I also want to thank you for the extra step you took to provide me with detailed step by step instructions on the analysis.

I will use the single channel analysis. I like the fact that I can account for all the confounding factors in the analysis.

I really appreciate your assistance and I will keep you informed about how it goes.

Have a great day!

Best,

Osee

________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: Tuesday, September 11, 2012 8:13 PM
To: Sanogo, Yibayiri O
Cc: Osee Sanogo; Bioconductor mailing list
Subject: limma separate channel analysis of your data

Dear Osee,

Thank you for attaching the revised design of your experiment.  I found it
somewhat tricky to analyse, so my recommendations are more involved than I
usually give.  I think that you will need to perform a separate channel
analysis, otherwise it will not be possible to make baseline comparisons
between the brain parts.

I also think that you may need to re-normalize your data using the limma
package.  You say that the data has been "loess/scale normalized into an
expression set", but I don't think that scale normalization should be
necessary for Agilent data, and an expression set is not fully suitable
for two colour data.  The following code assumes that you have normalized
the data into an MAList object using the limma package.  I will call it
'MA'.  I strongly recommend that you use normexp background correction and
loess normalization as described in the limma User's Guide for GenePix
data.

The first step is to put your targets data into separate channel format:

   targets <- read.delim("Design_Brain.txt")
   Treat <- as.vector(t(targets[,c("Cy3","Cy5")]))
   Treat <- ifelse(Treat>0,"Exp","Ctrl")
   Treat <- factor(Treat)
   Brain <- rep(targets$Brain.Part,each=2,len=48)
   Fish <- factor(rep(targets$Fish.Number,each=2,len=48))
   Dye <- rep(0:1,len=48)
   data.frame(Fish,Brain,Treat,Dye)

Now you can fit a linear model, correcting for probe-specific dye-bias,
correcting for any differences between the fish, and computing Treatment
vs Control effects for each brain part:

   design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
   fit <- lmscFit(MA,design)
   fit <- eBayes(fit, trend=TRUE)

Now you can extract genes that have a treatment effect.  To find genes
that have a treatment vs control effect in telencephalon, you would use:

   topTable(fit, coef="BrainT:TreatExp")

For genes DE for treatment vs control in brain stem, it would be

   topTable(fit, coef="BrainBS:TreatExp")

And so on.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth


On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote:

Dear Gordon,

Attached please find the revised design of my experiment.

Thank you for your willingness to help me analyze these data.

Best regards,

Osee
________________________________________
From: Sanogo, Yibayiri O
Sent: Monday, September 10, 2012 9:51 PM
To: Gordon K Smyth; Osee Sanogo
Cc: Bioconductor mailing list
Subject: RE: Advice with RemoveBatchEffect and Rank Product package

Dear Gordon,

Please ignore the last two columns (weight and length)  of the design for
now:  I think they are screwed up due to some kind of sorting. I will have
to double check them (look in the notebook which is not here now). Only
different brain regions (T, D, C, BS) belonging to the same fish should
have same weight, length.

The other columns are:

Fish.Number= corresponds to fish ID, the individual fish (biological
replicates) Slide= the Agilent 4X44k slides on each the arrays were
located Brain.Part= the different brain regions that were dissected and
hybridized (within region). They were T (telencephalon), D (diencephalon),
C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control
and "1" is experimental.

FileName        Cy3     Cy5     Fish Number     Slide   Brain Part
1T.gpr  1       -1      1       2       T
2T.gpr  -1      1       2       1       T
3T.gpr  1       -1      3       4       T
4T.gpr  -1      1       4       3       T
5T.gpr  1       -1      5       6       T
6T.gpr  -1      1       6       5       T
1D.gpr  -1      1       1       5       D
2D.gpr  1       -1      2       4       D
3D.gpr  -1      1       3       1       D
4D.gpr  1       -1      4       6       D
5D.gpr  -1      1       5       3       D
6D.gpr  1       -1      6       2       D
1C.gpr  1       -1      1       4       C
2C.gpr  -1      1       2       3       C
3C.gpr  1       -1      3       6       C
4C.gpr  -1      1       4       5       C
5C.gpr  1       -1      5       2       C
6C.gpr  -1      1       6       1       C
1BS.gpr -1      1       1       1       BS
2BS.gpr 1       -1      2       2       BS
3BS.gpr -1      1       3       3       BS
4BS.gpr 1       -1      4       4       BS
6BS.gpr 1       -1      6       6       BS

Thank you for your willingness to help. I will stay up a bit so that I can
respond faster to your questions given the time difference between us.

Thank you so much.

Osee

________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: Monday, September 10, 2012 6:34 PM
To: Osee Sanogo
Cc: Sanogo, Yibayiri O; Bioconductor mailing list
Subject: Re: Advice with RemoveBatchEffect and Rank Product package

Dear Osee,

Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight
and Length mean in your experiment?  If the first six arrays are from
biologically independent samples, why do they have the same weight and
length?

Best wishes
Gordon


On Mon, 10 Sep 2012, Osee Sanogo wrote:

> Dear Gordon,
>
> From what you said, it seems that I am oversimplifying my experiment by
> attempting to analyze it with RankProd, which doesn't offer the option for
> complex modeling.
>
> Could you please explain to me how I could analyze the experiment using
> Limma?
> Please let me know if you'd like me to provide further details of the
> experiment.
>
> Thank you so much.
>
> Osee
>
>
> On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:
>
>> Dear Osee,
>>
>> No, you can't use removeBatchEffect to control for dye bias.
>>
>> Can you ignore the dye effect?  Not in general, but who knows?
>>
>> Your experiment seems too complex to be properly analysed using RankProd.
>> For one thing, it seems clear that you have obtained multiple parts of the
>> brain from the same biological replicates, meaning that your samples are
>> paired by fish number.
>>
>> I could explain how to analyse this experiment using limma.  However, if
>> you are determined that you will use RankProd, it might be best to email
>> the authors of that package for advice.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>> On Sun, 9 Sep 2012, Osee Sanogo wrote:
>>
>>> Dear Gordon,
>>>
>>> Thank you for getting back to me about my questions.
>>>
>>> My experiment is trying to identify differentially expressed genes in four
>>> regions of the brain in response to a stressor. I have 6 biol. replicates in
>>> each brain region for the control and experimental groups in each region,
>>> and the comparison is being done within brain region (i.e., T control vs T
>>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were
>>> run in two-color Agilent Array.
>>>
>>> You're right that the design I sent was from the separate channel analysis,
>>> in which I attempted to account for array and dye effect, and then run the
>>> data in RankProd. Now I know that this is not right. Ok, I will use the
>>> single channel analysis in Limma.
>>>
>>> I still would like to run the two-channel data (ratios) in RankProd, as my
>>> previous experience found this useful for my dara (low replicate numbers).
>>>
>>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias
>>> before running the two-channel data in RankProd? If yes, how should I do
>>> this using the RemoveBatch Effect function?
>>>            2) I found that about 3% of my probes have dye effect. Can I
>>> omit controlling for dye effect without compromising the result of my
>>> analysis?
>>>
>>> The data were loess/scale normalized into an expression set (Data_RP).
>>>
>>> Here is the design of the experiment
>>>
>>> FileName     Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length
>>> 1    1T.gpr   1  -1           1     2          T     39   0.63
>>> 2    2T.gpr  -1   1           2     1          T     39   0.63
>>> 3    3T.gpr   1  -1           3     4          T     39   0.63
>>> 4    4T.gpr  -1   1           4     3          T     39   0.63
>>> 5    5T.gpr   1  -1           5     6          T     39   0.63
>>> 6    6T.gpr  -1   1           6     5          T     NA     NA
>>> 7    1D.gpr  -1   1           1     5          D     47   1.21
>>> 8    2D.gpr   1  -1           2     4          D     47   1.21
>>> 9    3D.gpr  -1   1           3     1          D     47   1.21
>>> 10   4D.gpr   1  -1           4     6          D     47   1.21
>>> 11   5D.gpr  -1   1           5     3          D     47   1.21
>>> 12   6D.gpr   1  -1           6     2          D     NA     NA
>>> 13   1C.gpr   1  -1           1     4          C     47   1.31
>>> 14   2C.gpr  -1   1           2     3          C     47   1.31
>>> 15   3C.gpr   1  -1           3     6          C     47   1.31
>>> 16   4C.gpr  -1   1           4     5          C     47   1.31
>>> 17   5C.gpr   1  -1           5     2          C     47   1.31
>>> 18   6C.gpr  -1   1           6     1          C     NA     NA
>>> 19  1BS.gpr  -1   1           1     1         BS     89   1.44
>>> 20  2BS.gpr   1  -1           2     2         BS     89   1.44
>>> 21  3BS.gpr  -1   1           3     3         BS     89   1.44
>>> 22  4BS.gpr   1  -1           4     4         BS     NA     NA
>>> 23  5BS.gpr  -1   1           5     5         BS     NA     NA
>>> 24  6BS.gpr   1  -1           6     6         BS     NA     NA
>>>
>>> Thank you for your help and please let me know if you need further
>>> explanation of the experiment.
>>>
>>> Best regards,
>>>
>>> Osee
>>>
>>>>
>>>
>>>
>>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:
>>>
>>>> Dear Osee,
>>>>
>>>> You are attempting to do a number of things that don't seem correct to me.
>>>>
>>>> First, you seem to attempting a separate channel analysis of two color
>>>> microarray data, but ignoring the pairing of the red and green channels.
>>>> It isn't correct to do this.  I don't see any way to use RankProd, or any
>>>> other package designed for independent samples, correctly in this context.
>>>> If you must do a separate channel analysis, you would be better off using
>>>> the separate channel analysis facilities of the limma package.
>>>>
>>>> Second, when you set batch=rep(1,24), you are defining a batch that
>>>> consists of every array in your data set.  Obviously it doesn't make sense
>>>> to remove batch effects unless there are at least two batches.
>>>>
>>>> Third, I don't follow your design matrix.
>>>>
>>>> It would be better to go back to the start, and describe in more basic
>>>> terms what is the nature of your data and what comparison you are trying
>>>> to make.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000
>>>>> From: "Sanogo, Yibayiri O" <sanogo at illinois.edu>
>>>>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package
>>>>>
>>>>> Dear Members of the list,
>>>>>
>>>>> (I apologize for posting this again -I sent it earlier to the list but
>>>>> from another account and I was listed me as non-Member -and Member I am
>>>>> since 2008:-)).
>>>>>
>>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I
>>>>> would like to remove the dye effect prior to analysis. I read on the forum
>>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of
>>>>> modeling that is not in Rank Product.
>>>>>
>>>>> I have two questions:
>>>>>
>>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye
>>>>> effect prior to running the expression data using Rank Prod?
>>>>>
>>>>> 2) a) If no, what other function could I use to do this?
>>>>>   b) If yes, I would like a help with the correct design and how to
>>>>> properly indicate the batch.
>>>>>
>>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are
>>>>> different areas of the brain):
>>>>>
>>>>> design1
>>>>>   BS  C  D  T
>>>>> 1   0  0  0  1
>>>>> 2   0  0  0 -1
>>>>> 3   0  0  0  1
>>>>> 4   0  0  0 -1
>>>>> 5   0  0  0  1
>>>>> 6   0  0  0 -1
>>>>> 7   0  0 -1  0
>>>>> 8   0  0  1  0
>>>>> 9   0  0 -1  0
>>>>> 10  0  0  1  0
>>>>> 11  0  0 -1  0
>>>>> 12  0  0  1  0
>>>>> 13  0  1  0  0
>>>>> 14  0 -1  0  0
>>>>> 15  0  1  0  0
>>>>> 16  0 -1  0  0
>>>>> 17  0  1  0  0
>>>>> 18  0 -1  0  0
>>>>> 19 -1  0  0  0
>>>>> 20  1  0  0  0
>>>>> 21 -1  0  0  0
>>>>> 22  1  0  0  0
>>>>> 23 -1  0  0  0
>>>>> 24  1  0  0  0
>>>>>
>>>>> attr(,"assign")
>>>>> [1] 1 1 1 1
>>>>>
>>>>> I've tried this (Data_RP are my data, the M values of the expression set):
>>>>>
>>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL,
>>>>> design=design1)
>>>>>
>>>>> but it is returning an error message
>>>>> " Error in contr.sum(levels(batch)) :
>>>>>  not enough degrees of freedom to define contrasts"
>>>>>
>>>>> Please help me correct this code.
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> Osee
>>>>>
>>>>> -- -- --
>>>>> Y. Osee Sanogo
>>>>> Integrative Biology
>>>>> Institute for Genomic Biology
>>>>> University of Illinois at Urbana
>>>>> 505 S. Goodwin Ave
>>>>> Urbana, IL-61801
>>>>>
>>>>> Tel: 217-333 2308 (Office)
>>>>>     217-417 9593 (Cell)

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list