[BioC] RNA degradation trends & options for analysis

Thu Feb 23 10:31:16 CET 2006

Dear list,

I'm trying to analyse some Affy arrays for my PhD thesis but I'm a  
little bit stuck, so any comments on the following would be very  
welcome.

Basically I'm analysing a set of Affy arrays coming from 10 different  
labs (3 biological replicates per lab) where each lab is using a  
different RNA source. I've done some quality control using affyPLM  
and the chips seem to be ok.

If I have a look at the RNA digestion plot, 2 different trends are  
clearly visible (half of the arrays follow one trend with a slope  
around 1 and the other half with a slope around 3).
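
To make the slope comparison concrete, here is a minimal sketch in plain Python (not the affy code) of how a per-array slope can be obtained from mean intensities by probe position and used to split arrays into two groups. The profile values, the array names, the helper `ls_slope` and the cut-off of 2.0 are all made up for illustration. AffyRNAdeg in the affy package does its own shifting and scaling before fitting, which this sketch does not reproduce, so the numbers are not comparable to the real plot.

```python
from statistics import mean

def ls_slope(profile):
    """Least-squares slope of a profile against probe position 0..n-1."""
    n = len(profile)
    x_bar = (n - 1) / 2
    y_bar = mean(profile)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(profile))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

# Made-up mean intensities by probe position (5' to 3'), one list per array.
profiles = {
    "array_a": [0.0, 1.1, 1.9, 3.0, 4.0],
    "array_b": [0.0, 0.9, 2.1, 3.0, 4.0],
    "array_c": [0.0, 3.1, 5.9, 9.0, 12.0],
    "array_d": [0.0, 2.9, 6.1, 9.0, 12.0],
}

THRESHOLD = 2.0  # hypothetical cut-off between the two trends

slopes = {name: ls_slope(p) for name, p in profiles.items()}
groups = {name: ("steep" if s > THRESHOLD else "shallow")
          for name, s in slopes.items()}

for name in sorted(slopes):
    print(name, round(slopes[name], 2), groups[name])
```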

I want to make some contrasts between the different RNA sources that  
have been used. However, as I've read in Bolstad et al. (2005,  
Bioinformatics and Computational Biology Solutions Using R and  
Bioconductor, Springer) and in some previous messages on this list,  
mixing arrays with very different slopes in the RNA digestion plots  
is not a very good idea.

The options I'm thinking about at the moment are the following:

Option 1:
1.- Split the arrays by the lab of origin.
2.- Preprocess them separately using GCRMA.
3.- Combine the resulting esets into one eset.
4.- Analyse using limma, modeling for 3 factors (RNA type, lab  
effect, trend in the RNA digestion plot)
5.- Extract the contrasts I am interested in (the RNA type ones)

Option 2:
1.- Split the arrays by the trend of the RNA digestion plot.
2.- Preprocess them separately using GCRMA.
3.- Combine the resulting esets into one eset.
4.- Analyse using limma, modeling for 3 factors (RNA type, lab  
effect, trend in the RNA digestion plot)
5.- Extract the contrasts I am interested in (the RNA type ones)

Option 3:
1.- Do not split the arrays in groups.
2.- Preprocess all of them using GCRMA.
3.- Analyse using limma, modeling for 3 factors (RNA type, lab  
effect, trend in the RNA digestion plot)
4.- Extract the contrasts I am interested in (the RNA type ones)
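
Whichever option is used, it may be worth checking whether the three factors can be estimated separately at all. The sketch below (plain Python, not limma) builds a treatment-coded design for the layout described above, 10 labs with 3 replicates each and one RNA source per lab, and computes its rank. The lab and RNA labels are placeholders, assigning the first five labs to one trend is an assumption made only for illustration, and the helper names `dummy_columns` and `matrix_rank` are hypothetical.

```python
from fractions import Fraction

def dummy_columns(labels):
    """0/1 indicator columns for every level except the first (reference)."""
    levels = sorted(set(labels))
    return [[1 if v == lev else 0 for v in labels] for lev in levels[1:]]

def matrix_rank(columns):
    """Rank of the matrix whose columns are given, by exact elimination."""
    rows = [[Fraction(col[i]) for col in columns]
            for i in range(len(columns[0]))]
    rank = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][c] != 0:
                factor = rows[i][c] / rows[rank][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# 10 labs x 3 replicates; each lab uses its own RNA source (as in the post).
lab = ["lab%02d" % (i // 3) for i in range(30)]
rna = ["rna%02d" % (i // 3) for i in range(30)]
# Assumption for illustration: arrays from the first five labs share one trend.
trend = ["slope1" if i // 3 < 5 else "slope3" for i in range(30)]

intercept = [[1] * 30]
design = intercept + dummy_columns(rna) + dummy_columns(lab) + dummy_columns(trend)

n_columns = len(design)
design_rank = matrix_rank(design)
print("columns:", n_columns, "rank:", design_rank)
```

Under these assumptions the design has 20 columns but rank 10. With one RNA source per lab, the lab and RNA type columns carry the same information, so a model containing all three factors cannot separate them.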

Unfortunately I can't figure out which would be the best way to  
proceed, or even if modeling for the trend is something that would be  
acceptable. I've seen in the vignette of the affycoretools package  
that the arrays coming from different RNA protocols are preprocessed  
separately and then mixed for the linear model, although it is not  
clear to me why this option is better than any of the others.

On the other hand, some messages to the list last week argued for  
preprocessing all the experiments at once...

My understanding is that there is no clear consensus about what to  
do in these cases, and I don't really know the consequences of, or the  
differences between, the different approaches, so any  
comments would be very much appreciated.

Thank you very much for your help.

Best wishes,

Juanma.

Juanma Vaquerizas
PhD Student
Regulation Group
EMBL-EBI
Wellcome Trust Genome Campus
Cambridge CB10 1SD
UK