[BioC] RNA degradation trends & options for analysis

Thu Feb 23 15:16:28 CET 2006

Juanma Vaquerizas wrote:
> Dear list,
> 
> I'm trying to analyse some Affy arrays for my PhD thesis but I'm a  
> little bit stuck, so any comments on the following would be very  
> welcome.
> 
> Basically I'm analysing a set of Affy arrays coming from 10 different  
> labs (3 biological replicates per lab) where each lab is using a  
> different RNA source. I've done some quality control using affyPLM  
> and the chips seem to be ok.

Is this after processing them as one batch? If the residuals look OK, 
then this is a good indication that you can process them all together.

> 
> If I have a look at the RNA digestion plot, 2 different trends are  
> clearly visible (half of the arrays follow one trend with a slope  
> around 1 and the other half with a slope around 3).
> 
> I want to make some contrasts between the different RNA sources that  
> have been used, but as I've read in (Bolstad et al., 2005,  
> Bioinformatics and Computational Biology Solutions Using R and  
> Bioconductor, Springer) and in some previous messages in this list,  
> mixing arrays with very different slopes in the RNA digestion plots  
> is not a very good idea.

In my experience, the RNA degradation plots are not nearly as important 
as the density plots. What do they look like? Are the distributions all 
pretty similar in shape and fairly close together?
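
A rough numeric stand-in for eyeballing the density plots is to compare a few quantiles of the log2 intensities across arrays. The sketch below is only an illustration under assumptions: the function names, the choice of quantiles, and the toy intensities are all hypothetical, and it is not part of affy or affyPLM.

```python
import math
import statistics

def quantile_profile(intensities, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """log2-transform raw intensities and return the requested quantiles."""
    logged = sorted(math.log2(x) for x in intensities)
    n = len(logged)
    profile = []
    for p in probs:
        # Linear interpolation between neighbouring order statistics.
        pos = p * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        profile.append(logged[lo] + (pos - lo) * (logged[hi] - logged[lo]))
    return profile

def max_deviation_from_median_profile(arrays):
    """Per array: largest absolute gap (in log2 units) between its quantiles
    and the across-array median of each quantile."""
    profiles = {name: quantile_profile(vals) for name, vals in arrays.items()}
    n_q = len(next(iter(profiles.values())))
    reference = [statistics.median(p[k] for p in profiles.values())
                 for k in range(n_q)]
    return {name: max(abs(p[k] - reference[k]) for k in range(n_q))
            for name, p in profiles.items()}

# Hypothetical toy data: array "c" is shifted upwards relative to "a" and "b".
toy_arrays = {
    "a": [2 ** i for i in range(1, 12)],
    "b": [2 ** i for i in range(1, 12)],
    "c": [4 * 2 ** i for i in range(1, 12)],
}
deviation = max_deviation_from_median_profile(toy_arrays)
```

Arrays whose deviation is large relative to the rest are the ones whose density curves would sit apart from the others in the plot. What counts as "large" is a judgment call and is not defined here.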

> 
> The options I'm thinking about at the moment are the following:
> 
> Option 1:
> 1.- Split the arrays by the lab of origin.
> 2.- Preprocess them separately using GCRMA.
> 3.- Combine the resulting esets into one eset.
> 4.- Analyse using limma, modeling for 3 factors (RNA type, lab  
> effect, trend in the RNA digestion plot)
> 5.- Extract the contrasts I am interested in (the RNA type ones)
> 
> Option 2:
> 1.- Split the arrays by the trend of the RNA digestion plot.
> 2.- Preprocess them separately using GCRMA.
> 3.- Combine the resulting esets into one eset.
> 4.- Analyse using limma, modeling for 3 factors (RNA type, lab  
> effect, trend in the RNA digestion plot)
> 5.- Extract the contrasts I am interested in (the RNA type ones)
> 
> Option 3:
> 1.- Do not split the arrays in groups.
> 2.- Preprocess all of them using GCRMA.
> 3.- Analyse using limma, modeling for 3 factors (RNA type, lab  
> effect, trend in the RNA digestion plot)
> 4.- Extract the contrasts I am interested in (the RNA type ones)

I would think this is the most reasonable method, if as you say the 
residuals from affyPLM all look good. One further check you can make is 
to do a PCA plot of the first two PCs and see how the replicated samples 
are grouping. If the replicates are all grouping together it may not 
even be necessary to model the lab effect. You could use plotPCA() in 
affycoretools to do this step.
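
To make the PCA check concrete, here is a minimal sketch of computing scores on the first two PCs for a small expression matrix (one row per array). It is an illustration only and not how plotPCA() is implemented; the function name, the power-iteration approach, and the toy values are assumptions chosen to keep the example free of external dependencies.

```python
import math
import random

def pca_scores(samples, n_components=2, n_iter=500, seed=0):
    """Return one list of PC scores per sample (row).

    samples: list of equal-length lists, one row per array and one column
    per probeset, on the log2 scale.
    """
    n = len(samples)
    p = len(samples[0])
    # Centre each probeset (column) across arrays.
    col_means = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - col_means[j] for j in range(p)] for row in samples]
    # n x n Gram matrix; its eigenvectors, scaled, are the sample scores.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centred]
            for ri in centred]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        eigval = 0.0
        for _ in range(n_iter):  # power iteration for the leading eigenpair
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
            eigval = norm
        components.append([x * math.sqrt(eigval) for x in v])
        # Deflate so the next pass finds the following component.
        gram = [[gram[i][j] - eigval * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return [list(s) for s in zip(*components)]

# Hypothetical toy data: rows 0-2 are replicates of one sample type,
# rows 3-5 are replicates of another.
toy = [
    [1.0, 1.1, 5.0, 5.2], [1.1, 0.9, 5.1, 4.9], [0.9, 1.0, 4.9, 5.0],
    [5.0, 5.1, 1.0, 0.9], [5.2, 4.9, 1.1, 1.0], [4.9, 5.0, 0.9, 1.1],
]
scores = pca_scores(toy)  # scores[i] == [PC1, PC2] for array i
```

If replicates land close together in the (PC1, PC2) plane and the groups separate, that supports the reading that the replicates are grouping as expected. The sign of each component is arbitrary.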

> 
> 
> Unfortunately I can't figure out which would be the best way to  
> proceed, or even if modeling for the trend is something that would be  
> acceptable. I've seen in the vignette of the affycoretools package  
> that the arrays coming from different RNA protocols are preprocessed  
> separately and then mixed for the linear model, although it is not  
> clear to me why this option is better than any of the others.

Well, the example in affycoretools is a very special case and should not 
be construed as an example that one should use for 'normal' analyses 
(which makes me wonder if I need a different example).

Anyway, in that vignette the samples have been processed completely 
differently (one set amplified with the NuGen Ovation kit, and one using 
the normal Affy IVT kit), so there is no way they should be processed as 
one batch. I then stick both sets of expression values into one exprSet 
simply to make the linear modeling step easier. Since I use a cell means 
model and never make any contrasts between the groups, this analysis is 
equivalent to keeping the data separate and fitting two separate models.
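
The equivalence can be illustrated with a small sketch. In a cell means model the design matrix is one-hot, so each least-squares coefficient is just the mean of its own group, and a contrast within one protocol only ever touches that protocol's arrays. The group labels and expression values below are hypothetical and are not taken from the vignette.

```python
def cell_means_fit(values, groups):
    """Least-squares fit of y ~ 0 + group, one coefficient per group.

    With a one-hot design X, X'X is diagonal (group sizes) and X'y holds the
    group sums, so each coefficient reduces to that group's mean.
    """
    sums, counts = {}, {}
    for y, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + y
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical log2 expression values for one probeset under two protocols.
values = [7.1, 7.3, 9.0, 9.4, 6.0, 6.2, 8.1, 8.5]
groups = ["ivt_ctrl", "ivt_ctrl", "ivt_trt", "ivt_trt",
          "nugen_ctrl", "nugen_ctrl", "nugen_trt", "nugen_trt"]

joint = cell_means_fit(values, groups)                # both sets in one fit
ivt_only = cell_means_fit(values[:4], groups[:4])     # separate fit, IVT
nugen_only = cell_means_fit(values[4:], groups[4:])   # separate fit, NuGen

# Within-protocol contrasts from the joint fit and from the separate fits.
joint_ivt = joint["ivt_trt"] - joint["ivt_ctrl"]
sep_ivt = ivt_only["ivt_trt"] - ivt_only["ivt_ctrl"]
joint_nugen = joint["nugen_trt"] - joint["nugen_ctrl"]
sep_nugen = nugen_only["nugen_trt"] - nugen_only["nugen_ctrl"]
```

The sketch covers the coefficient estimates only. A joint fit estimates one residual variance per probeset from all arrays, so standard errors and test statistics need not match those from two separate fits exactly.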

HTH,

Jim

> 
> On the other hand, some messages to the list last week argued for  
> preprocessing all the experiments at once...
> 
> My understanding is that there is not a clear consensus about what to  
> do in those cases and I don't really know the consequences and the  
> differences between following the different approaches, so any  
> comments would be very much appreciated.
> 
> Thank you very much for your help.
> 
> Best wishes,
> 
> Juanma.
> 
> 
> 
> Juanma Vaquerizas
> PhD Student
> Regulation Group
> EMBL-EBI
> Wellcome Trust Genome Campus
> Cambridge CB10 1SD
> UK
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623