[BioC] limma: Combining probe level data prior to fitting

Wed Oct 15 11:06:16 CEST 2008

Hi,

I'm trying to do an analysis of some tag3 Affy microarray data with  
the limma package, and I've run into a problem I'm not sure how to  
solve.

The data comes from a series of arrays recorded at different time  
points. Each probe on the array represents a DNA tag from a yeast gene  
deletion library. There are several different time points and  
technical replicates for some of them. All I need to do is a linear  
regression of the expression levels against time and find the slope of  
that line.
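
To make the setup concrete, here is a minimal base-R sketch of the per-probe regression described above. The numbers, the probe names and the `time` vector are all made up for illustration, and the two technical replicates at time 0 are naively treated as independent arrays, which is exactly the part that needs proper handling.

```r
# Toy data (hypothetical): 3 probes x 5 arrays, one column per array.
time <- c(0, 0, 1, 2, 3)   # two technical replicates at time 0
expr <- rbind(
  probeA = c(10.0, 10.2, 9.1, 8.0, 7.1),
  probeB = c(8.0, 8.1, 8.0, 7.9, 8.1),
  probeC = c(6.0, 5.8, 6.9, 8.1, 9.0)
)

# lm() accepts a matrix response, so this fits one regression per probe.
fit <- lm(t(expr) ~ time)

# Row "time" of the coefficient matrix holds one slope per probe.
slopes <- coef(fit)["time", ]
print(slopes)
```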

So far so easy, but the problem I've run into is that each yeast gene  
deletion mutant is represented on the array by multiple probes  
(sometimes 2, sometimes more). What I'd like to do is fit all the data
for each deletion mutant simultaneously rather than on a
probe-by-probe basis. Hopefully this will improve the quality of the
linear regression.
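
One way to picture "fitting all the data for a deletion simultaneously" is a single regression with a shared slope and a separate intercept for each probe. The sketch below uses base R only, with made-up values and hypothetical probe names. It is an illustration of the idea under those assumptions, not a claim about what limma does internally.

```r
# Toy data (hypothetical): two probes for the same deletion, 4 time points.
time   <- c(0, 1, 2, 3)
probe1 <- c(10.0, 9.1, 8.0, 7.1)
probe2 <- c(12.1, 11.0, 10.1, 8.9)

# Stack the probes into long format, one row per measurement.
long <- data.frame(
  expr  = c(probe1, probe2),
  time  = rep(time, times = 2),
  probe = factor(rep(c("probe1", "probe2"), each = length(time)))
)

# Shared slope for time, with a probe-specific intercept.
pooled <- lm(expr ~ time + probe, data = long)
shared_slope <- coef(pooled)[["time"]]
print(shared_slope)
```

The probe term absorbs differences in baseline intensity between probes, so only the trend over time is pooled.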

So, is there any trick in limma (or in the preprocessing step prior to  
limma) for combining probe level data (i.e. rows of the expression  
matrix)? I could just average the data across probe sets for each  
deletion, but that seems like it wouldn't be as powerful as fitting all  
the data (I think?).
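
For reference, the simple averaging option could look something like this in base R. The matrix values, the probe names and the `deletion` mapping vector are hypothetical; a real analysis would take the mapping from the array annotation.

```r
# Toy expression matrix (hypothetical): 3 probes x 4 arrays.
expr <- rbind(
  p1 = c(10.0, 9.0, 8.0, 7.0),
  p2 = c(12.0, 11.0, 10.0, 9.0),
  p3 = c(5.0, 5.5, 6.0, 6.5)
)

# Which deletion mutant each probe (row) belongs to.
deletion <- c("delA", "delA", "delB")

# Sum the rows within each deletion, then divide by the probe count.
sums     <- rowsum(expr, group = deletion)
counts   <- as.vector(table(deletion)[rownames(sums)])
averaged <- sums / counts   # one row per deletion
print(averaged)
```

The averaged matrix has one row per deletion and can be used wherever the probe-level matrix would have been used, at the cost of discarding the between-probe variation.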

Alternatively, am I better off just taking the expression values and
doing the linear regression using the standard R lm function? In that
case, could anyone point me towards a method for accounting for the
technical replicates (which limma knows how to handle, and does the
'right thing' with)? I've tried reading gls.series, but it's a bit
scary for a biologist.

Alex Gutteridge
Systems Biology Centre
University of Cambridge