Within array replicates in Limma, was: [BioC] Data analysis

Gordon Smyth smyth at wehi.edu.au
Tue Oct 28 13:54:11 MET 2003

At 09:00 PM 28/10/2003, Luke Whitaker wrote:
>Hello all,
>I have a number of Agilent based experiments where I have been asked to
>find up and down regulated genes, and later on to do some sort of
>clustering of gene profiles across multiple experiments. Currently I am
>only concerned with finding the most highly regulated genes within a
>single (multi-array) experiment.
>After spending a while knocking my data into shape for analysis by
>"limma" (or so I thought) I calculated the top 30 regulated genes
>for a couple of experiments and noticed the same gene appearing
>more than once in the top 30 list. Then I read this post...
>On Fri, 17 Oct 2003, Gordon Smyth wrote:
> > At 11:53 PM 16/10/2003, Jason Skelton wrote:
> > >On a different note
> > >The arrays I have tested LIMMA on have 2 duplicates and are spaced evenly
> > >throughout the array and so have no problems running your functions.
> > >
> > >Someone else at the Sanger Institute would like to be able to use LIMMA but
> > >the number of duplicates for each gene differs on their array, e.g. for some
> > >genes there are two copies and for others there would be four copies or
> > >more, which in turn obviously affects spacing etc. between replicates.
> > >I'm not sure why they would want differing numbers of copies of genes but
> > >they would like to be able to estimate the correlation between these genes
> > >anyway and obviously see the results as one data point per merged gene.
> >
> > I haven't implemented this in limma because it seems to me that it might
> > invalidate the assumptions behind the duplicate correlation approach. See
> > the earlier post:
> >
> > https://stat.ethz.ch/pipermail/bioconductor/2003-August/002224.html
> >
> > >I've tried to think of how this can be done but it seems overly complex
> > >and I'm not sure if it is at all possible in R or Limma.
> > >
> > >I'm guessing there is no way of carrying out the correlation, series model
> > >fits etc. based simply on the "Name" specified in the GAL files ?
> >
> > No.
> >
> > Cheers
> > Gordon
>Obviously I hadn't read the documentation carefully enough, because none of
>the arrays I have been asked to analyse have evenly spaced duplicates.
>After a bit of wailing, gnashing of teeth, and banging my head against the
>desk, I was wondering if there is a rational way of combining the multiple
>estimates for a single gene ? In particular, could Bayes rule be used to
>combine multiple P value estimates for a single gene ?


>  What about the M
>estimates - could a simple arithmetic or geometric mean be used ?

Yes - simple arithmetic mean is fine.
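
As a rough illustration of the averaging step only (not a limma function, and not a substitute for the duplicate correlation approach), here is a minimal Python sketch that groups spots by gene name and takes the arithmetic mean of their M-values, however many copies each gene has and wherever they sit on the array. The function name average_m_by_gene and the example data are hypothetical. Since M is a log-ratio, the arithmetic mean of M corresponds to a geometric mean on the ratio scale.

```python
from collections import defaultdict
from statistics import fmean


def average_m_by_gene(names, m_values):
    """Return {gene name: arithmetic mean of its M-values}.

    names    -- one gene name per spot (e.g. the "Name" column of a GAL file)
    m_values -- one M-value (log-ratio) per spot; None marks a missing spot
    """
    groups = defaultdict(list)
    for name, m in zip(names, m_values):
        if m is not None:  # ignore missing or flagged spots
            groups[name].append(m)
    # Genes may have any number of copies; spacing on the array is irrelevant.
    return {name: fmean(vals) for name, vals in groups.items()}


# Hypothetical data: geneA has 2 copies, geneB has 4 (one missing), geneC has 1.
names = ["geneA", "geneB", "geneA", "geneC", "geneB", "geneB", "geneB"]
m_values = [1.0, -0.5, 2.0, 0.25, -1.5, None, -1.0]

averaged = average_m_by_gene(names, m_values)
```

This yields one M-value per gene for a single array. It ignores any correlation between the copies, so treat it as a rough and ready summary.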


>Note that these estimates do NOT have to be theoretically perfect - almost
>any rough and ready method that has some sort of validity will do. After
>all, the basic experimental assumptions are fairly approximate, and some
>sort of approximate estimate will be much better than no estimate at all.
>Or is Limma the wrong package for my analysis, in which case, what package
>should I be using ? I asked this here before, but didn't specify that I had
>irregularly spaced duplicates as I hadn't realised that was an issue.

I do not know of any package which handles irregularly spaced duplicates or, 
for that matter, any research material which would provide reliable 
methodology to do so. (Apart from limma, I don't know of any package which 
automatically handles duplicates at all, although that doesn't mean none 
exists.)


>  Are
>there any other likely "gotchas" in terms of assumptions that packages will
>make about data layouts ?

>Luke Whitaker
