[BioC] Filtering before differential expression analysis of microarrays - New paper out (James W. MacDonald)

Jenny Drnevich drnevich at illinois.edu
Tue Jan 13 19:32:15 CET 2009


Hi Sherosha,

The description of genefilter() says:

genefilter filters genes in the array expr using the filter functions 
in flist. It returns an array of logical values (suitable for 
subscripting) of the same length as there are rows in expr. For each 
row of expr the returned value is TRUE if the row passed all the 
filter functions. Otherwise it is set to FALSE.
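
As a rough illustration of what that logical vector looks like, here is a base-R sketch that mimics the two filters from your message on simulated data. It is not genefilter's actual code: the matrix "expr" and the helpers p_over_a() and iqr_ok() are made up for this example, and I am assuming pOverA(0.25, log2(100)) passes a row when at least 25% of its values exceed log2(100).

```r
# Simulated log2 expression matrix: 10 probesets x 8 chips (made-up data)
set.seed(1)
expr <- matrix(rnorm(10 * 8, mean = 7, sd = 1), nrow = 10,
               dimnames = list(paste0("probe", 1:10), paste0("chip", 1:8)))

# Hypothetical stand-ins for pOverA(0.25, log2(100)) and the IQR filter
p_over_a <- function(x) mean(x > log2(100)) >= 0.25
iqr_ok   <- function(x) IQR(x) > 0.5

# One TRUE/FALSE per row: TRUE only if the row passes both filters
selected <- apply(expr, 1, function(x) p_over_a(x) && iqr_ok(x))

# The logical vector can be used directly for row subsetting
expr.filtered <- expr[selected, , drop = FALSE]
```

The point is only that the result has one element per row and can be reused to subset any object with the same rows in the same order.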

Your output object "selected" is just a vector of TRUEs and FALSEs, 
so I'm assuming you used it this way if you were going to filter 
BEFORE the statistical analysis:

 > selected=genefilter(all.esetsub[,61:102],ff)

 > fit=lmFit(all.esetsub[selected,61:102],design)

If you want to use the same filters, which select genes based on 
the normalized data, but not filter the genes out until after the 
analysis, you would do:

 > selected=genefilter(all.esetsub[,61:102],ff)  #same as above

 > fit=lmFit(all.esetsub[,61:102],design)

 > fit.filtered <- fit[selected,]
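
The same subsetting should work further down the pipeline, for example on the object returned by eBayes(), so that the moderated statistics are computed from all probesets and only the multiple-testing adjustment sees the filtered set. Below is a minimal sketch on simulated data, not your actual analysis: the matrix "expr", the two-group design, the contrast name "BvsA" and the IQR-only filter are all assumptions made for illustration (in your case "selected" would come from genefilter() as above).

```r
library(limma)

# Simulated log2 expression matrix: 200 probesets x 6 chips (made-up data)
set.seed(42)
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("chip", 1:6)))

# Hypothetical two-group design and a single contrast
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
contmatrix <- makeContrasts(BvsA = B - A, levels = design)

# Stand-in for genefilter(expr, ff): one TRUE/FALSE per probeset
selected <- apply(expr, 1, function(x) IQR(x) > 0.5)

# Fit and moderate on ALL probesets
fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contmatrix))

# Subset the MArrayLM object, then adjust p-values on the filtered set only
fit2.filtered <- fit2[selected, ]
results <- decideTests(fit2.filtered, adjust.method = "BH", p.value = 0.05)
```

Note that "selected" is computed from the expression values, not from the fit object, and it has to be in the same row order as the data that went into lmFit().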

HTH,
Jenny


At 12:09 PM 1/13/2009, Sherosha Raj wrote:
>Hello Jenny
>
>This is how I setup the filters:
>
>#setup filters
> > f1=pOverA(0.25,log2(100))
> > f2=function(x)(IQR(x)>0.5)
> > ff=filterfun(f1,f2)
>
>Around here I sub-select the probesets that come through the filter
>from my expression set.
>  then proceed to
>
>#LIMMA
>
> >targets=readTargets("targets.txt",sep="")
> > WD=paste(targets$.....)
> > WD=factor(WD,levels=c("........."))
> > design=model.matrix(~0+WD)
> > colnames(design)=levels(WD)
> > fit=lmFit(all.esetsub[,61:102],design) #from a large eset normalised over 102 chips so subsetting the relevant cel files
>
>#Contrast matrix
> > contmatrix=makeContrasts(.........,levels=design)
>
> >fit2=contrasts.fit(fit,contmatrix)
>
>If I were to filter here using the two filters above......
>
> >selected=genefilter(fit2,ff)
> > sum(selected)
>[1] 0
> > class(fit2q)
>[1] "MArrayLM"
>attr(,"package")
>[1] "limma"
>
>#When I filter before starting limma, I get 11504 probesets coming through.
>#I am confused how to proceed with the next steps....(i.e subset the
>fit2 object and apply the eBayes)..:-(
>
>#previously proceeded as follows after the "contrasts.fit" step:
> >fit2=eBayes(fit2)
> > changinggenes.05=decideTests(fit2,adjust.method="BH",p.value=0.05)
>
>etc etc
>
>
>I have previously been using filters before limma, but I've been
>following the discussions on this board and would like to see how the
>data looks if I filtered prior to the eBayes step.
>
>
>Any help is greatly appreciated!!
>Thank you very much!
>Regards,
>Sherosha
>
>2009/1/13 Jenny Drnevich <drnevich at illinois.edu>:
> > Hi Sherosha,
> >
> > In general, you can filter by subsetting a MArrayLM object the exact same
> > way as you would an ExpressionSet object. If you have any trouble, please
> > post the code that you are trying to use.
> >
> > Cheers,
> > Jenny
> >
> > At 10:47 AM 1/13/2009, Sherosha Raj wrote:
> >>
> >> Hello all
> >>
> >> I'm sorry if this is a simple question, but how does one go about
> >> filtering after the eBayes step since the resulting object is of the
> >> class MArrayLM?
> >> I am used to filtering expression sets directly.
> >>
> >> Thank you very much!
> >> Sherosha
> >> >
> >> > ---------- Forwarded message ----------
> >> > From: "James W. MacDonald" <jmacdon at med.umich.edu>
> >> > To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> >> > Date: Mon, 12 Jan 2009 09:25:02 -0500
> >> > Subject: Re: [BioC] Filtering before differential expression analysis of
> >> > microarrays - New paper out
> >> > Hi Dan,
> >> >
> >> > Daniel Brewer wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> There is a new paper out at BMC bioinformatics that seems to justify
> >> >> the use of filtering before differential expression analysis is
> >> >> performed (Hackstadt & Hess BMC Bioinformatics 2009, 10:11 -
> >> >> http://www.biomedcentral.com/1471-2105/10/11/abstract).  Specifically
> >> >> filtering by variance and detection call.  I have got the impression
> >> >> from this list that the general opinion is that one should only filter
> >> >> out the control genes before testing.  I was wondering if anyone had
> >> >> any opinions on this paper and the topic in general.
> >> >
> >> > I'm sure people do have opinions about this topic ;-D
> >> >
> >> > The reason people have so many opinions is because it isn't a simple
> >> > question, and it depends on what you consider important.
> >> >
> >> > If you are just trying to limit the number of multiple comparisons to
> >> > increase power, then filtering first is probably the way to go.
> >> >
> >> > If you are concerned with the accuracy of the FDR estimates, then
> >> > filtering first may not be ideal.
> >> >
> >> > If you are using limma (Hackstadt and Hess used multtest), then you
> >> > should filter after the eBayes step but before the FDR step, as an
> >> > assumption of the eBayes step is that all of the data from the chip are
> >> > available.
> >> >
> >> > Unless of course you are concerned about the accuracy of the FDR
> >> > estimates, in which case... well you see the point.
> >> >
> >> > With microarray data analysis the arguments for and against a particular
> >> > way of doing things can shed more heat than light, as nobody really knows
> >> > the underlying truth, and the measures we use are really far removed from
> >> > the actual phenomenon we are testing.
> >> >
> >> > Best,
> >> >
> >> > Jim
> >> >
> >> >
> >> >>
> >> >> Many thanks
> >> >>
> >> >> Dan
> >> >>
> >> >
> >> > --
> >> > James W. MacDonald, M.S.
> >> > Biostatistician
> >> > Hildebrandt Lab
> >> > 8220D MSRB III
> >> > 1150 W. Medical Center Drive
> >> > Ann Arbor MI 48109-5646
> >> > 734-936-8662
> >> >
> >> >
> >> >
> >>
> >
> > Jenny Drnevich, Ph.D.
> >
> > Functional Genomics Bioinformatics Specialist
> > W.M. Keck Center for Comparative and Functional Genomics
> > Roy J. Carver Biotechnology Center
> > University of Illinois, Urbana-Champaign
> >
> > 330 ERML
> > 1201 W. Gregory Dr.
> > Urbana, IL 61801
> > USA
> >
> > ph: 217-244-7355
> > fax: 217-265-5066
> > e-mail: drnevich at illinois.edu
> >


