[BioC] Filtering before differential expression analysis of microarrays - New paper out (James W. MacDonald)

Sherosha Raj sherosha at gmail.com
Tue Jan 13 20:01:01 CET 2009


Hi Jenny,
Thank you very much for your help. It works now. These analyses are
so interesting, and I'm still learning :-).

Regards,
Sherosha
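
For readers of the archive, the workflow described in the quoted replies (compute the filter from the expression values, fit every probeset, subset the fit only after eBayes and before the FDR adjustment) might look roughly like this. It is a minimal sketch on simulated data, not the code actually run in this thread: the matrix expr stands in for all.esetsub[,61:102], and the group labels ctrl/trt, the array counts, and the contrast are assumptions made up for illustration.

```r
# Sketch only: simulated data stand in for all.esetsub[, 61:102].
# Group names, sizes, and the contrast are hypothetical.
library(limma)
library(genefilter)

set.seed(1)
n_genes  <- 200
n_arrays <- 8
expr <- matrix(rnorm(n_genes * n_arrays, mean = 7, sd = 1),
               nrow = n_genes,
               dimnames = list(paste0("probe", seq_len(n_genes)),
                               paste0("chip", seq_len(n_arrays))))

WD <- factor(rep(c("ctrl", "trt"), each = 4), levels = c("ctrl", "trt"))
design <- model.matrix(~ 0 + WD)
colnames(design) <- levels(WD)

# Same filters as in the thread, evaluated on the expression values
# (not on the fit object).
f1 <- pOverA(0.25, log2(100))
f2 <- function(x) IQR(x) > 0.5
ff <- filterfun(f1, f2)
selected <- genefilter(expr, ff)   # logical vector, one entry per probeset

# Fit and moderate using all probesets.
fit <- lmFit(expr, design)
contmatrix <- makeContrasts(trt - ctrl, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contmatrix))

# Subset after eBayes, so the BH adjustment sees only the retained probesets.
fit2.filtered <- fit2[selected, ]
changinggenes.05 <- decideTests(fit2.filtered, adjust.method = "BH",
                                p.value = 0.05)
```

Calling topTable() on fit2.filtered would likewise adjust the p-values over the retained probesets only.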

2009/1/13 Jenny Drnevich <drnevich at illinois.edu>:
> Hi Sherosha,
>
> The description of genefilter() says:
>
> genefilter filters genes in the array expr using the filter functions in
> flist. It returns an array of logical values (suitable for subscripting) of
> the same length as there are rows in expr. For each row of expr the returned
> value is TRUE if the row passed all the filter functions. Otherwise it is
> set to FALSE.
>
> Your output object "selected" is just a vector of TRUEs and FALSEs, so I'm
> assuming you used it this way if you were going to filter BEFORE the
> statistical analysis:
>
>> selected=genefilter(all.esetsub[,61:102],ff)
>
>> fit=lmFit(all.esetsub[selected,61:102],design)
>
> If you want to use the same filters, which select genes based on the
> normalized data, but not filter the genes out until after the analysis, you
> would do:
>
>> selected=genefilter(all.esetsub[,61:102],ff)  #same as above
>
>> fit=lmFit(all.esetsub[,61:102],design)
>
>> fit.filtered <- fit[selected,]
>
> HTH,
> Jenny
>
>
> At 12:09 PM 1/13/2009, Sherosha Raj wrote:
>>
>> Hello Jenny
>>
>> This is how I setup the filters:
>>
>> #setup filters
>> > f1=pOverA(0.25,log2(100))
>> > f2=function(x)(IQR(x)>0.5)
>> > ff=filterfun(f1,f2)
>>
>> Around here I sub-select the probesets that come through the filter
>> from my expression set.
>>  then proceed to
>>
>> #LIMMA
>>
>> >targets=readTargets("targets.txt",sep="")
>> > WD=paste(targets$.....)
>> > WD=factor(WD,levels=c("........."))
>> > design=model.matrix(~0+WD)
>> > colnames(design)=levels(WD)
>> > fit=lmFit(all.esetsub[,61:102],design) #from a large eset normalised
>> > #over 102 chips so subsetting the relevant cel files
>>
>> #Contrast matrix
>> > contmatrix=makeContrasts(.........,levels=design)
>>
>> >fit2=contrasts.fit(fit,contmatrix)
>>
>> If I were to filter here using the two filters above......
>>
>> >selected=genefilter(fit2,ff)
>> > sum(selected)
>> [1] 0
>> > class(fit2)
>> [1] "MArrayLM"
>> attr(,"package")
>> [1] "limma"
>>
>> #When I filter before starting limma, I get 11504 probesets coming
>> through.
>> #I am confused how to proceed with the next steps....(i.e subset the
>> fit2 object and apply the eBayes)..:-(
>>
>> #previously proceeded as follows after the "contrasts.fit" step:
>> >fit2=eBayes(fit2)
>> > changinggenes.05=decideTests(fit2,adjust.method="BH",p.value=0.05)
>>
>> etc etc
>>
>>
>> I have previously been using filters before limma, but I've been
>> following the discussions on this board and would like to see how the
>> data looks if I filter prior to the eBayes step.
>>
>>
>> Any help is greatly appreciated!!
>> Thank you very much!
>> Regards,
>> Sherosha
>>
>> 2009/1/13 Jenny Drnevich <drnevich at illinois.edu>:
>> > Hi Sherosha,
>> >
>> > In general, you can filter by subsetting an MArrayLM object the exact same
>> > way as you would an ExpressionSet object. If you have any trouble, please
>> > post the code that you are trying to use.
>> >
>> > Cheers,
>> > Jenny
>> >
>> > At 10:47 AM 1/13/2009, Sherosha Raj wrote:
>> >>
>> >> Hello all
>> >>
>> >> I'm sorry if this is a simple question, but how does one go about
>> >> filtering after the eBayes step since the resulting object is of the
>> >> class MArrayLM?
>> >> I am used to filtering expression sets directly.
>> >>
>> >> Thank you very much!
>> >> Sherosha
>> >> >
>> >> > ---------- Forwarded message ----------
>> >> > From: "James W. MacDonald" <jmacdon at med.umich.edu>
>> >> > To: Daniel Brewer <daniel.brewer at icr.ac.uk>
>> >> > Date: Mon, 12 Jan 2009 09:25:02 -0500
>> >> > Subject: Re: [BioC] Filtering before differential expression analysis of
>> >> > microarrays - New paper out
>> >> > Hi Dan,
>> >> >
>> >> > Daniel Brewer wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> There is a new paper out at BMC Bioinformatics that seems to justify the
>> >> >> use of filtering before differential expression analysis is performed
>> >> >> (Hackstadt & Hess, BMC Bioinformatics 2009, 10:11 -
>> >> >> http://www.biomedcentral.com/1471-2105/10/11/abstract), specifically
>> >> >> filtering by variance and detection call. I have got the impression
>> >> >> from this list that the general opinion is that one should only filter
>> >> >> out the control genes before testing. I was wondering if anyone had any
>> >> >> opinions on this paper and the topic in general.
>> >> >
>> >> > I'm sure people do have opinions about this topic ;-D
>> >> >
>> >> > The reason people have so many opinions is because it isn't a simple
>> >> > question, and it depends on what you consider important.
>> >> >
>> >> > If you are just trying to limit the number of multiple comparisons to
>> >> > increase power, then filtering first is probably the way to go.
>> >> >
>> >> > If you are concerned with the accuracy of the FDR estimates, then
>> >> > filtering first may not be ideal.
>> >> >
>> >> > If you are using limma (Hackstadt and Hess used multtest), then you
>> >> > should filter after the eBayes step but before the FDR step, as an
>> >> > assumption of the eBayes step is that all of the data from the chip are
>> >> > available.
>> >> >
>> >> > Unless of course you are concerned about the accuracy of the FDR
>> >> > estimates, in which case... well you see the point.
>> >> >
>> >> > With microarray data analysis the arguments for and against a particular
>> >> > way of doing things can shed more heat than light, as nobody really knows
>> >> > the underlying truth, and the measures we use are really far removed from
>> >> > the actual phenomenon we are testing.
>> >> >
>> >> > Best,
>> >> >
>> >> > Jim
>> >> >
>> >> >
>> >> >>
>> >> >> Many thanks
>> >> >>
>> >> >> Dan
>> >> >>
>> >> >
>> >> > --
>> >> > James W. MacDonald, M.S.
>> >> > Biostatistician
>> >> > Hildebrandt Lab
>> >> > 8220D MSRB III
>> >> > 1150 W. Medical Center Drive
>> >> > Ann Arbor MI 48109-5646
>> >> > 734-936-8662
>> >> >
>> >> >
>> >> >
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at stat.math.ethz.ch
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> > Jenny Drnevich, Ph.D.
>> >
>> > Functional Genomics Bioinformatics Specialist
>> > W.M. Keck Center for Comparative and Functional Genomics
>> > Roy J. Carver Biotechnology Center
>> > University of Illinois, Urbana-Champaign
>> >
>> > 330 ERML
>> > 1201 W. Gregory Dr.
>> > Urbana, IL 61801
>> > USA
>> >
>> > ph: 217-244-7355
>> > fax: 217-265-5066
>> > e-mail: drnevich at illinois.edu
>> >
>
>
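
The two filters set up in the quoted message can also be read as plain row-wise tests, which makes it easier to see what the logical vector returned by genefilter() contains. The sketch below uses base R only. The function f1 is a hand-written stand-in that assumes pOverA(p, A) keeps a row when at least a proportion p of its values exceed A, and the three probesets are invented for illustration.

```r
# Three made-up probesets measured on 8 chips (log2 scale).
expr <- rbind(
  low      = c(5.0, 5.5, 6.0, 5.0, 5.2, 6.1, 5.8, 5.4),  # never above log2(100)
  flat     = c(9.0, 9.1, 9.0, 9.1, 9.0, 9.1, 9.0, 9.1),  # expressed, barely varies
  variable = c(6.0, 6.0, 6.0, 6.0, 9.0, 9.0, 9.0, 9.0)   # expressed and varying
)

# Stand-in for pOverA(0.25, log2(100)): at least 25% of chips above log2(100).
f1 <- function(x) mean(x > log2(100)) >= 0.25
# Same IQR filter as in the thread.
f2 <- function(x) IQR(x) > 0.5

# A row is kept only if it passes every filter, as genefilter() documents.
selected <- apply(expr, 1, function(x) f1(x) && f2(x))
selected                               # low: FALSE, flat: FALSE, variable: TRUE

# The logical vector subscripts rows of the expression matrix.
expr[selected, , drop = FALSE]
```

Because the filters look at expression intensities, they are meant to be evaluated on the expression data; the resulting logical vector can then be used to subset either the expression set or the fit object.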



-- 
Regards,
Sherosha


