[BioC] PreFiltering probe in microarray analysis

wxu at msi.umn.edu wxu at msi.umn.edu
Mon Jun 13 16:57:52 CEST 2011


Hi Matt,

I, like all of us, have done it that way for a long time. Glad to see I am
not the only one who argues about this approach. Agreed, let's see how the
debate changes over time.

Thanks,
Wayne
--


> Wayne - I *definitely* mean cheating! It depends on whether the FDR is
> reported I suppose. Let's say you do a microarray screen and the 'most
> changed' gene that comes up (either by largest fold change or smallest
> t-test/ANOVA p-value) is 'interesting' biologically speaking. You go on to
> validate the change (on the same samples and further test sets) using qPCR
> and/or western blots etc., if you go as far as protein analysis. Therefore
> you can analyse the importance of that single gene in a real biological
> context. No one could argue that the gene is not changed in the study and
> other samples, because of the low-throughput validation, and it makes a
> nice biological story for a paper. This is regardless of the arrays used,
> the test used, the FDR or actual p-value even. You could have picked the
> gene by sticking a pin in a list; you just used an array to make that pin
> stick more likely to give a real change.
>
> However, the statistical factors do definitely matter when you are trying
> to report an overall analysis with lots of
> genes/patterns/pathways/functions etc, with a wide range of conclusions,
> perhaps in the absence of being able to perform a high-throughput
> validation of every gene (or a proportion of) in the final 'significant'
> list. I can see it from both sides...however, sometimes it's easy to lose
> sight that an array hybridisation is just a hypothesis generator, not a
> hypothesis solver. That said, any attempt to standardise this sort of
> reporting must have parity and (importantly) transparency with all these
> factors to have any success.
>
> I don't actually think there is a single valid answer to this issue, as
> there are so many interpretations/angles; it's just interesting to see how
> the debate changes over time. And essential to keep having it too!
>
> Thanks for reading - I have lots of thoughts about this!
> Matt
> ----------------------
> Matthew Arno, Ph.D.
> Genomics Centre Manager
> King's College London
>  
> The contents of this email are strictly confidential. It may not be
> transmitted in part or in whole to any other individual or groups of
> individuals.
> This email is intended solely for the use of the individual(s) to whom
> they are addressed and should not be released to any third party without
> the consent of the sender.
>
>
>
>>-----Original Message-----
>>From: wxu at msi.umn.edu [mailto:wxu at msi.umn.edu]
>>Sent: 13 June 2011 14:14
>>To: Arno, Matthew
>>Cc: bioconductor at r-project.org
>>Subject: Re: [BioC] PreFiltering probe in microarray analysis
>>
>>Thanks, Matt, for joining this discussion,
>>
>>That is true from a biologist's point of view. You always get the same
>>top 10 genes whether you filter or not. But this leads to another
>>question, the 'amazingly good FDR'. For the same top ten genes, people can
>>report different FDRs with or without filtering, or by filtering out a
>>different number of genes. The FDRs in these different reports are not
>>comparable at all. Does such an FDR make sense? People can try to make it
>>amazingly good. Does that sound a little like 'cheating'? Sorry, I do not
>>mean real cheating here.
>>
>>Do you have any thoughts about this?
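
The comparability point above can be illustrated with a small sketch. It
assumes the FDR is computed with the Benjamini-Hochberg procedure (the
thread does not name a method), that the raw p-values of the top 10 genes
are unchanged by filtering, and that the filter removes only other genes.
The function names bh_adjust and top_gene_fdr and all p-values are
hypothetical and chosen purely for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def top_gene_fdr(n_total, top_pvalues):
    """Adjusted p-values of the top genes when n_total genes are tested.

    The remaining genes get evenly spread p-values, a made-up stand-in
    for genes with no real change (assumption for illustration only).
    """
    n_null = n_total - len(top_pvalues)
    null_pvalues = [(k + 0.5) / n_null for k in range(n_null)]
    adjusted = bh_adjust(list(top_pvalues) + null_pvalues)
    return adjusted[:len(top_pvalues)]


# Same ten raw p-values, tested among 35k genes or among 5k genes.
top = [j * 1e-6 for j in range(1, 11)]
fdr_unfiltered = top_gene_fdr(35000, top)
fdr_filtered = top_gene_fdr(5000, top)
print(fdr_unfiltered[0], fdr_filtered[0])
```

Under these assumptions the top gene keeps its rank and raw p-value, but
its adjusted p-value is roughly 0.035 among 35k genes and roughly 0.005
among 5k genes. The sketch only shows that the reported numbers depend on
how many genes were tested; it says nothing about whether a given filter
preserves error-rate control.
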
>>
>>Best wishes,
>>
>>Wayne
>>--
>>
>>
>>
>>> Speaking as a pure 'biologist', I think it's OK to pre-filter genes as
>>> long as you know the pitfalls, in terms of the potential bias and effect
>>> on FDRs. I am personally aware of people pre-filtering not only to
>>> enhance the FDR, but to use the results of a t-test as a starting point
>>> for a second sequential t-test because the FDRs from this test are
>>> 'amazingly good'.
>>>
>>> However statistically sacrilegious this is, the top 10 genes are always
>>> going to be the same top 10 genes, so if you are just looking for the
>>> top 10 genes, this is essentially OK.
>>>
>>> How does that hang with you guys?
>>>
>>> Matt
>>>
>>> ----------------------
>>> Matthew Arno, Ph.D.
>>> Genomics Centre Manager
>>> King's College London
>>>
>>>
>>>>-----Original Message-----
>>>>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of wxu at msi.umn.edu
>>>>Sent: 12 June 2011 16:41
>>>>To: Wolfgang Huber
>>>>Cc: bioconductor at r-project.org
>>>>Subject: Re: [BioC] PreFiltering probe in microarray analysis
>>>>
>>>>Hi, Dear Wolfgang,
>>>>
>>>>I think it would be nice to bring up a discussion here about the gene
>>>>prefiltering issue. Please tell me if this suggestion is inappropriate.
>>>>
>>>>There are two questions about gene filtering to which I could not find
>>>>answers:
>>>>1) In traditional multiple testing, where the p-values of many group
>>>>tests are corrected, for example in a new drug effect experiment, is it
>>>>appropriate to remove some of the group tests from the whole
>>>>experiment? If not, why can we prefilter the genes?
>>>>2) As I stated in the previous email, assume that the raw p-values and
>>>>the genes with the lowest p-values are the same before (35k genes) and
>>>>after (5k genes) gene filtering. Which is more sound: gene x selected
>>>>from 35k, or the same gene selected from 5k? In other words, which is
>>>>more sound: the best student selected from 1000 students, or the best
>>>>student selected from 100?
>>>>
>>>>So this is a question about the whole point of the gene prefiltering
>>>>approach.
>>>>
>>>>Best wishes,
>>>>
>>>>Wayne
>>>>--
>>>>> Hi Swapna
>>>>>
>>>>> On Jun/2/11 7:58 PM, Swapna Menon wrote:
>>>>>> Hi Stephanie,
>>>>>> There is another recent paper that you might consider which also
>>>>>> cautions about filtering:
>>>>>> Van Iterson, M., Boer, J. M., & Menezes, R. X. (2010). Filtering, FDR
>>>>>> and power. BMC Bioinformatics, 11(1), 450.
>>>>>> They also recommend their own statistical test to see if one's filter
>>>>>> biases FDR.
>>>>>> Currently I am trying the variance filter and the feature filter from
>>>>>> the genefilter package: try ?nsFilter for help on these functions.
>>>>>> However, I don't use filtering routinely, since choosing the right
>>>>>> filter and parameters and testing the effects of any bias are things
>>>>>> I have not worked out, although I have read Bourgon et al., van
>>>>>> Iterson et al. and others that discuss this issue.
>>>>>> About your limma results: while conventional filtering may be
>>>>>> expected to increase the number of significant genes, as the papers
>>>>>> suggest, the likelihood of false positives also increases.
>>>>>
>>>>> No. Properly applied filtering does not affect the false positive
>>>>> rates (FWER or FDR). That's the whole point of it. [1]
>>>>>
>>>>> If one is willing to put up with a higher rate or probability of false
>>>>> discoveries, then don't do filtering - just increase the p-value
>>>>> cutoff.
>>>>>
>>>>> [1] Bourgon et al., PNAS 2010.
>>>>>
>>>>>> In your current results,
>>>>>> do you have high fold changes above 2 (log2>1)?  You may want to
>>>>>> explore the biological relevance of those genes with high FC and
>>>>>> significant unadjusted p value.
>>>>>> Best,
>>>>>> Swapna
>>>>>
>>>>> Best wishes
>>>>> Wolfgang Huber
>>>>> EMBL
>>>>> http://www.embl.de/research/units/genome_biology/huber
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>


