[BioC] PreFiltering probe in microarray analysis
Moshe Olshansky
olshansky at wehi.EDU.AU
Sun Jun 19 14:55:45 CEST 2011
Hi Kevin,
Thank you for your explanation.
Moshe.
P.S. How about RT qPCR where one has a plate with 384 wells - a step
towards high throughput PCR - is it as accurate as traditional PCR?
> Not necessarily. PCR has wider dynamic range and greater precision than
> microarrays. The improved precision _may_ mean that we have
> substantially more evidence for differential expression based on the PCR
> results (even on the same samples) than we did just from the analysis of
> the microarray data.
>
> I do, however, agree that additional independent samples are the best
> solution (and am amazed to have found myself writing the first paragraph
> of this response...)
>
> Kevin
>
> On 6/16/2011 10:18 PM, Moshe Olshansky wrote:
>> Hi Matt,
>>
>> Let me note that PCR (or even protein analysis) performed on the SAME
>> samples does not solve the FDR problem. It will only confirm that the
>> microarrays reported correct expression levels (or fold changes). So now
>> we are sure that in 3 samples under condition A the level of some gene
>> is indeed higher than in 3 samples under condition B, but we still do
>> not know whether this is a true phenomenon distinguishing conditions A
>> and B or whether it just happened by chance, since we have thousands (or
>> tens of thousands) of genes.
>> You will need additional (independent) samples to confirm that this is a
>> true phenomenon.
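To put a number on the "just happened by chance" point, here is a minimal sketch in Python (not code from the thread). It assumes 10,000 genes with no true difference between conditions, 3 samples per condition, and independent standard-normal noise; all of these are illustrative assumptions, and count_perfectly_separated is a hypothetical helper name. For such a null gene, each of the 20 ways of splitting 6 values into two groups of 3 is equally likely, so the chance that all three A values exceed all three B values is 1/20.

```python
import random


def count_perfectly_separated(n_genes, n_per_group=3, seed=0):
    """Count null genes where every A sample exceeds every B sample."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_genes):
        # Both groups are drawn from the same distribution: no real effect.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if min(a) > max(b):
            count += 1
    return count


n_genes = 10000
separated = count_perfectly_separated(n_genes)
print(separated, "of", n_genes, "null genes have all A samples above all B samples")
```

Under these assumptions roughly one null gene in twenty shows complete separation of A above B. Re-measuring the same samples with a more precise assay would confirm those levels, but it could not tell a chance pattern from a real one.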
>>
>> Moshe.
>>
>>> Dear Matt,
>>>
>>> I read your email again. Since you have lots of thoughts about this
>>> issue, I guess you have probably also thought a lot about the
>>> solutions. I hope my continued follow-up is not boring. Please point
>>> out anything I have got wrong.
>>>
>>> There is no question (or at least there are fewer questions) about an
>>> experimental result such as RT-PCR confirmation of a differentially
>>> expressed gene.
>>>
>>> However, when we test many genes in a microarray or RNA-seq
>>> experiment, we do need something like FDR to control how many genes we
>>> are going to report. Even though this FDR is not the "absolutely true
>>> false discovery rate", it can work as a relative control. The point is
>>> that when different people use the same FDR method, the reported FDRs
>>> should be comparable.
>>>
>>> Usually people do not prefilter genes first; they do it only when
>>> they find that the FDR is too high. If you report a gene list with a
>>> very high FDR, the reviewers will reject the paper. Therefore people
>>> try to get an amazingly good FDR by gene prefiltering. The same gene
>>> list that had a high FDR before prefiltering now has a lower FDR, and
>>> the reviewers are happy with the good FDR.
>>>
>>> It seems that, in some cases, "with this FDR method, we have to do gene
>>> prefiltering in order to get a good FDR". We can see that there are
>>> two problems here. One is the FDR method itself, and the other is the
>>> gene prefiltering approach.
>>>
>>> Having thought a lot about these problems, I came up with a solution
>>> called EDR, in which I have addressed them:
>>> http://www.ncbi.nlm.nih.gov/pubmed/20846437
>>>
>>> Have you read this paper? Do you think it could be one of the
>>> standardized solutions? Any comments would be appreciated.
>>>
>>> Best wishes,
>>>
>>> Wayne
>>>
>>> --
>>> -----------------------------------------------------------------------
>>> Wayne Xu, Ph.D
>>> Computational Genomics Specialist
>>>
>>> Supercomputing Institute for Advanced Computational Research
>>> 550 Walter Library
>>> 117 Pleasant Street SE
>>> University of Minnesota
>>> Minneapolis, Minnesota 55455
>>> email: wxu at msi.umn.edu help email: help at msi.umn.edu
>>> phone: 612-624-1447 help phone: 612-626-0802
>>> fax: 612-624-8861
>>> -----------------------------------------------------------------------
>>>
>>>
>>>
>>> --On 6/13/2011 9:01 AM, Arno, Matthew wrote:
>>>> Wayne - I *definitely* mean cheating! It depends on whether the FDR is
>>>> reported I suppose. Let's say you do a microarray screen and the 'most
>>>> changed' gene that comes up (either by largest fold change or smallest
>>>> t-test/ANOVA p-value) is 'interesting' biologically speaking. You go on
>>>> to validate the change (on the same samples and further test sets)
>>>> using qPCR and/or western blots etc., if you go as far as protein
>>>> analysis. Therefore you can analyse the importance of that single gene
>>>> in a real biological context. No one could argue that the gene is not
>>>> changed in the study and other samples, because of the low-throughput
>>>> validation, and it makes a nice biological story for a paper. This is
>>>> regardless of the arrays used, the test used, the FDR or actual p-value
>>>> even. You could have picked the gene by sticking a pin in a list; you
>>>> just used an array to make that pin stick more likely to give a real
>>>> change.
>>>>
>>>> However, the statistical factors do definitely matter when you are
>>>> trying to report an overall analysis with lots of
>>>> genes/patterns/pathways/functions etc, with a wide range of
>>>> conclusions, perhaps in the absence of being able to perform a
>>>> high-throughput validation of every gene (or a proportion of) in the
>>>> final 'significant' list. I can see it from both sides...however,
>>>> sometimes it's easy to lose sight that an array hybridisation is just a
>>>> hypothesis generator, not a hypothesis solver. That said any attempt to
>>>> standardise this sort of reporting must have parity and (importantly)
>>>> transparency with all these factors to have any success.
>>>>
>>>> I don't actually think there is a single valid answer to this issue, as
>>>> there are so many interpretations/angles; it's just interesting to see
>>>> how the debate changes over time. And essential to keep having it too!
>>>>
>>>> Thanks for reading - I have lots of thoughts about this!
>>>> Matt
>>>> ----------------------
>>>> Matthew Arno, Ph.D.
>>>> Genomics Centre Manager
>>>> King's College London
>>>>
>>>> The contents of this email are strictly confidential. It may not be
>>>> transmitted in part or in whole to any other individual or groups of
>>>> individuals.
>>>> This email is intended solely for the use of the individual(s) to whom
>>>> they are addressed and should not be released to any third party
>>>> without
>>>> the consent of the sender.
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: wxu at msi.umn.edu [mailto:wxu at msi.umn.edu]
>>>>> Sent: 13 June 2011 14:14
>>>>> To: Arno, Matthew
>>>>> Cc: bioconductor at r-project.org
>>>>> Subject: Re: [BioC] PreFiltering probe in microarray analysis
>>>>>
>>>>> Thanks, Matt, for joining this discussion,
>>>>>
>>>>> That is true from a biologist's point of view. You always get the
>>>>> same top 10 genes whether you filter or not. But this shifts to
>>>>> another question, the 'amazingly good FDR'. For the same top ten
>>>>> genes, people can report different FDRs by filtering or not
>>>>> filtering, or by filtering out a different number of genes. The FDRs
>>>>> in these different reports are not comparable at all. Does this FDR
>>>>> make sense? People can try to make it amazingly good. Does that sound
>>>>> a little like 'cheating'? Sorry, I do not mean real cheating here.
>>>>>
>>>>> Do you have any thoughts about this?
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Wayne
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>> Speaking as a pure 'biologist', I think it's OK to pre-filter genes as
>>>>>> long as you know the pitfalls, in terms of the potential bias and
>>>>>> effect on FDRs. I am personally aware of people pre-filtering not only
>>>>>> to enhance the FDR, but to use the results of a t-test as a starting
>>>>>> point for a second sequential t-test because the FDRs from this test
>>>>>> are 'amazingly good'.
>>>>>>
>>>>>> However statistically sacrilegious this is, the top 10 genes are
>>>>>> always going to be the same top 10 genes, so if you are just looking
>>>>>> for the top 10 genes, this is essentially OK.
>>>>>>
>>>>>> How does that hang with you guys?
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> ----------------------
>>>>>> Matthew Arno, Ph.D.
>>>>>> Genomics Centre Manager
>>>>>> King's College London
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioconductor-bounces at r-project.org
>>>>>>> [mailto:bioconductor-bounces at r-project.org] On Behalf Of
>>>>>>> wxu at msi.umn.edu
>>>>>>> Sent: 12 June 2011 16:41
>>>>>>> To: Wolfgang Huber
>>>>>>> Cc: bioconductor at r-project.org
>>>>>>> Subject: Re: [BioC] PreFiltering probe in microarray analysis
>>>>>>>
>>>>>>> Hi, Dear Wolfgang,
>>>>>>>
>>>>>>> I think it would be nice to bring up a discussion here about the gene
>>>>>>> prefiltering issue. Please tell me if this suggestion is
>>>>>>> inappropriate.
>>>>>>>
>>>>>>> There are two questions about gene filtering to which I could not
>>>>>>> find answers:
>>>>>>> 1). In traditional multiple testing, where the p-values of many test
>>>>>>> groups are corrected (for example, in a new drug effect experiment),
>>>>>>> is it appropriate to remove some group tests from the whole
>>>>>>> experiment? If not, why can we prefilter the genes?
>>>>>>> 2). As I stated in the previous email, assume that the raw p-values
>>>>>>> and the top lowest-p-value genes are the same before (35k genes) and
>>>>>>> after gene filtering (5k genes). The gene x you selected from 35k
>>>>>>> versus the one selected from 5k: which is more sound? In other words,
>>>>>>> the best student selected from 1000 students versus the best student
>>>>>>> selected from 100: which is more sound?
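The arithmetic behind question 2 can be sketched as follows, in Python rather than R. This is a hedged illustration, not code from the thread: it assumes the Benjamini-Hochberg adjustment (the thread does not say which FDR method is meant), the ten top raw p-values are invented, and the remaining "background" p-values are simply spread evenly between 0.001 and 1. The names bh_adjust and with_background are hypothetical helpers. Only the 35k and 5k gene counts come from the message above.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def with_background(top, m):
    """Pad the top p-values with evenly spaced background values up to m tests."""
    n = m - len(top)
    return top + [0.001 + 0.999 * j / n for j in range(n)]


# Invented raw p-values for the same ten top genes in both analyses.
top10 = [k * 1e-5 for k in range(1, 11)]

adj_35k = bh_adjust(with_background(top10, 35000))  # no prefiltering
adj_5k = bh_adjust(with_background(top10, 5000))    # after prefiltering

print("best gene, 35k tests:", round(adj_35k[0], 4))
print("best gene,  5k tests:", round(adj_5k[0], 4))
```

With these made-up inputs the best gene's adjusted p-value is about 0.35 when 35,000 tests are corrected for and about 0.05 when only 5,000 are, although its raw p-value and rank are identical. The sketch shows only the dependence on the number of tests; it says nothing about whether a particular filter is statistically valid.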
>>>>>>>
>>>>>>> So this is a question about the whole point of the gene prefiltering
>>>>>>> approach.
>>>>>>> Best wishes,
>>>>>>>
>>>>>>> Wayne
>>>>>>> --
>>>>>>>> Hi Swapna
>>>>>>>>
>>>>>>>> On Jun/2/11 7:58 PM, Swapna Menon wrote:
>>>>>>>>> Hi Stephanie,
>>>>>>>>> There is another recent paper that you might consider which also
>>>>>>>>> cautions about filtering:
>>>>>>>>> Van Iterson, M., Boer, J. M., & Menezes, R. X. (2010). Filtering,
>>>>>>>>> FDR and power. BMC Bioinformatics, 11(1), 450.
>>>>>>>>> They also recommend their own statistical test to see if one's
>>>>>>>>> filter biases FDR.
>>>>>>>>> Currently I am trying the variance filter and feature filter from
>>>>>>>>> the genefilter package: try ?nsFilter for help on these functions.
>>>>>>>>> However, I don't use filtering routinely, since choosing the right
>>>>>>>>> filter and parameters and testing the effects of any bias are
>>>>>>>>> things I have not worked out, in addition to having read Bourgon et
>>>>>>>>> al. and Iterson et al. and others that discuss this issue.
>>>>>>>>> About your limma results, while conventional filtering may be
>>>>>>>>> expected to increase the number of significant genes, as the papers
>>>>>>>>> suggest, the likelihood of false positives also increases.
>>>>>>>> No. Properly applied filtering does not affect the false positive
>>>>>>>> rates (FWER or FDR). That's the whole point of it. [1]
>>>>>>>>
>>>>>>>> If one is willing to put up with higher rate or probability of
>>>>>>>> false discoveries, then don't do filtering - just increase the
>>>>>>>> p-value cutoff.
>>>>>>>>
>>>>>>>> [1] Bourgon et al., PNAS 2010.
>>>>>>>>
>>>>>>>>> In your current results, do you have high fold changes above 2
>>>>>>>>> (log2 fold change > 1)? You may want to explore the biological
>>>>>>>>> relevance of those genes with high FC and a significant unadjusted
>>>>>>>>> p-value.
>>>>>>>>> Best,
>>>>>>>>> Swapna
>>>>>>>> Best wishes
>>>>>>>> Wolfgang Huber
>>>>>>>> EMBL
>>>>>>>> http://www.embl.de/research/units/genome_biology/huber
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>
>>>>>>
>>>
>>
>