[BioC] advice on absent present filtering needed
Kimpel, Mark William
mkimpel at iupui.edu
Thu Oct 26 17:17:27 CEST 2006
Jenny and Naomi,
Thank you for your replies and code (Jenny). I did not mean to imply
that I would throw out probesets that are present in only one
phenotype, but rather that I would keep any probeset that reaches the
same absolute number of present calls counted across all samples.
Say, for example, we have 10 samples with 5 in each phenotype. We decide
that we would like to pass through our filter any probeset that is
present in at least 80% of samples (4) within a phenotype. I would
argue that to minimize bias we should instead construct the filter so
that it passes any probeset present in at least 4 of the 10 samples,
regardless of phenotype. This is actually a more generous filter, but it
would seem to better preserve the underlying statistical distribution of
the data, on which the BH FDR method depends.
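A minimal R sketch of the two filters being compared, assuming `calls` is a probeset-by-sample matrix of MAS5 calls ("P", "M", "A") such as `exprs(mas5calls(abatch))` and `pheno` is a factor giving each sample's phenotype. The helper names `keep_within` and `keep_overall` and the toy matrix are hypothetical, for illustration only; marginal ("M") calls are counted as absent, which is an assumption.

```r
# Hypothetical helper: keep a probeset if, in ANY phenotype, the fraction of
# present calls is at least `frac` (0.8 with 5 samples means 4 of 5).
# "M" (marginal) calls are treated as absent.
keep_within <- function(calls, pheno, frac) {
  pheno <- factor(pheno)
  ok <- matrix(FALSE, nrow(calls), nlevels(pheno),
               dimnames = list(rownames(calls), levels(pheno)))
  for (g in levels(pheno)) {
    sub <- calls[, pheno == g, drop = FALSE]
    ok[, g] <- rowSums(sub == "P") >= frac * ncol(sub)
  }
  rowSums(ok) > 0  # TRUE if the criterion holds in at least one phenotype
}

# Hypothetical helper: keep a probeset if it is present in at least
# `min_present` samples, ignoring phenotype.
keep_overall <- function(calls, min_present) {
  rowSums(calls == "P") >= min_present
}

# Toy example: 10 samples, 5 per phenotype (made-up calls, not real data)
pheno <- factor(rep(c("ctrl", "trt"), each = 5))
calls <- rbind(
  always  = rep("P", 10),
  one_grp = c(rep("P", 4), rep("A", 6)),                          # 4/5 in ctrl only
  split   = c("P", "P", "A", "A", "A", "P", "P", "A", "A", "A"),  # 2 + 2 = 4 of 10
  never   = rep("A", 10)
)

within  <- keep_within(calls, pheno, 0.8)
overall <- keep_overall(calls, 4)
print(cbind(within, overall))
```

Because a probeset with 4 present calls inside one phenotype necessarily has at least 4 across all 10 samples, the overall filter keeps everything the within-phenotype filter keeps (including probesets present in only one phenotype, like `one_grp`) plus probesets such as `split`. The colleague's rule in the original question would correspond roughly to `keep_within()` with three phenotypes and a strict `> 0.5` rather than `>=`.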
I do recognize that by passing a few more genes through the filter we
will raise the calculated FDR of all probesets tested and may thereby
end up with fewer significant probesets. But would we have more
confidence in those?
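The trade-off can be checked with base R's `p.adjust(method = "BH")`. The p-values below are made up purely for illustration and do not come from any data set. When the extra probesets let through by a more generous filter have larger p-values than those already tested, each BH-adjusted value for the original probesets goes up, because it is scaled by the total number of tests.

```r
# Illustrative p-values only (not real data).
p_strict <- c(0.001, 0.01, 0.02, 0.04)  # probesets passing the stricter filter
p_extra  <- c(0.5, 0.8)                 # extra probesets passed by a more generous filter

adj_strict   <- p.adjust(p_strict, method = "BH")
adj_generous <- p.adjust(c(p_strict, p_extra), method = "BH")

# Adjusted p-values of the same four probesets under each filter
print(rbind(strict = adj_strict, generous = adj_generous[seq_along(p_strict)]))

# Number of probesets significant at an FDR of 0.05 under each filter
print(c(strict = sum(adj_strict <= 0.05), generous = sum(adj_generous <= 0.05)))
```

If the extra probesets instead have small p-values, ranks shift and the adjusted values of the original probesets can go down, so the direction of the effect depends on the data.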
I also recognize that these differences will be subtle and may have
little practical effect, but I do want to be rigorous in my approach.
Your responses?
Thanks,
Mark
Mark W. Kimpel MD
(317) 490-5129 Work, & Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny Drnevich
Sent: Thursday, October 26, 2006 10:42 AM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice on absent present filtering needed
I concur: you do not want to throw out genes that are expressed in only
one phenotype. If you plot a histogram of the number of present calls
for each gene, you will see that the vast majority of genes are either
present in all samples or absent in all samples. Only the small number
of genes in between are affected by your choice of filter. To be
conservative, I keep a gene even if it is present in only 1 sample, so I
don't consider phenotype at all. The difference will really affect only
a few hundred genes, which won't matter much in terms of FDR correction,
so I say be conservative so that you don't throw out a gene that is
expressed in only one phenotype. To check the histogram:
library(affy)  # provides mas5calls(); abatch is an AffyBatch object
calls.eset <- mas5calls(abatch)
hist(apply(exprs(calls.eset), 1, function(x) sum(x == "P")))
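Jenny's "keep a gene if it is present in at least one sample" rule can be sketched in base R. The small `calls` matrix below is a made-up stand-in for `exprs(calls.eset)`; `rowSums(calls == "P")` computes the same per-gene counts as the `apply()` call above, and `table()` shows the values the histogram would display. Treating "M" calls as absent follows the `x == "P"` test above.

```r
# Toy stand-in for exprs(calls.eset): probesets in rows, samples in columns
calls <- rbind(
  gene1 = c("P", "P", "P", "P", "P", "P"),
  gene2 = c("A", "A", "A", "A", "A", "A"),
  gene3 = c("P", "A", "A", "A", "A", "A"),
  gene4 = c("A", "M", "A", "A", "A", "A")
)

# Number of present calls per probeset (the quantity passed to hist() above)
n_present <- rowSums(calls == "P")
print(table(n_present))

# Keep any probeset called present in at least one sample; "M" counts as absent
keep <- n_present >= 1
filtered <- calls[keep, , drop = FALSE]
print(rownames(filtered))
```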
Cheers,
Jenny
At 08:27 AM 10/26/2006, you wrote:
>Your colleague is right. Surely it is important to know if some
>genes are expressed only in certain phenotypes. Your method loses this
>information.
>
>--Naomi
>
>
>At 10:53 PM 10/25/2006, Kimpel, Mark William wrote:
> >I have a question about how to properly apply the MAS5 absent
> >present filtering technique. Within my group, I am advocating
> >setting a cutoff ratio of absent present across phenotypes (i.e. all
> >samples), whereas a colleague is advocating applying the filter
> >within phenotype and passing through the filter any probeset with
> >a present-call fraction of >0.5 within any of the phenotypes (we have 3).
> >
> >The argument my colleague makes is that some probesets may only be
> >expressed by one phenotype and we want to keep these in, but be
> >stringent within phenotype. This makes some biological sense, but I am
> >concerned that filtering within phenotype will introduce bias at low
> >expression levels since, at least in some cases, it would seem to act
> >like a fold-change filter at expression levels near the limit of
> >reliable detection.
> >
> >Advice?
> >
> >Mark
> >
> >Mark W. Kimpel MD
> >
> >
> >Official Business Address:
> >
> >Department of Psychiatry
> >Indiana University School of Medicine
> >PR M116
> >Institute of Psychiatric Research
> >791 Union Drive
> >Indianapolis, IN 46202
> >
> >Preferred Mailing Address:
> >
> >15032 Hunter Court
> >Westfield, IN 46074
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>Naomi S. Altman 814-865-3791 (voice)
>Associate Professor
>Dept. of Statistics 814-863-7114 (fax)
>Penn State University 814-865-1348 (Statistics)
>University Park, PA 16802-2111
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu