[BioC] advice on absent present filtering needed

Kimpel, Mark William mkimpel at iupui.edu
Thu Oct 26 17:17:27 CEST 2006


Jenny and Naomi,

Thank you for your replies and code (Jenny). I did not mean to imply
that I would throw out probesets that are present in only one
phenotype, but rather that I would keep any probeset for which the same
absolute number of present calls was counted across all samples.

Say, for example, we have 10 samples with 5 in each phenotype. We decide
that we would like to pass through our filter any probeset that is
present in at least 80% of the samples (4 of 5) within a phenotype. I would
argue that to minimize bias we should instead construct the filter so
that it passes any probeset that is present in at least 4 of the 10 samples.

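This is actually a more generous filter, since any probeset present in 4
of the 5 samples of one phenotype is necessarily present in at least 4 of
all 10 samples, but it would seem to better preserve the underlying
statistical distribution of the data, on which the Benjamini-Hochberg (BH)
FDR method depends.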
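The two filters in Mark's example can be written down as a minimal R sketch. It assumes a probeset-by-sample matrix of MAS5 detection calls ("P", "M", "A"), such as the one returned by exprs(mas5calls(abatch)) in Jenny's code, plus a phenotype factor. The helper names within_pheno_filter and across_sample_filter and the toy calls are hypothetical, made up for illustration. With the same count threshold, every probeset kept by the within-phenotype filter is also kept by the across-sample filter, which is why the latter is more generous.

```r
## Hypothetical helpers (not from any package). `calls` is a probeset x
## sample character matrix of MAS5 detection calls ("P", "M", "A");
## `pheno` gives each sample's phenotype.

## Within-phenotype filter: keep a probeset if it has at least
## `min.present` "P" calls in ANY single phenotype.
within_pheno_filter <- function(calls, pheno, min.present) {
  pheno <- factor(pheno)
  keep <- setNames(rep(FALSE, nrow(calls)), rownames(calls))
  for (g in levels(pheno)) {
    n.present <- rowSums(calls[, pheno == g, drop = FALSE] == "P")
    keep <- keep | (n.present >= min.present)
  }
  keep
}

## Across-sample filter: keep a probeset if it has at least
## `min.present` "P" calls among ALL samples, ignoring phenotype.
across_sample_filter <- function(calls, min.present) {
  rowSums(calls == "P") >= min.present
}

## Toy data (made up): 10 samples, 5 per phenotype.
pheno <- factor(rep(c("ctrl", "trt"), each = 5))
calls <- rbind(
  ps1 = c("P", "P", "P", "P", "A",  "A", "A", "A", "A", "A"),
  ps2 = c("P", "P", "A", "A", "A",  "P", "P", "A", "A", "M"),
  ps3 = c("P", "A", "A", "A", "A",  "A", "A", "A", "P", "A")
)

## Threshold from the example: 80% of 5 samples = 4 present calls.
keep.within <- within_pheno_filter(calls, pheno, min.present = 4)
keep.across <- across_sample_filter(calls, min.present = 4)

keep.within                    # ps1 TRUE, ps2 FALSE, ps3 FALSE
keep.across                    # ps1 TRUE, ps2 TRUE,  ps3 FALSE
all(keep.across[keep.within])  # TRUE: within-phenotype passes are a subset
```

Probeset ps2 shows the difference: it has 2 present calls in each phenotype, so the within-phenotype rule drops it while the across-sample rule keeps it. Probesets like this, in between "present everywhere" and "absent everywhere", are the only ones the choice of filter affects.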

I do recognize that, by passing a few more genes through the filter, we
will end up raising the calculated FDR of all probesets tested and may
therefore end up with fewer significant probesets. But would we have
more confidence in those?
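The FDR trade-off described above can be illustrated with base R's p.adjust(method = "BH"). The p-values below are made up for illustration: four probesets pass a stricter filter, and a more generous filter admits two more with large, null-looking p-values. Under that assumption the BH-adjusted values of the original four all rise and one fewer probeset clears a 0.05 cutoff; if a newly admitted probeset instead had a small p-value, some adjusted values could fall rather than rise.

```r
## Made-up p-values for probesets that pass a stricter filter.
p.strict <- c(0.001, 0.01, 0.02, 0.04)
## Assumption: a more generous filter admits two extra, null-looking probesets.
p.generous <- c(p.strict, 0.5, 0.9)

## Benjamini-Hochberg adjustment under each filter (base R, stats package).
bh.strict <- p.adjust(p.strict, method = "BH")
## Keep only the adjusted values of the original four probesets.
bh.generous <- p.adjust(p.generous, method = "BH")[seq_along(p.strict)]

bh.strict     # values: 0.004, 0.02, 0.0267 (rounded), 0.04
bh.generous   # values: 0.006, 0.03, 0.04, 0.06

## Number of probesets called significant at FDR <= 0.05.
sum(bh.strict <= 0.05)    # 4
sum(bh.generous <= 0.05)  # 3
```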

I also recognize that these effects will be subtle and perhaps have
little practical effect, but I do want to be rigorous in my approach.

Your responses?

Thanks,

Mark

Mark W. Kimpel MD

(317) 490-5129 Work, & Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX


-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny Drnevich
Sent: Thursday, October 26, 2006 10:42 AM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice on absent present filtering needed

I concur: you do not want to throw out genes that are expressed in only
one phenotype. If you plot a histogram of the number of present calls for
each gene, you will see that the vast majority of genes are either present
in all samples or absent in all samples. Your filter options will affect
only the small number of genes in between. To be conservative, I keep a
gene even if it is present in only 1 sample, so I don't even consider
phenotype. The difference will really affect only a few hundred genes,
which won't matter much in terms of FDR correction, so I say be
conservative so that you don't throw out a gene that is expressed in only
one phenotype. To check the histogram:

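library(affy)                    # mas5calls() is in the affy package
calls.eset <- mas5calls(abatch)  # abatch: your AffyBatch object

# histogram of the number of "P" (present) calls per probeset
hist(apply(exprs(calls.eset), 1, function(x) sum(x == "P")))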

Cheers,
Jenny

At 08:27 AM 10/26/2006, you wrote:
>Your colleague is right. Surely it is important to know if some
>genes are expressed only in certain phenotypes. Your method loses this
>information.
>
>--Naomi
>
>
>At 10:53 PM 10/25/2006, Kimpel, Mark William wrote:
> >I have a question about how to properly apply the MAS5 absent/present
> >filtering technique. Within my group, I am advocating setting a
> >cutoff ratio of absent/present calls across phenotypes (i.e., across
> >all samples), whereas a colleague is advocating applying the filter
> >within phenotype and passing through the filter any probeset with
> >an A/P ratio of >0.5 within any of the phenotypes (we have 3).
> >
> >The argument my colleague makes is that some probesets may be
> >expressed by only one phenotype and we want to keep these in, while
> >being stringent within phenotype. This makes some biological sense,
> >but I am concerned that filtering within phenotype will introduce bias
> >at low expression levels, as it would seem, at least in some cases, to
> >act like a fold-change filter at expression levels near the limit of
> >reliable detection.
> >
> >Advice?
> >
> >Mark
> >
> >Mark W. Kimpel MD
> >
> >
> >Official Business Address:
> >
> >Department of Psychiatry
> >Indiana University School of Medicine
> >PR M116
> >Institute of Psychiatric Research
> >791 Union Drive
> >Indianapolis, IN 46202
> >
> >Preferred Mailing Address:
> >
> >15032 Hunter Court
> >Westfield, IN  46074
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>Naomi S. Altman                                814-865-3791 (voice)
>Associate Professor
>Dept. of Statistics                              814-863-7114 (fax)
>Penn State University                         814-865-1348 (Statistics)
>University Park, PA 16802-2111

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu
