[BioC] Some Genefilter questions
Robert Gentleman
rgentlem at fhcrc.org
Thu Nov 30 19:12:00 CET 2006
Hi,
Lourdusamy A Anbarasu wrote:
> Dear Dr. Robert,
>
> You have mentioned that filtering on variability is preferred over
> filtering on raw intensity values. I have also read your previous post on
> this issue. For filters based on CV, are there any recommended cut-off values?
Not really. A widely held, but AFAIK undocumented, belief is that in
any given tissue/cell about 40% of the genome is expressed at any time.
So, I usually choose the median as the cutoff - that is somewhat
conservative with respect to the statistic cited above - but this is a
personal preference. I have not seen any research on this (and I think it
would be hard to do).
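
As a rough illustration of a median cutoff, here is a minimal sketch in base R. It assumes a plain numeric matrix of log2 expression values (simulated here; with an ExpressionSet one would start from exprs() instead), and it uses the IQR as the variability measure rather than the CV, which is a substitution on my part. The names x, iqrs, selected and x.sub are made up for the example.

```r
# Toy log2 expression matrix: 1000 probesets (rows) x 6 arrays (columns).
# Simulated data, for illustration only.
set.seed(1)
x <- matrix(rnorm(1000 * 6, mean = 8, sd = 1), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), paste0("chip", 1:6)))

# Per-probeset variability: interquartile range across the arrays.
iqrs <- apply(x, 1, IQR)

# Keep probesets whose IQR exceeds the median IQR (about half of them).
selected <- iqrs > median(iqrs)
x.sub <- x[selected, , drop = FALSE]

sum(selected)
```

The cutoff is computed from the data, so it adapts to the overall spread of the experiment instead of relying on a fixed number.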
best wishes
Robert
>
> Thanks in advance.
>
> Best regards,
> Anbarasu
>
> On 11/30/06, Robert Gentleman <rgentlem at fhcrc.org> wrote:
>
> Hi,
>
> Amy Mikhail wrote:
> > Dear Bioconductors,
> >
> > I am analysing 6 PlasmodiumAnopheles genechips, which have only Anopheles
> > mosquito samples hybridised to them (i.e. they are not infected
> > mosquitoes). The 6 chips include 3 replicates, each consisting of two
> > time points. The design matrix is as follows:
> >
> >> design
> >      M15d M43d
> > [1,]    1    0
> > [2,]    0    1
> > [3,]    1    0
> > [4,]    0    1
> > [5,]    1    0
> > [6,]    0    1
> >
> >
> > I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
> > Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
> > DE genes, respectively... much less than I was expecting.
> >
> > As this affy chip contains probesets for both mosquito and malaria
> > parasite genes, I am wondering:
> >
> > (a) if it is better to remove all the parasite probesets before my analysis;
>
> Yes, if you don't intend to use them, and they are not relevant to
> your analysis. There is no point in doing p-value corrections for tests
> you know are not interesting/relevant a priori.
>
> >
> > (b) if so at what stage I should do this (before or after normalisation
> > and background correction, or does it matter?)
>
> After both and prior to analysis - otherwise you are likely to need to
> do some serious tweaking of the normalization code.
>
> >
> > (c) how would I filter out these probesets using genefilter (all the
> > parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
> > to filter out the probesets, and if so how?)
>
> You don't need genefilter at all; this is a subsetting problem.
> If you had an ExpressionSet you would do something like:
>
> parasites = grepl("^Pf", featureNames(myExpressionSet))
>
> mySubset = myExpressionSet[!parasites,]
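
A minimal sketch of prefix-based subsetting in base R, assuming a plain matrix whose row names play the role of featureNames(). The probeset IDs below are invented for illustration; only the Pf and AFFX prefixes come from this thread.

```r
# Hypothetical probeset IDs standing in for featureNames(myExpressionSet).
ids <- c("Ag.2L.1_at", "Pf.1.1_at", "AFFX-BioB-5_at", "Ag.3R.7_at", "Pf.2.9_s_at")
x <- matrix(seq_len(length(ids) * 2), nrow = length(ids),
            dimnames = list(ids, c("chip1", "chip2")))

# grepl() returns one TRUE/FALSE per ID, so it can be negated with `!`.
# (grep() returns integer positions, which would need `-` instead, and
# x[-integer(0), ] selects no rows at all when nothing matches.)
drop <- grepl("^Pf", rownames(x)) | grepl("^AFFX", rownames(x))

# Keep everything that is neither a parasite nor a control probeset.
x.sub <- x[!drop, , drop = FALSE]
rownames(x.sub)
```

A logical index also behaves sensibly when no ID matches the pattern: nothing is dropped.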
>
> >
> > Secondly, I did not add any of the polyA controls to my samples. I would
> > like to know:
> >
> > (d) Do any of the bg correct / normalisation methods I tried utilise
> > affymetrix control probesets, and if so, how?
>
> I doubt it.
>
> >
> > (e) Should I also filter out the control sets - again, if so at what stage
> > in the analysis and what would be appropriate code to use?
> >
>
> Same place as you filter the parasite genes, and pretty much in the
> same way. They are likely to start with AFFX.
>
> > I did try the code for non-specific filtering (on my RMA dataset) from pg.
> > 232 of the bioconductor monograph, but the reduction in the number of
> > probesets was quite drastic;
> >
> >> f1 <- pOverA(0.25, log2(100))
> >> f2 <- function(x) (IQR(x) > 0.5)
>
> That is a typo in the text - you probably want to filter out those
> with IQR below the median, rather than below some fixed value.
>
> >> ff <- filterfun(f1, f2)
> >> selected <- genefilter(Baseage.transformed , ff)
> >> sum(selected)
> > [1] 404 ###(The original no. of probesets is 22,726)###
> >> Baseage.sub <- Baseage.transformed[selected, ]
> >
> > Also, I understood from the monograph that "100" was to filter out
> > fluorescence intensities less than this, but I am not clear if this is
> > from raw intensities or log2 values?
>
> Raw. A value of 100 on the log2 scale is larger than can be represented
> in the image file formats used. And don't do that - it is not a good
> idea - filter on variability instead.
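
The scale argument can be checked with a few lines of arithmetic. This is only a sketch: the 16-bit ceiling is an assumption about the scanner image format, used here purely to give a sense of scale, and the variable names are made up.

```r
# pOverA(0.25, log2(100)) is applied to log2-scale data, so its cutoff
# log2(100) corresponds to a raw intensity of 100.
cutoff <- log2(100)      # about 6.64 on the log2 scale
2^cutoff                 # back-transforms to 100

# Assumed 16-bit intensity ceiling, for scale only.
max.raw <- 2^16 - 1
log2(max.raw)            # just under 16, so a log2 value of 100 cannot occur
```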
>
>
> >
> > All the parasite probesets have raw intensities <35 .... so could I apply
> > this as a simple filter, and would this have to be on raw (rather than
> > normalised) data?
>
>
> Best wishes
> Robert
>
> >
> > Apologies for the long posting...
> >
> > Looking forward to any replies,
> > Regards,
> > Amy
> >
> >> sessionInfo()
> > R version 2.4.0 (2006-10-03)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > States.1252;LC_MONETARY=English_United
> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] "tcltk" "splines" "tools" "methods" "stats"
> > "graphics" "grDevices" "utils" "datasets" "base"
> >
> > other attached packages:
> > plasmodiumanophelescdf    tkWidgets       DynDoc  widgetTools  agahomology
> >               "1.14.0"    " 1.12.0"     "1.12.0"     "1.10.0"     "1.14.2"
> >                affyPLM        gcrma  matchprobes     affydata      annaffy
> >               "1.10.0"      "2.6.0"      "1.6.0"     "1.10.0"      "1.6.0"
> >                   KEGG           GO        limma  geneplotter     annotate
> >               "1.14.0"     "1.14.0"      "2.9.1"     "1.12.0"     "1.12.0"
> >                   affy       affyio   genefilter     survival      Biobase
> >               "1.12.0"      "1.2.0"    "1.12.0 "       "2.29"     "1.12.0"
> >
> >
> > -------------------------------------------
> > Amy Mikhail
> > Research student
> > University of Aberdeen
> > Zoology Building
> > Tillydrone Avenue
> > Aberdeen AB24 2TZ
> > Scotland
> > Email: a.mikhail at abdn.ac.uk
> > Phone: 00-44-1224-272880 (lab)
> > 00-44-1224-273256 (office)
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
> --
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
>
>
> --
> Lourdusamy A Anbarasu
> Dipartimento Medicina Sperimentale e Sanita Pubblica
> Via Scalzino 3
> 62032 Camerino (MC)
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org