[BioC] Some Genefilter questions
Robert Gentleman
rgentlem at fhcrc.org
Thu Nov 30 00:15:22 CET 2006
Hi,
Amy Mikhail wrote:
> Dear Bioconductors,
>
> I am analysing 6 Plasmodium/Anopheles genechips, which have only Anopheles
> mosquito samples hybridised to them (i.e. they are not infected
> mosquitoes). The 6 chips include 3 replicates, each consisting of two
> time points. The design matrix is as follows:
>
>> design
>      M15d M43d
> [1,]    1    0
> [2,]    0    1
> [3,]    1    0
> [4,]    0    1
> [5,]    1    0
> [6,]    0    1
>
>
> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
> DE genes, respectively... far fewer than I was expecting.
>
> As this affy chip contains probesets for both mosquito and malaria
> parasite genes, I am wondering:
>
> (a) if it is better to remove all the parasite probesets before my analysis;
Yes, if you don't intend to use them, and they are not relevant to
your analysis. There is no point in doing p-value corrections for tests
you know are not interesting/relevant a priori.
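To illustrate the point about p-value corrections, here is a minimal sketch in base R. The p-values are made up for illustration (they are not from these chips): six hypothetical mosquito probesets plus fourteen parasite probesets with nothing to detect. The BH adjustment depends on the number of tests, so dropping tests that are uninteresting a priori can change which probesets pass a 0.05 cut-off.

```r
# Made-up p-values, for illustration only
p_mosquito <- c(0.001, 0.004, 0.012, 0.03, 0.2, 0.6)
p_parasite <- seq(0.3, 0.95, length.out = 14)  # 14 uninformative tests

# BH adjustment over all 20 tests, keeping the mosquito entries only
adj_all <- p.adjust(c(p_mosquito, p_parasite),
                    method = "BH")[seq_along(p_mosquito)]

# BH adjustment over the 6 mosquito tests alone
adj_sub <- p.adjust(p_mosquito, method = "BH")

# Number of mosquito probesets below 0.05 under each scheme
n_all <- sum(adj_all < 0.05)
n_sub <- sum(adj_sub < 0.05)
```

Comparing n_all with n_sub shows how many calls are lost to tests that were never of interest in this toy setting.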
>
> (b) if so at what stage I should do this (before or after normalisation
> and background correction, or does it matter?)
After both and prior to analysis - otherwise you are likely to need to
do some serious tweaking of the normalization code.
>
> (c) how would I filter out these probesets using genefilter (all the
> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
> to filter out the probesets, and if so how?)
You don't need genefilter at all; this is a subsetting problem.
If you had an ExpressionSet you would do something like:
parasites = grepl("^Pf", featureNames(myExpressionSet))
mySubset = myExpressionSet[!parasites,]
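A minimal sketch of the same subsetting on a plain matrix, in case it helps to try it without an ExpressionSet. The probeset IDs and the matrix are hypothetical stand-ins. Note that grepl() returns a logical vector, which can be negated with !, whereas grep() returns integer positions, which need a negative index instead.

```r
# Hypothetical probeset IDs; x stands in for exprs(myExpressionSet)
ids <- c("Pf.1.1_at", "Ag.2L.100_at", "Pf.3.7_at", "Ag.X.5_at")
x <- matrix(1:24, nrow = 4, dimnames = list(ids, paste0("chip", 1:6)))

# Logical vector, one entry per row: TRUE for parasite probesets
is_parasite <- grepl("^Pf", rownames(x))
x_mosquito <- x[!is_parasite, , drop = FALSE]

# grep() gives positions; drop them with a negative index,
# guarding against the case where nothing matches
idx <- grep("^Pf", rownames(x))
x_mosquito2 <- if (length(idx) > 0) x[-idx, , drop = FALSE] else x
```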
>
> Secondly, I did not add any of the polyA controls to my samples. I would
> like to know:
>
> (d) Do any of the bg correct / normalisation methods I tried utilise
> affymetrix control probesets, and if so, how?
I doubt it.
>
> (e) Should I also filter out the control sets - again, if so at what stage
> in the analysis and what would be an appropriate code to use?
>
At the same stage as you filter the parasite genes, and in pretty much
the same way. They are likely to start with AFFX.
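As a sketch of doing both removals in one step, assuming the control IDs really do start with AFFX on this chip (check your own featureNames first). The IDs below are hypothetical examples, and the matrix stands in for the expression values.

```r
# Hypothetical IDs: two controls, one parasite, two mosquito probesets
ids <- c("AFFX-BioB-5_at", "Pf.1.1_at", "Ag.2L.100_at",
         "AFFX-r2-Ec-bioC-3_at", "Ag.X.5_at")
x <- matrix(seq_len(5 * 6), nrow = 5,
            dimnames = list(ids, paste0("chip", 1:6)))

# One pattern for both prefixes; TRUE marks a row to remove
unwanted <- grepl("^(Pf|AFFX)", rownames(x))

# Check how many rows will go before actually dropping them
n_removed <- sum(unwanted)
x_kept <- x[!unwanted, , drop = FALSE]
```

Counting the matches before subsetting is a cheap check that the pattern catches what you expect and nothing else.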
> I did try the code for non-specific filtering (on my RMA dataset) from pg.
> 232 of the bioconductor monograph, but the reduction in the number of
> probesets was quite drastic;
>
>> f1 <- pOverA(0.25, log2(100))
>> f2 <- function(x) (IQR(x) > 0.5)
That is a typo in the text; you probably want to filter out those
with an IQR below the median IQR, rather than below some fixed value.
>> ff <- filterfun(f1, f2)
>> selected <- genefilter(Baseage.transformed, ff)
>> sum(selected)
> [1] 404 ###(The original no. of probesets is 22,726)###
>> Baseage.sub <- Baseage.transformed[selected, ]
>
> Also, I understood from the monograph that "100" was to filter out
> fluorescence intensities less than this, but I am not clear if this is
> from raw intensities or log2 values?
Raw: 100 on the log2 scale is larger than can be represented in the
image file formats used. But don't filter on intensity; it is not a
good idea. Filter on variability instead.
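A minimal sketch of a variability filter in base R, assuming a matrix of log2 expression values with probesets in rows. The data here are simulated purely so the example runs; the names are hypothetical. It keeps the probesets whose IQR across chips is above the median IQR, i.e. roughly the more variable half.

```r
set.seed(1)
# Simulated log2 expression values: 200 probesets on 6 chips
x <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
            dimnames = list(paste0("probe", 1:200), paste0("chip", 1:6)))

# Per-probeset IQR across the chips
iqrs <- apply(x, 1, IQR)

# Keep probesets more variable than the median probeset
keep <- iqrs > median(iqrs)
x_sub <- x[keep, , drop = FALSE]
```

The cut-off is relative to the data, so it does not depend on whether a fixed value such as 0.5 happens to suit the normalisation method used.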
>
> All the parasite probesets have raw intensities <35 .... so could I apply
> this as a simple filter, and would this have to be on raw (rather than
> normalised data)?
Best wishes
Robert
>
> Apologies for the long posting...
>
> Looking forward to any replies,
> Regards,
> Amy
>
>> sessionInfo()
> R version 2.4.0 (2006-10-03)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "tcltk" "splines" "tools" "methods" "stats"
> [6] "graphics" "grDevices" "utils" "datasets" "base"
>
> other attached packages:
> plasmodiumanophelescdf "1.14.0"
> tkWidgets              "1.12.0"
> DynDoc                 "1.12.0"
> widgetTools            "1.10.0"
> agahomology            "1.14.2"
> affyPLM                "1.10.0"
> gcrma                  "2.6.0"
> matchprobes            "1.6.0"
> affydata               "1.10.0"
> annaffy                "1.6.0"
> KEGG                   "1.14.0"
> GO                     "1.14.0"
> limma                  "2.9.1"
> geneplotter            "1.12.0"
> annotate               "1.12.0"
> affy                   "1.12.0"
> affyio                 "1.2.0"
> genefilter             "1.12.0"
> survival               "2.29"
> Biobase                "1.12.0"
>
>
> -------------------------------------------
> Amy Mikhail
> Research student
> University of Aberdeen
> Zoology Building
> Tillydrone Avenue
> Aberdeen AB24 2TZ
> Scotland
> Email: a.mikhail at abdn.ac.uk
> Phone: 00-44-1224-272880 (lab)
> 00-44-1224-273256 (office)
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org