[BioC] Some Genefilter questions
Amy Mikhail
a.mikhail at abdn.ac.uk
Wed Nov 29 21:32:00 CET 2006
Dear Bioconductors,
I am analysing 6 Plasmodium/Anopheles genechips, which have only Anopheles
mosquito samples hybridised to them (i.e. they are not infected
mosquitoes). The 6 chips include 3 replicates, each consisting of two
time points. The design matrix is as follows:
> design
     M15d M43d
[1,]    1    0
[2,]    0    1
[3,]    1    0
[4,]    0    1
[5,]    1    0
[6,]    0    1
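A minimal sketch of how a design matrix of this shape can be built in base R from a factor of time points. The object name `age` is an assumption for illustration and does not come from the session above.

```r
# Hypothetical factor: the six chips alternate between the two time points.
age <- factor(rep(c("M15d", "M43d"), times = 3))

# One indicator column per time point, no intercept.
design <- model.matrix(~ 0 + age)
colnames(design) <- levels(age)

print(design)
```

Each row has a single 1 marking the time point of that chip, which matches the layout printed above.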
I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
Counting genes with (BH) adjusted p-values < 0.05, these gave me 2, 12, 0 and 0
DE genes, respectively... far fewer than I was expecting.
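For readers unfamiliar with the counting step, here is a small sketch of a BH adjustment followed by a count at the 0.05 cut-off, using base R's `p.adjust`. The p-values are invented for illustration and are not from these chips.

```r
# Toy raw p-values (made up for illustration only).
p_raw <- c(0.001, 0.01, 0.04, 0.2, 0.5)

# Benjamini-Hochberg adjustment, then count genes below the cut-off.
p_bh <- p.adjust(p_raw, method = "BH")
n_de <- sum(p_bh < 0.05)

print(p_bh)
print(n_de)
```

With only three replicates per time point, few genes surviving the adjustment is not unusual, because the adjustment scales each p-value by the number of probesets tested.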
As this affy chip contains probesets for both mosquito and malaria
parasite genes, I am wondering:
(a) if it is better to remove all the parasite probesets before my analysis;
(b) if so at what stage I should do this (before or after normalisation
and background correction, or does it matter?)
(c) how would I filter out these probesets using genefilter (all the
parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
to filter out the probesets, and if so how?)
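A minimal sketch of prefix-based filtering for question (c). It uses plain row subsetting with `grepl` rather than genefilter, and the probeset IDs and matrix are invented for illustration. For an ExpressionSet, the IDs would presumably come from Biobase's `featureNames()` instead of `rownames()`.

```r
# Toy log2 expression matrix; the probeset IDs are hypothetical.
ids <- c("Pf.1.1_at", "Ag.2L.10_at", "Pf.7.3_s_at", "Ag.X.5_at")
expr <- matrix(seq_len(24), nrow = 4,
               dimnames = list(ids, paste0("chip", 1:6)))

# Anchor the pattern at the start of the ID and escape the dot,
# so that only IDs beginning with the literal "Pf." are matched.
is_parasite <- grepl("^Pf\\.", rownames(expr))

# Keep only the non-parasite probesets.
expr_mosquito <- expr[!is_parasite, , drop = FALSE]
print(rownames(expr_mosquito))
```

Because this is ordinary subsetting, it can be applied to the normalised expression values at any point before the linear model is fitted.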
Secondly, I did not add any of the polyA controls to my samples. I would
like to know:
(d) Do any of the background correction / normalisation methods I tried use
the Affymetrix control probesets, and if so, how?
(e) Should I also filter out the control sets - again, if so at what stage
in the analysis and what would be an appropriate code to use?
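As a sketch for question (e): Affymetrix control probesets conventionally have IDs starting with "AFFX", so they can be flagged in the same way as the parasite probesets. This assumes the controls on this chip follow that convention. The IDs below are invented for illustration.

```r
# Hypothetical probeset IDs mixing controls, parasite and mosquito probesets.
ids <- c("AFFX-BioB-5_at", "Ag.2L.10_at", "Pf.1.1_at",
         "AFFX-r2-Ec-bioC-3_at", "Ag.X.5_at")

# Flag controls (AFFX prefix) and parasite probesets (Pf. prefix) in one pass.
drop_me <- grepl("^(AFFX|Pf\\.)", ids)

# IDs that would remain for the differential expression analysis.
kept_ids <- ids[!drop_me]
print(kept_ids)
```

The logical vector `drop_me` can be used to subset the rows of an expression matrix in the same order as `ids`.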
I did try the code for non-specific filtering (on my RMA dataset) from p.
232 of the Bioconductor monograph, but the reduction in the number of
probesets was quite drastic:
> f1 <- pOverA(0.25, log2(100))
> f2 <- function(x) (IQR(x) > 0.5)
> ff <- filterfun(f1, f2)
> selected <- genefilter(Baseage.transformed, ff)
> sum(selected)
[1] 404 ###(The original no. of probesets is 22,726)###
> Baseage.sub <- Baseage.transformed[selected, ]
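To see which of the two filters causes most of the reduction, each can be counted separately. The sketch below uses base-R equivalents on simulated data. It assumes `pOverA(0.25, log2(100))` passes a probeset when at least 25% of its values exceed log2(100), which with 6 chips means at least 2 chips.

```r
set.seed(1)
# Simulated log2 matrix: 1000 probesets x 6 chips (not real data).
expr <- matrix(rnorm(6000, mean = 7, sd = 1), nrow = 1000)

# Assumed equivalent of pOverA(0.25, log2(100)):
# proportion of chips above log2(100) is at least 0.25.
pass_f1 <- apply(expr, 1, function(x) mean(x > log2(100)) >= 0.25)

# Same test as f2 above: interquartile range greater than 0.5.
pass_f2 <- apply(expr, 1, function(x) IQR(x) > 0.5)

# Count how many probesets pass each filter alone and both together.
counts <- c(f1 = sum(pass_f1), f2 = sum(pass_f2),
            both = sum(pass_f1 & pass_f2))
print(counts)
```

Comparing the three counts on the real data would show whether the intensity filter or the IQR filter is responsible for the drop to 404 probesets.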
Also, I understood from the monograph that "100" was to filter out
fluorescence intensities less than this, but I am not clear if this is
from raw intensities or log2 values?
All the parasite probesets have raw intensities <35 .... so could I apply
this as a simple filter, and would this have to be on raw (rather than
normalised data)?
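A short sketch of how the two cut-offs mentioned here translate between scales. Writing `log2(100)` in the filter means the comparison is made against log2-scale values, which suits RMA output since RMA expression values are on the log2 scale. Whether a raw cut-off of 35 carries over sensibly to normalised data is an open question that this sketch does not answer.

```r
# Raw-scale intensity cut-offs mentioned in the text.
raw_cutoffs <- c(parasite_max = 35, monograph = 100)

# The same cut-offs expressed on the log2 scale.
log2_cutoffs <- log2(raw_cutoffs)
print(round(log2_cutoffs, 2))

# Converting back recovers the raw values.
print(2^log2_cutoffs)
```

A raw intensity of 100 corresponds to about 6.64 on the log2 scale, and 35 to about 5.13.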
Apologies for the long posting...
Looking forward to any replies,
Regards,
Amy
> sessionInfo()
R version 2.4.0 (2006-10-03)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
 [1] "tcltk"     "splines"   "tools"     "methods"   "stats"
 [6] "graphics"  "grDevices" "utils"     "datasets"  "base"
other attached packages:
plasmodiumanophelescdf   tkWidgets      DynDoc  widgetTools  agahomology
              "1.14.0"    "1.12.0"    "1.12.0"     "1.10.0"     "1.14.2"
               affyPLM       gcrma matchprobes     affydata      annaffy
              "1.10.0"     "2.6.0"     "1.6.0"     "1.10.0"      "1.6.0"
                  KEGG          GO       limma  geneplotter     annotate
              "1.14.0"    "1.14.0"     "2.9.1"     "1.12.0"     "1.12.0"
                  affy      affyio  genefilter     survival      Biobase
              "1.12.0"     "1.2.0"    "1.12.0"       "2.29"     "1.12.0"
>
-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
00-44-1224-273256 (office)