[BioC] Some Genefilter questions
Amy Mikhail
a.mikhail at abdn.ac.uk
Thu Nov 30 19:07:09 CET 2006
Hi Robert and Jim,
Many thanks for your advice. I have some more questions...
First, I tried what Robert suggested on my expression set. However I got
a strange result:
> load("E:\\Amy - Bioconductor analysis\\03. Base age\\Affymetrix - Base Age results & analysis\\Baseage - RMA normalised.RData")
> ls()
[1] "Data" "eset" "phenodata" "x" "xy" "y"
> parasites = grep("^Pf", featureNames(eset))
> parasites
[1] 18192 18193 18194 18195 18196 18197 18198 18199 18200 18201 18202 18203
[13] 18204 18205 18206 18207 18208 18209 18210 18211 18212 18213 18214 18215
[25] 18216 18217 18218 18219 18220 18221 18222 18223 18224 18225 18226 18227
### this list continues until no. 4,514 ###
I was expecting the parasite affy IDs to be listed here, but these are (I
think) the probeset numbers, i.e. row positions rather than IDs (I can't
tell if they are the right ones or not...)?
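A minimal sketch of what grep() returns, assuming a small made-up vector of probe set IDs in place of featureNames(eset) (the IDs below are hypothetical, not taken from the chip annotation). By default grep() gives the integer positions of the matches; with value = TRUE it gives the matching strings themselves, which makes it easy to check that the right probe sets were picked up.

```r
# Hypothetical probe set IDs standing in for featureNames(eset)
ids <- c("Ag.2L.1_at", "Pf.1.1_at", "Ag.3R.7_at", "Pf.2.5_s_at")

idx <- grep("^Pf", ids)                 # integer positions of the matches
nms <- grep("^Pf", ids, value = TRUE)   # the matching IDs themselves

print(idx)
print(nms)

# Indexing the original vector by position recovers the same IDs
print(identical(ids[idx], nms))
```

On the real data, featureNames(eset)[parasites] should list the IDs behind the positions shown above.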
> mossie.sub = eset[!parasites,]
> mossie.sub
Expression Set (exprSet) with
0 genes
6 samples
phenoData object with 3 variables and 6 cases
varLabels
Name: short name of datasets for graphs
Population: Age of adult mosquitoes (in days) included in
the sample
Replicate: Replicate number of the experiment
So now it has removed all the genes... I don't understand why this would
happen since the subset called "parasites" only contains a fraction of the
total number of probesets (4,514 out of 22,769).
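The empty result can be reproduced on a toy matrix. This is a sketch under the assumption that subsetting an exprSet by row behaves like subsetting an ordinary matrix; the matrix and its row names are made up for illustration. Applying ! to a vector of non-zero integers gives a logical vector that is FALSE everywhere, so no rows are selected. Dropping rows by position needs a minus sign, or alternatively a logical vector from grepl().

```r
# Toy matrix standing in for the expression values: 4 probe sets x 2 samples
m <- matrix(1:8, nrow = 4,
            dimnames = list(c("Ag.2L.1_at", "Pf.1.1_at",
                              "Ag.3R.7_at", "Pf.2.5_s_at"),
                            c("s1", "s2")))

parasites <- grep("^Pf", rownames(m))   # integer positions of the Pf rows

# '!' treats every non-zero integer as TRUE and negates it to FALSE,
# so this logical index selects no rows at all
print(!parasites)
print(nrow(m[!parasites, , drop = FALSE]))

# Option 1: negative integer index drops the matched rows
keep_neg <- m[-parasites, , drop = FALSE]

# Option 2: logical index built with grepl(), then negated
keep_lgl <- m[!grepl("^Pf", rownames(m)), , drop = FALSE]

print(rownames(keep_neg))
print(identical(keep_neg, keep_lgl))
```

The grepl() form is the safer of the two: if the pattern matches nothing, -integer(0) selects zero rows, whereas the negated logical vector keeps everything.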
Next, I wanted to try Jim's suggestion on the raw data. I can follow
Jenny's post up to:
" all you need now is your affybatch object, and a character vector of
probe set names"
I have an affybatch object, but how do I create a character vector for the
probesets I want to remove?
I'm still not very R-literate, so I tried using the same code as before,
except with the raw data instead of my expression set, but the
"featureNames" bit was a problem:
> parasites = grep("^Pf", featureNames(data))
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "featureNames",
for signature "function"
Any ideas?
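One likely cause, judging from the ls() output above: the workspace contains an object called "Data" (capital D), whereas lowercase data is a function in the utils package, which would explain the signature "function" in the error. The sketch below shows the case-sensitivity point and how a character vector of names can be built with grep(..., value = TRUE). The vector Data_ids is a hypothetical stand-in for the probe set names of the AffyBatch; whether those are obtained with featureNames(Data) or geneNames(Data) depends on the installed affy/Biobase versions and is an assumption here.

```r
# R is case-sensitive: lowercase `data` is a function from utils,
# not the AffyBatch object listed as "Data" by ls()
print(is.function(data))

# Hypothetical stand-in for the probe set names of the AffyBatch
Data_ids <- c("Ag.2L.1_at", "Pf.1.1_at", "Ag.3R.7_at", "Pf.2.5_s_at")

# value = TRUE returns the names themselves, i.e. a character vector
# of the probe sets to remove
to_remove <- grep("^Pf", Data_ids, value = TRUE)

print(is.character(to_remove))
print(to_remove)
```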
Regards,
Amy
---------------------------------------------------------------------------
> Hi Amy,
>
> Amy Mikhail wrote:
>> Dear Bioconductors,
>>
>> I am analysing 6 PlasmodiumAnopheles genechips, which have only
>> Anopheles
>> mosquito samples hybridised to them (i.e. they are not infected
>> mosquitoes). The 6 chips include 3 replicates, each consisting of two
>> time points. The design matrix is as follows:
>>
>>
>>>design
>>
>> M15d M43d
>> [1,] 1 0
>> [2,] 0 1
>> [3,] 1 0
>> [4,] 0 1
>> [5,] 1 0
>> [6,] 0 1
>>
>>
>> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
>> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
>> DE genes, respectively... far fewer than I was expecting.
>>
>> As this affy chip contains probesets for both mosquito and malaria
>> parasite genes, I am wondering:
>>
>> (a) if it is better to remove all the parasite probesets before my
>> analysis;
>
> Probably. It's not the easiest thing to do. Here is a link to some code
> you can use:
>
> http://article.gmane.org/gmane.science.biology.informatics.conductor/9869/match=remove+probes+cdf
>
> Read what Ariel and Jenny write there very closely so you don't make
> mistakes.
>
>>
>> (b) if so at what stage I should do this (before or after normalisation
>> and background correction, or does it matter?)
>
> Before doing anything, most likely, which is what the above code will do
> for you.
>
>>
>> (c) how would I filter out these probesets using genefilter (all the
>> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
>> to filter out the probesets, and if so how?)
>>
>> Secondly, I did not add any of the polyA controls to my samples. I would
>> like to know:
>>
>> (d) Do any of the bg correct / normalisation methods I tried utilise
>> affymetrix control probesets, and if so, how?
>
> No.
>
>>
>> (e) Should I also filter out the control sets - again, if so at what stage
>> in the analysis and what would be an appropriate code to use?
>
> No, there aren't enough of them to have an effect on your data.
>
>>
>> I did try the code for non-specific filtering (on my RMA dataset) from pg.
>> 232 of the Bioconductor monograph, but the reduction in the number of
>> probesets was quite drastic;
>>
>>
>>>f1 <- pOverA(0.25, log2(100))
>>>f2 <- function(x) (IQR(x) > 0.5)
>>>ff <- filterfun(f1, f2)
>>>selected <- genefilter(Baseage.transformed, ff)
>>>sum(selected)
>>
>> [1] 404 ###(The original no. of probesets is 22,726)###
>>
>>>Baseage.sub <- Baseage.transformed[selected, ]
>>
>>
>> Also, I understood from the monograph that "100" was to filter out
>> fluorescence intensities less than this, but I am not clear if this is
>> from raw intensities or log2 values?
>
> It has to be data on the natural scale. The intensities for an Affy chip
> come from a 16-bit TIFF image, which means the brightest value can be
> 2^16 - 1 (65,535), which on the log2 scale is just under 16, so you cannot
> even have a value that approaches 100 on the log scale.
>
> Best,
>
> Jim
>
>
>
>>
>> All the parasite probesets have raw intensities <35 .... so could I apply
>> this as a simple filter, and would this have to be on raw (rather than
>> normalised) data?
>>
>> Apologies for the long posting...
>>
>> Looking forward to any replies,
>> Regards,
>> Amy
>>
>>
>>>sessionInfo()
>>
>> R version 2.4.0 (2006-10-03)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] "tcltk"     "splines"   "tools"     "methods"   "stats"
>> [6] "graphics"  "grDevices" "utils"     "datasets"  "base"
>>
>> other attached packages:
>> plasmodiumanophelescdf    tkWidgets       DynDoc  widgetTools  agahomology
>>               "1.14.0"     "1.12.0"     "1.12.0"     "1.10.0"     "1.14.2"
>>                affyPLM        gcrma  matchprobes     affydata      annaffy
>>               "1.10.0"      "2.6.0"      "1.6.0"     "1.10.0"      "1.6.0"
>>                   KEGG           GO        limma  geneplotter     annotate
>>               "1.14.0"     "1.14.0"      "2.9.1"     "1.12.0"     "1.12.0"
>>                   affy       affyio   genefilter     survival      Biobase
>>               "1.12.0"      "1.2.0"     "1.12.0"       "2.29"     "1.12.0"
>>
>>
>>
>> -------------------------------------------
>> Amy Mikhail
>> Research student
>> University of Aberdeen
>> Zoology Building
>> Tillydrone Avenue
>> Aberdeen AB24 2TZ
>> Scotland
>> Email: a.mikhail at abdn.ac.uk
>> Phone: 00-44-1224-272880 (lab)
>> 00-44-1224-273256 (office)
>>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
00-44-1224-273256 (office)