[BioC] Some Genefilter questions
Robert Gentleman
rgentlem at fhcrc.org
Thu Nov 30 19:21:37 CET 2006
Hi,
Amy Mikhail wrote:
> Hi Robert and Jim,
>
> Many thanks for your advice. I have some more questions...
>
> First, I tried what Robert suggested on my expression set. However I got
> a strange result:
>
>> load("E:\\Amy - Bioconductor analysis\\03. Base age\\Affymetrix - Base
> Age results & analysis\\Baseage - RMA normalised.RData")
>> ls()
> [1] "Data" "eset" "phenodata" "x" "xy" "y"
>
>> parasites = grep("^Pf", featureNames(eset))
>> parasites
> [1] 18192 18193 18194 18195 18196 18197 18198 18199 18200 18201 18202 18203
> [13] 18204 18205 18206 18207 18208 18209 18210 18211 18212 18213 18214 18215
> [25] 18216 18217 18218 18219 18220 18221 18222 18223 18224 18225 18226 18227
> ### this list continues until no. 4,514 ###
You can tell by using
  featureNames(eset)[parasites]
The values in the parasites vector are the indices of the matching features,
not their names.
>
> I was expecting the parasite affy IDs to be listed here, but these are (I
> think) the probeset numbers (I can't tell if they are the right ones or
> not...)?
>
>> mossie.sub = eset[!parasites,]
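Oops, that should have been
  mossie.sub = eset[-parasites,]
My mistake: I keep thinking grep returns a logical vector for some
reason. It returns integer indices, and negating non-zero integers with !
gives all FALSE, which is why every gene was dropped.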
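To illustrate the difference, here is a small base-R sketch. The probeset IDs are made up, and a plain character vector stands in for featureNames(eset), so treat it as a toy rather than the real data. It also shows grepl(), which does return a logical vector and behaves sensibly when nothing matches.

```r
# Hypothetical probeset IDs standing in for featureNames(eset)
ids <- c("Ag.1.0_at", "Pf.1.1_at", "Ag.2.0_at", "Pf.2.1_s_at", "AFFX-BioB-5_at")

parasites <- grep("^Pf", ids)   # integer positions of the matching IDs
ids[parasites]                  # the matching names themselves

wrong <- ids[!parasites]        # !c(2L, 4L) is c(FALSE, FALSE): selects nothing
right <- ids[-parasites]        # negative indices drop positions 2 and 4

# grepl() returns a logical vector, so negating it is safe even when
# there are no matches (ids[-integer(0)] would return nothing at all)
is_parasite <- grepl("^Pf", ids)
also_right  <- ids[!is_parasite]
```

On the expression set itself, the logical form would presumably read eset[!grepl("^Pf", featureNames(eset)), ].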
>> mossie.sub
> Expression Set (exprSet) with
> 0 genes
> 6 samples
> phenoData object with 3 variables and 6 cases
> varLabels
> Name: short name of datasets for graphs
> Population: Age of adult mosquitoes (in days) included in
> the sample
> Replicate: Replicate number of the experiment
>
> So now it has removed all the genes... I don't understand why this would
> happen since the subset called "parasites" only contains a fraction of the
> total number of probesets (4,514 out of 22,769).
>
> Next, I wanted to try Jim's suggestion on the raw data. I can follow
> Jenny's post up to:
>
> " all you need now is your affybatch object, and a character vector of
> probe set names"
>
> I have an affybatch object, but how do I create a character vector for the
> probesets I want to remove?
>
> I'm still not very R-literate, so tried using the same code as previous
> except with the raw data instead of my expression set but the
> "featureNames" bit was a problem:
>
>> parasites = grep("^Pf", featureNames(data))
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "featureNames",
> for signature "function"
>
> Any ideas?
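The 'for signature "function"' part of that error suggests that data was resolved to R's built-in data() function rather than to one of your objects. R is case-sensitive, and the ls() output above lists "Data", not "data". Assuming Data is your AffyBatch (the thread does not confirm this), featureNames(Data) would be the call to try. For the character vector, grep() with value = TRUE returns the matching names instead of their positions. A toy sketch in base R, with made-up IDs standing in for the real feature names:

```r
# R is case-sensitive: 'data' (lower case) is a function from the utils
# package, which is what the error's signature "function" refers to.
class(data)

# Toy stand-in for the feature names of the object that ls() lists as "Data";
# in the real session that object is assumed (not confirmed) to be the AffyBatch.
Data <- c("Ag.1.0_at", "Pf.1.1_at", "Pf.2.1_s_at")
class(Data)

# value = TRUE makes grep() return the matching strings rather than their
# positions, i.e. a character vector of probeset names.
parasite_ids <- grep("^Pf", Data, value = TRUE)
```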
>
> Regards,
>
> Amy
>
> ---------------------------------------------------------------------------
>
>> Hi Amy,
>>
>> Amy Mikhail wrote:
>>> Dear Bioconductors,
>>>
>>> I am analysing 6 Plasmodium/Anopheles genechips, which have only Anopheles
>>> mosquito samples hybridised to them (i.e. they are not infected
>>> mosquitoes). The 6 chips include 3 replicates, each consisting of two
>>> time points. The design matrix is as follows:
>>>
>>>
>>>> design
>>> M15d M43d
>>> [1,] 1 0
>>> [2,] 0 1
>>> [3,] 1 0
>>> [4,] 0 1
>>> [5,] 1 0
>>> [6,] 0 1
>>>
>>>
>>> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
>>> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
>>> DE genes, respectively... much less than I was expecting.
>>>
>>> As this affy chip contains probesets for both mosquito and malaria
>>> parasite genes, I am wondering:
>>>
>>> (a) if it is better to remove all the parasite probesets before my
>>> analysis;
>> Probably. It's not the easiest thing to do. Here is a link to some code
>> you can use:
>>
>> http://article.gmane.org/gmane.science.biology.informatics.conductor/9869/match=remove+probes+cdf
>>
>> Read what Ariel and Jenny write there very closely so you don't make
>> mistakes.
>>
>>> (b) if so at what stage I should do this (before or after normalisation
>>> and background correction, or does it matter?)
>> Before doing anything, most likely, which is what the above code will do
>> for you.
>>
>>> (c) how would I filter out these probesets using genefilter (all the
>>> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
>>> to filter out the probesets, and if so how?)
>>>
>>> Secondly, I did not add any of the polyA controls to my samples. I would
>>> like to know:
>>>
>>> (d) Do any of the bg correct / normalisation methods I tried utilise
>>> affymetrix control probesets, and if so, how?
>> No.
>>
>>> (e) Should I also filter out the control sets - again, if so at what stage
>>> in the analysis and what would be an appropriate code to use?
>> No, there aren't enough of them to have an effect on your data.
>>
>>> I did try the code for non-specific filtering (on my RMA dataset) from pg.
>>> 232 of the Bioconductor monograph, but the reduction in the number of
>>> probesets was quite drastic:
>>>
>>>
>>>> f1 <- pOverA(0.25, log2(100))
>>>> f2 <- function(x) (IQR(x) > 0.5)
>>>> ff <- filterfun(f1, f2)
>>>> selected <- genefilter(Baseage.transformed, ff)
>>>> sum(selected)
>>> [1] 404 ###(The original no. of probesets is 22,726)###
>>>
>>>> Baseage.sub <- Baseage.transformed[selected, ]
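For intuition about why the combined filter is so strict, here is a base-R sketch of what the two criteria test for each probeset (row). It re-implements the pOverA idea by hand, as I understand it (at least a proportion p of the samples above A), instead of calling genefilter, and the matrix is simulated, so the counts say nothing about the real data. The names p_over_a, pass_f1 and pass_f2 are made up for the illustration.

```r
set.seed(1)
# Simulated log2 expression matrix: 1000 hypothetical probesets x 6 samples
expr <- matrix(rnorm(6000, mean = 7, sd = 1), nrow = 1000, ncol = 6)

# Hand-written stand-in for pOverA(p, A): TRUE when at least a proportion p
# of the values in x exceed A
p_over_a <- function(p, A) function(x) mean(x > A) >= p

f1 <- p_over_a(0.25, log2(100))   # cutoff on the log2 scale (100 on the raw scale)
f2 <- function(x) IQR(x) > 0.5    # keep only rows with some spread across samples

pass_f1  <- apply(expr, 1, f1)
pass_f2  <- apply(expr, 1, f2)
selected <- pass_f1 & pass_f2     # a row must pass both criteria

# How many rows survive each criterion on its own, and both together
c(intensity = sum(pass_f1), variability = sum(pass_f2), both = sum(selected))
```

Counting the two criteria separately shows which one removes most of the probesets. With only six arrays, the IQR cutoff may well be the stricter of the two.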
>>>
>>> Also, I understood from the monograph that "100" was to filter out
>>> fluorescence intensities less than this, but I am not clear if this is
>>> from raw intensities or log2 values?
>> It has to be data on the natural scale. The intensities for an Affy chip
>> come from a 16-bit TIFF image, which means the brightest value can be
>> 2^16, which in log2 scale is 16, so you cannot even have a value that
>> approaches 100 on the log scale.
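The arithmetic can be checked directly in R. Note that the threshold in the quoted code is log2(100), so it is compared against log2-scale values while corresponding to 100 on the natural scale. The variable names below are made up for the illustration.

```r
cutoff_log2 <- log2(100)       # about 6.64: the cutoff as applied to log2 data
cutoff_raw  <- 2^cutoff_log2   # back-transformed: 100 on the natural scale
max_log2    <- log2(2^16)      # 16: the ceiling for a 16-bit image on the log2 scale
low_log2    <- log2(35)        # about 5.13: a raw intensity of 35, as mentioned for
                               # the parasite probesets elsewhere in this thread

round(c(cutoff_log2 = cutoff_log2, cutoff_raw = cutoff_raw,
        max_log2 = max_log2, low_log2 = low_log2), 2)
```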
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>> All the parasite probesets have raw intensities <35 .... so could I apply
>>> this as a simple filter, and would this have to be on raw (rather than
>>> normalised) data?
>>>
>>> Apologies for the long posting...
>>>
>>> Looking forward to any replies,
>>> Regards,
>>> Amy
>>>
>>>
>>>> sessionInfo()
>>> R version 2.4.0 (2006-10-03)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>> States.1252;LC_MONETARY=English_United
>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] "tcltk" "splines" "tools" "methods" "stats" "graphics" "grDevices" "utils" "datasets" "base"
>>>
>>> other attached packages:
>>> plasmodiumanophelescdf              tkWidgets                 DynDoc
>>>               "1.14.0"               "1.12.0"               "1.12.0"
>>>            widgetTools            agahomology                affyPLM
>>>               "1.10.0"               "1.14.2"               "1.10.0"
>>>                  gcrma            matchprobes               affydata
>>>                "2.6.0"                "1.6.0"               "1.10.0"
>>>                annaffy                   KEGG                     GO
>>>                "1.6.0"               "1.14.0"               "1.14.0"
>>>                  limma            geneplotter               annotate
>>>                "2.9.1"               "1.12.0"               "1.12.0"
>>>                   affy                 affyio             genefilter
>>>               "1.12.0"                "1.2.0"               "1.12.0"
>>>               survival                Biobase
>>>                 "2.29"               "1.12.0"
>>>
>>>
>>>
>>> -------------------------------------------
>>> Amy Mikhail
>>> Research student
>>> University of Aberdeen
>>> Zoology Building
>>> Tillydrone Avenue
>>> Aberdeen AB24 2TZ
>>> Scotland
>>> Email: a.mikhail at abdn.ac.uk
>>> Phone: 00-44-1224-272880 (lab)
>>> 00-44-1224-273256 (office)
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Affymetrix and cDNA Microarray Core
>> University of Michigan Cancer Center
>> 1500 E. Medical Center Drive
>> 7410 CCGC
>> Ann Arbor MI 48109
>> 734-647-5623
>>
>>
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and should not
>> be used for urgent or sensitive issues.
>>
>
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org