[BioC] Some Genefilter questions
Amy Mikhail
a.mikhail at abdn.ac.uk
Thu Nov 30 19:33:07 CET 2006
Hi all,
I am curious to see how they compare too - as soon as I have the
subsetting and character vector sorted I will try both and let you know
how it turns out.
Out of interest - would it also be possible to carry out the background
correction on the full dataset, then remove the parasite probesets, then
normalise? (and how would one separate these functions in expresso or
affyPLM, since there is a placeholder for bg.correct but not for
normalisation?)
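
A rough sketch of the ordering asked about here (background-correct everything, drop the parasite rows, then normalise), written in base R on a toy matrix. It is not how expresso or affyPLM do it: the intensities and IDs are made up, `quantileNormalise` is a hypothetical helper, and real data would be handled at the probe level on an AffyBatch rather than on a plain matrix.

```r
# Hypothetical helper: plain quantile normalisation of the columns of a matrix.
quantileNormalise <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank within each chip
  target <- rowMeans(apply(m, 2, sort))               # mean of the sorted columns
  out    <- apply(ranks, 2, function(r) target[r])    # map ranks back to the target
  dimnames(out) <- dimnames(m)
  out
}

# Toy "already background-corrected" intensities; IDs and values are invented.
set.seed(42)
ids <- c(paste0("Ag.", 1:6), paste0("Pf.", 1:4))
bg  <- matrix(2^rnorm(length(ids) * 3, mean = 8), nrow = length(ids),
              dimnames = list(ids, c("chip1", "chip2", "chip3")))

# Drop the parasite rows first, then normalise what is left.
mosquito <- bg[!grepl("^Pf\\.", rownames(bg)), , drop = FALSE]
norm     <- quantileNormalise(mosquito)

# After normalisation every chip has the same set of values.
apply(norm, 2, sort)
```

Running the same helper on `bg` before and after removing the `Pf.` rows would be one way to see how much the parasite probesets shift the normalised values.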
Regards,
Amy
---------------------------------------------------------------------------
> Hi again,
>
> Some parts of my answer and of Jim's are in disagreement - it might be
> nice to hear other points of view here.
>
> The question is really whether there is anything to be gained by
> removing the probes (probesets) we know are not involved prior to
> normalization and background correction.
>
> Clearly these probes will help with background correction, but they
> could substantially interfere with normalization. I don't personally
> think (no evidence at all though) that this is a problem - but would
> love to see some quantitative comparisons of results that took both
> approaches to see if the end results are qualitatively different.
>
> best wishes
> Robert
>
>
> James W. MacDonald wrote:
>> Hi Amy,
>>
>> Amy Mikhail wrote:
>>> Dear Bioconductors,
>>>
>>> I am analysing 6 PlasmodiumAnopheles genechips, which have only
>>> Anopheles mosquito samples hybridised to them (i.e. they are not
>>> infected mosquitoes). The 6 chips include 3 replicates, each
>>> consisting of two time points. The design matrix is as follows:
>>>
>>>
>>>> design
>>>      M15d M43d
>>> [1,]    1    0
>>> [2,]    0    1
>>> [3,]    1    0
>>> [4,]    0    1
>>> [5,]    1    0
>>> [6,]    0    1
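
For reference, a design matrix with this layout can be built from a factor in base R. This is only a sketch: the variable name `age` is an assumption, and the chips are taken to alternate between the two time points as in the printout above.

```r
# Time point of each chip, in the order the chips were read in (assumed).
age <- factor(rep(c("M15d", "M43d"), times = 3), levels = c("M15d", "M43d"))

# One indicator column per time point, no intercept.
design <- model.matrix(~ 0 + age)
colnames(design) <- levels(age)
design
```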
>>>
>>>
>>> I have tried both gcRMA (in affylmGUI), and RMA, MBEI and MAS5 (in
>>> affy). Looking at the (BH) adjusted p values <0.05, this gave me 2,
>>> 12, 0 and 0 DE genes, respectively... far fewer than I was expecting.
>>>
>>> As this affy chip contains probesets for both mosquito and malaria
>>> parasite genes, I am wondering:
>>>
>>> (a) if it is better to remove all the parasite probesets before my
>>> analysis;
>>
>> Probably. It's not the easiest thing to do. Here is a link to some code
>> you can use:
>>
>> http://article.gmane.org/gmane.science.biology.informatics.conductor/9869/match=remove+probes+cdf
>>
>> Read what Ariel and Jenny write there very closely so you don't make
>> mistakes.
>>
>>> (b) if so at what stage I should do this (before or after normalisation
>>> and background correction, or does it matter?)
>>
>> Before doing anything, most likely, which is what the above code will do
>> for you.
>>
>>> (c) how would I filter out these probesets using genefilter (all the
>>> parasite affy IDs begin with Pf. - could I use this prefix in the affy
>>> IDs to filter out the probesets, and if so how?)
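
A minimal sketch of filtering on the ID prefix, using base R on a toy matrix. Only the `Pf.` prefix comes from the post; the probeset IDs and intensities below are invented. Note that this drops rows after summarisation, so on its own it would not change background correction or normalisation.

```r
# Hypothetical probeset IDs; only the "Pf." prefix is taken from the post.
ids <- c("Ag.2L.100.0_at", "Pf.1.1.0_at", "Ag.X.5.0_s_at",
         "Pf.13.77.0_x_at", "AFFX-BioB-5_at")

# Toy expression matrix: 5 probesets x 6 chips.
set.seed(1)
expr <- matrix(rnorm(length(ids) * 6, mean = 8), nrow = length(ids),
               dimnames = list(ids, paste0("chip", 1:6)))

# "^" anchors the match at the start of the ID; "\\." matches a literal dot.
is.parasite   <- grepl("^Pf\\.", rownames(expr))
expr.mosquito <- expr[!is.parasite, , drop = FALSE]
rownames(expr.mosquito)
```

The same logical vector could presumably be built from `featureNames()` of an ExpressionSet and used to subset it in the same way.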
>>>
>>> Secondly, I did not add any of the polyA controls to my samples. I
>>> would like to know:
>>>
>>> (d) Do any of the bg correct / normalisation methods I tried utilise
>>> affymetrix control probesets, and if so, how?
>>
>> No.
>>
>>> (e) Should I also filter out the control sets - again, if so at what
>>> stage in the analysis and what would be an appropriate code to use?
>>
>> No, there aren't enough of them to have an effect on your data.
>>
>>> I did try the code for non-specific filtering (on my RMA dataset) from
>>> pg. 232 of the bioconductor monograph, but the reduction in the number
>>> of probesets was quite drastic;
>>>
>>>
>>>> f1 <- pOverA(0.25, log2(100))
>>>> f2 <- function(x) (IQR(x) > 0.5)
>>>> ff <- filterfun(f1, f2)
>>>> selected <- genefilter(Baseage.transformed, ff)
>>>> sum(selected)
>>> [1] 404 ###(The original no. of probesets is 22,726)###
>>>
>>>> Baseage.sub <- Baseage.transformed[selected, ]
>>>
>>> Also, I understood from the monograph that "100" was to filter out
>>> fluorescence intensities less than this, but I am not clear if this is
>>> from raw intensities or log2 values?
>>
>> It has to be data on the natural scale. The intensities for an Affy chip
>> come from a 16-bit TIFF image, which means the brightest value can be at
>> most 2^16 - 1, which on the log2 scale is just under 16, so you cannot
>> even have a value that approaches 100 on the log scale.
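
The arithmetic behind this can be checked directly, together with a base R stand-in for the filter. `pOverAsketch` is a hypothetical helper written to mimic what pOverA(p, A) is documented to do (keep a probeset when at least a proportion p of its values exceed A); it is not the genefilter code, and the toy values are invented.

```r
log2(100)    # the cut-off passed to pOverA() in the post, about 6.64
log2(2^16)   # 16, the ceiling for log2 intensities from a 16-bit image

# Hypothetical stand-in: TRUE when at least a proportion p of x exceeds A.
pOverAsketch <- function(x, p, A) mean(x > A) >= p

# Toy log2-scale expression values for three probesets on six chips.
expr <- rbind(low   = c(5.0,  5.5, 6.0,  5.2, 5.8, 6.1),
              mixed = c(6.0,  7.0, 6.2,  7.1, 6.4, 6.5),
              high  = c(9.0, 10.0, 9.5, 10.2, 9.1, 9.9))

# With 6 chips and p = 0.25, at least 2 chips must exceed the cut-off.
keep <- apply(expr, 1, pOverAsketch, p = 0.25, A = log2(100))
keep
```

So the threshold of 100 is a natural-scale intensity, and log2(100) is its equivalent for data that are already on the log2 scale, as RMA output is.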
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>> All the parasite probesets have raw intensities <35 .... so could I
>>> apply this as a simple filter, and would this have to be on raw (rather
>>> than normalised data)?
>>>
>>> Apologies for the long posting...
>>>
>>> Looking forward to any replies,
>>> Regards,
>>> Amy
>>>
>>>
>>>> sessionInfo()
>>> R version 2.4.0 (2006-10-03)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>> States.1252;LC_MONETARY=English_United
>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] "tcltk" "splines" "tools" "methods" "stats" "graphics"
>>> [7] "grDevices" "utils" "datasets" "base"
>>>
>>> other attached packages:
>>> plasmodiumanophelescdf  "1.14.0"
>>> tkWidgets               "1.12.0"
>>> DynDoc                  "1.12.0"
>>> widgetTools             "1.10.0"
>>> agahomology             "1.14.2"
>>> affyPLM                 "1.10.0"
>>> gcrma                   "2.6.0"
>>> matchprobes             "1.6.0"
>>> affydata                "1.10.0"
>>> annaffy                 "1.6.0"
>>> KEGG                    "1.14.0"
>>> GO                      "1.14.0"
>>> limma                   "2.9.1"
>>> geneplotter             "1.12.0"
>>> annotate                "1.12.0"
>>> affy                    "1.12.0"
>>> affyio                  "1.2.0"
>>> genefilter              "1.12.0"
>>> survival                "2.29"
>>> Biobase                 "1.12.0"
>>>
>>>
>>>
>>> -------------------------------------------
>>> Amy Mikhail
>>> Research student
>>> University of Aberdeen
>>> Zoology Building
>>> Tillydrone Avenue
>>> Aberdeen AB24 2TZ
>>> Scotland
>>> Email: a.mikhail at abdn.ac.uk
>>> Phone: 00-44-1224-272880 (lab)
>>> 00-44-1224-273256 (office)
>>>
>>
>>
>
> --
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
>
-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
00-44-1224-273256 (office)