[BioC] error preProcessGeneST ArrayTools (R 2.15.2, BioC 2.11)
José López
jose.lopez at umh.es
Thu Dec 20 23:39:18 CET 2012
Hi Jim,
Ok, thank you very much for your input.
It makes much more sense to filter after the variability estimation.
Thanks again for your suggestion.
Jose
On Dec 20, 2012, at 11:18 p.m., James W. MacDonald wrote:
> Hi Jose,
>
> On 12/20/2012 4:55 PM, José López wrote:
>> Hi Jim,
>>
>> Thank you for your quick reply and your useful explanation.
>> In fact, I was using the function to subset the eset because I
>> didn't know any other way to do it.
>>
>> By the way, I would like to know your opinion: is it a good idea
>> to filter out the controls before computing moderated t-statistics
>> (limma), or should I rather run the statistics on the whole
>> dataset (including controls)?
>
> My understanding is that you should filter after fitting the model
> when using limma, because you will bias the empirical Bayes estimate
> if you filter first. So that is what I do with these arrays.
>
>>
>> I have read that, in contrast to the standard t-statistic, common
>> filter/test pairs do not necessarily translate into power gains
>> when moderated t-statistics are used (Bourgon, Gentleman and
>> Huber, 2009), so, following their recommendations, I do not usually
>> apply any filter to genes. The question is whether you think it is
>> good practice to remove control probes (not genes) or not.
>
> I filter the control probes, mainly because they have a bad habit of
> popping up in a list of differentially expressed genes. This was
> much rarer with the 3' biased arrays, and way more obvious since
> those controls had a big AFFX prepended to the probeset ID.
>
> What I generally see is that the intronic controls often appear to
> be differentially expressed. I can come up with several hypotheses
> as to why this is so (the primary one being that total mRNA will
> likely also include mRNA that hasn't yet been processed to remove
> the introns, so if one sample is more actively expressing a gene,
> you may well end up with introns being processed into cDNA and then
> hybridized to the chip).
>
> Regardless, these are supposed to be controls, and are not really
> annotated, and it is hard to explain when they pop up in lists of
> differentially expressed genes. So I take the easy way out and nuke
> them right after fitting the model.
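The order of operations described above (fit the model on everything, then drop the controls) can be sketched as follows. This is only an illustration under stated assumptions: the expression matrix is simulated, and the probeset IDs and the `control_ids` vector are hypothetical stand-ins for a real normalized dataset and a real list of control probesets.

```r
library(limma)

set.seed(1)
n_probes <- 200

# Simulated log2 expression matrix: 200 probesets x 6 arrays (assumption)
expr <- matrix(rnorm(n_probes * 6, mean = 8), nrow = n_probes,
               dimnames = list(paste0("ps", seq_len(n_probes)),
                               paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Hypothetical control probeset IDs
control_ids <- paste0("ps", 1:20)

# Fit first: the empirical Bayes step sees every probeset
fit <- eBayes(lmFit(expr, design))

# Filter afterwards: drop the control rows from the fitted object
keep <- !rownames(fit$coefficients) %in% control_ids
fit_no_control <- fit[keep, ]

# Multiple-testing adjustment is computed on the remaining probesets only
tab <- topTable(fit_no_control, coef = 2, number = Inf)
```

Subsetting the fitted object leaves the moderated statistics untouched; only the set of rows passed on to `topTable` changes.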
>
> Best,
>
> Jim
>
>
>
>
>>
>> Sorry if the question is not clearly stated.
>>
>> Thank you in advance for your time and your help.
>>
>> Best,
>>
>> Jose
>>
>> On Dec 20, 2012, at 7:00 p.m., James W. MacDonald wrote:
>>
>>> Hi Jose,
>>>
>>> On 12/20/2012 12:20 PM, José LÓPEZ wrote:
>>>> Dear all,
>>>>
>>>> I was trying to use preProcessGeneST from ArrayTools to get rid of
>>>> control probes in Mouse Gene 1.0 ST arrays, but it doesn't work in
>>>> the latest R/BioC version.
>>>> It worked perfectly in the previous R/BioC version. Should I
>>>> downgrade to the previous version to keep using the ArrayTools
>>>> package?
>>>> Is it possible that the error has a different cause?
>>>
>>> This looks like a bug in the current version of the
>>> mogene10sttranscriptcluster.db package, as the MAP object appears
>>> to be missing:
>>>
>>> > ls(2)
>>> [1] "mogene10sttranscriptcluster"
>>> <snip>
>>> [20] "mogene10sttranscriptclusterGO2ALLPROBES"
>>> [21] "mogene10sttranscriptclusterGO2PROBE"
>>> [22] "mogene10sttranscriptclusterMAPCOUNTS"
>>> [23] "mogene10sttranscriptclusterMGI"
>>> <snip>
>>>
>>> So I don't think downgrading anything will help - we just need to
>>> rebuild this package.
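A quick way to confirm that kind of diagnosis without scanning the `ls()` output by eye is to ask directly whether a named object exists in a package namespace. The helper below is a hypothetical convenience function (`has_pkg_object` is not part of any package), written as a minimal sketch in base R.

```r
# Hypothetical helper: TRUE if `name` is defined in the namespace of `pkg`.
# Returns FALSE when the package is not installed.
has_pkg_object <- function(pkg, name) {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Example call for the case discussed in this thread
has_pkg_object("mogene10sttranscriptcluster.db",
               "mogene10sttranscriptclusterMAP")
```

A `FALSE` result here means either that the annotation package is not installed or that the object is absent from it, so it is worth checking the installation first.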
>>>
>>> But this brings me to a different question. The function you are
>>> using is intended to annotate things and then output in the
>>> current directory, and removing control probes is just a side
>>> effect of one argument. So are you trying to annotate, or to
>>> remove control probes?
>>>
>>> If you just want to remove control probes, note that you can do
>>>
>>> > data(mogene10stCONTROL)
>>>
>>> and then you can subset your eset using the resulting data.frame:
>>>
>>> eset_no_control <- eset_norm[!featureNames(eset_norm) %in%
>>>     mogene10stCONTROL$probeset_id, ]
>>>
>>> Note the use of the bang (!) preceding featureNames - we want to
>>> remove these things, not select for them.
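The same subsetting idiom can be tried on a toy ExpressionSet. In this sketch the expression values, the probeset IDs and the `control` data.frame are all made up for illustration; only the `probeset_id` column name is taken from the message above.

```r
library(Biobase)

set.seed(1)
# Toy data: 10 probesets x 3 samples, with made-up probeset IDs
ids <- as.character(10338001:10338010)
expr <- matrix(rnorm(30), nrow = 10,
               dimnames = list(ids, paste0("s", 1:3)))
eset_norm <- ExpressionSet(assayData = expr)

# Hypothetical stand-in for the control table
control <- data.frame(probeset_id = c("10338001", "10338003"),
                      stringsAsFactors = FALSE)

# `!x %in% y` parses as `!(x %in% y)`, so this keeps the non-control rows
keep <- !featureNames(eset_norm) %in% control$probeset_id
eset_no_control <- eset_norm[keep, ]

length(featureNames(eset_no_control))
```

Because `%in%` binds more tightly than `!` in R, no extra parentheses are needed around the membership test.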
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>
>>>> Thank you for your kind help,
>>>>
>>>>> eset_process = preProcessGeneST(eset_norm, output = TRUE)
>>>> Warning message:
>>>> In chkPkgs(chip) :
>>>> The mogene10sttranscriptcluster.db package does not appear to
>>>> contain annotation data.
>>>> Error in function (x, envir, mode = "any", ifnotfound =
>>>> list(function(x) stop(paste0("value for '", :
>>>> error in evaluating the argument 'envir' in selecting a method
>>>> for function 'mget': Error: object
>>>> 'mogene10sttranscriptclusterMAP' not found
>>>>> sessionInfo()
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] mogene10stv1cdf_2.11.0 annaffy_1.30.0 KEGG.db_2.8.0
>>>> [4] GO.db_2.8.0 arrayQualityMetrics_3.14.0 ArrayTools_1.18.0
>>>> [7] mogene10sttranscriptcluster.db_8.0.1 org.Mm.eg.db_2.8.0 RSQLite_0.11.2
>>>> [10] DBI_0.2-5 affy_1.36.0 annotate_1.36.0
>>>> [13] AnnotationDbi_1.20.3 vsn_3.26.0 Biobase_2.18.0
>>>> [16] BiocGenerics_0.4.0 limma_3.14.3
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.26.0 affyPLM_1.34.0 beadarray_2.8.1 BeadDataPackR_1.10.0 BiocInstaller_1.8.3
>>>> [6] Biostrings_2.26.2 Cairo_1.5-2 cluster_1.14.3 colorspace_1.2-0 gcrma_2.30.0
>>>> [11] genefilter_1.40.0 grid_2.15.2 Hmisc_3.10-1 hwriter_1.3 IRanges_1.16.4
>>>> [16] lattice_0.20-10 latticeExtra_0.6-24 parallel_2.15.2 plyr_1.8 preprocessCore_1.20.0
>>>> [21] RColorBrewer_1.0-5 reshape2_1.2.2 setRNG_2011.11-2 splines_2.15.2 stats4_2.15.2
>>>> [26] stringr_0.6.2 survival_2.37-2 SVGAnnotation_0.93-1 tools_2.15.2 XML_3.95-0.1
>>>> [31] xtable_1.7-0 zlibbioc_1.4.0
>>>>> class(H2Bgfp_norm)
>>>> [1] "ExpressionSet"
>>>> attr(,"package")
>>>> [1] "Biobase"
>>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>
>