[BioC] Agi4x44PreProcess 1.4.0 question: use of genes.rpt.agi() and Gene Sets

Fri Oct 23 00:42:12 CEST 2009

I must confess I had completely forgotten the issue of RNA splicing.
Shame on me.
Thank you very much for the key info that you provided and for the
conversation that you embarked on.
I am going back to this immediately.
Yours,
Massimo
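
A minimal base-R sketch of the two options discussed in the quoted messages below: taking the median across the probes of a gene, or keeping the single highest-variance probe. The matrix values, probe names, and gene labels are all made up for illustration and do not come from any real array; limma's avereps, mentioned below, is not used here.

```r
# Toy log2-expression matrix: rows are probes, columns are samples
# (samples 1-2 = control, samples 3-4 = treated). All values are invented.
expr <- rbind(
  probeA1 = c(8.0, 8.1, 10.0, 10.2),  # goes up in treated samples
  probeA2 = c(9.0, 9.1,  7.0,  7.1),  # goes down in treated samples
  probeB1 = c(6.0, 6.1,  6.0,  6.2)   # only probe for its gene
)
# Hypothetical probe-to-gene assignment, one entry per row of expr.
gene <- c("GENE_A", "GENE_A", "GENE_B")

# Option 1: per-sample median across all probes of each gene.
rows_by_gene <- split(seq_len(nrow(expr)), gene)
median_by_gene <- t(sapply(rows_by_gene, function(i) {
  apply(expr[i, , drop = FALSE], 2, median)
}))

# Option 2: keep the probe with the highest variance across samples.
probe_var <- apply(expr, 1, var)
top_probe <- sapply(split(names(probe_var), gene), function(p) {
  p[which.max(probe_var[p])]
})
expr_top <- expr[top_probe, , drop = FALSE]

print(median_by_gene)
print(top_probe)
```

In this toy case the two GENE_A probes move in opposite directions, so the median is nearly flat across samples, whereas the variance-based choice keeps probeA1 and its signal. Whether the kept probe is the trustworthy one is a separate question that the data alone do not answer.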

On Wed, Oct 21, 2009 at 4:48 PM, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> The fact that you have to summarize with Affy doesn't mean that it applies
> to other technologies. The Affy chips need this because they have shorter
> oligos (25bp) but the Agilent ones are longer (60bp) and more reliable than
> individual Affy probes.
>
> I have to disagree with that being the most biologically relevant. As I
> said, a lot of the probes for the same gene will not be measuring the same
> thing: some will target differential splice sites, or preferentially track
> pseudogenes, etc. From talking to Agilent scientists, one of the criteria
> for keeping different probes for the same gene is that they give different
> readings on some of their test samples. Otherwise, they just take the
> one closest to the 3' end.
>
> I have cases where both probes for a given gene show differential expression
> in opposite directions. There's one I believe, the other one is probably a
> fluke, but combining them would have been a bad idea.
>
> Francois
>
> Tobias Straub wrote:
>>
>> Hi
>>
>> key question regarding your problem is the confidence in the measurement
>> of a single agilent feature. in affy 3' expression arrays a robust
>> measurement is obtained by summarization of several features. for the modern
>> affy gene st arrays the gene-based expression measurement is also obtained
>> by feature summarization across exons (at least this is what the affy
>> expression console forces you to do).
>>
>> hence, the most intuitive and biologically relevant procedure would be to
>> apply feature summarization accordingly for agilent arrays before doing the
>> statistics. the question how this summarization has to be done cannot easily
>> be answered without analysis of reference samples. my personal experience:
>> there is not a big difference between taking the median signal or just
>> taking the feature with the highest variance. if you are particularly
>> interested in categorizing responders, the variance method is probably more
>> sensitive.
>>
>> best
>> Tobias
>>
>> On Oct 20, 2009, at 4:45 PM, Francois Pepin wrote:
>>
>>> Hi Massimo,
>>>
>>> I don't know about Agi4x44PreProcess, but Limma can do it with avereps.
>>>
>>> In the case of Agilent arrays, I would not recommend doing that from the
>>> start. The probes mapping to the same gene often do not measure the same
>>> thing: they can map to different splice variants, and some can be pretty
>>> far from the 3' end.
>>>
>>> So for differential analysis, I would suggest keeping them different. For
>>> other analyses that assume one probe per gene, such as gene ontology
>>> analysis, I would recommend an unbiased method to choose a representative
>>> probe per gene, for example the highest variance probe or the one closest to
>>> 3' end.
>>>
>>> If you search in the archives, you can find more advice as this is a
>>> common topic.
>>>
>>> Francois
>>>
>>> Massimo Pinto wrote:
>>>>
>>>> Greetings all,
>>>> I realised that I was carrying forward, in my analysis, multiple
>>>> measurements for the same gene that had been carried out using
>>>> independent probes. This is a feature of Agilent arrays, as I
>>>> understand. However, while it is clear to me that Agi4x44PreProcess
>>>> offers a function to summarize replicated probes, called
>>>> summarize.probe(), I cannot see a readily available function that
>>>> performs a similar treatment to replicated genes, i.e. Gene Sets, as
>>>> these are called in the Agi4x44 Package.
>>>> The result of calling
>>>>>
>>>>> genes.rpt.agi(dd, "hgug4112a.db", raw.data = TRUE, WRITE.html = TRUE,
>>>>> REPORT = TRUE)
>>>>
>>>> is an html list of Gene Sets, but these are not summarized to a
>>>> 'virtual' measurement, like summarize.probe() does for replicated
>>>> probes.
>>>> Is there a reason why one would want to carry multiple probes for a
>>>> given gene through the subsequent analysis, including linear
>>>> modeling and gene ontology? If not, is there a function that computes
>>>> the median of such repeats?
>>>> Thank you in advance,
>>>> Yours
>>>> Massimo Pinto
>>>>>
>>>>> sessionInfo()
>>>>
>>>> R version 2.9.1 (2009-06-26)
>>>> i386-apple-darwin8.11.1
>>>>
>>>> locale:
>>>> C
>>>>
>>>> attached base packages:
>>>> [1] grid      stats     graphics  grDevices utils     datasets
>>>> [7] methods   base
>>>>
>>>> other attached packages:
>>>>  [1] affy_1.22.0             gplots_2.7.0            caTools_1.9
>>>>  [4] bitops_1.0-4.1          gdata_2.4.2             gtools_2.5.0-1
>>>>  [7] hgug4112a.db_2.2.11     RSQLite_0.7-1           DBI_0.2-4
>>>> [10] Agi4x44PreProcess_1.4.0 genefilter_1.24.0       annotate_1.22.0
>>>> [13] AnnotationDbi_1.6.0     limma_2.18.0            Biobase_2.4.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.11.3        preprocessCore_1.5.3 splines_2.9.1
>>>> [4] survival_2.35-4      xtable_1.5-5
>>>>
>>>> Massimo Pinto
>>>> Post Doctoral Research Fellow
>>>> Enrico Fermi Centre and Italian Public Health Research Institute (ISS),
>>>> Rome
>>>> http://claimid.com/massimopinto
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> ----------------------------------------------------------------------
>> Tobias Straub   ++4989218075439   Adolf-Butenandt-Institute, München D
>>
>
>