[BioC] Agi4x44PreProcess 1.4.0 question: use of genes.rpt.agi() and Gene Sets
pintarello at gmail.com
Fri Oct 23 00:42:12 CEST 2009
I must confess I had completely forgotten the issue of RNA splicing.
Shame on me.
Thank you very much for the key info that you provided and for the
conversation that you embarked on.
I am going back to this immediately.
On Wed, Oct 21, 2009 at 4:48 PM, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> The fact that you have to summarize with Affy doesn't mean that it applies
> to other technologies. The Affy chips need this because they have shorter
> oligos (25bp) but the Agilent ones are longer (60bp) and more reliable than
> individual affy probes.
> I have to disagree with that being the most biologically relevant. As I
> said, a lot of the probes for the same gene will not be measuring the same
> thing, some will be differential splice sites, or preferentially tracking
> pseudo-genes, etc. From talking to Agilent scientists, one of the criteria
> for keeping different probes for the same gene is that they give different
> readings on some of their test samples. Otherwise, they just take the
> closest one to 3'.
> I have cases where both probes for a given gene show differential expression
> in opposite directions. There's one I believe, the other one is probably a
> fluke, but combining them would have been a bad idea.
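A minimal sketch of how one might screen for the situation described above, where probes for the same gene move in opposite directions. The probe IDs, gene symbols and log2 fold changes are invented for illustration; in a real analysis the fold changes would come from the fitted model and the mapping from the annotation package.

```r
# Hypothetical per-probe log2 fold changes (treatment vs. control).
logFC <- c(P1 = 1.8, P2 = -1.5, P3 = 0.9, P4 = 1.1, P5 = -0.2)

# Hypothetical probe-to-gene mapping.
gene <- c(P1 = "GENE_A", P2 = "GENE_A",
          P3 = "GENE_B", P4 = "GENE_B",
          P5 = "GENE_C")

# A gene is flagged as discordant when its probes include both a
# positive and a negative change.
discordant <- tapply(logFC, gene[names(logFC)],
                     function(x) any(x > 0) && any(x < 0))

# Averaging hides the disagreement: the two GENE_A probes nearly cancel.
mean_logFC <- tapply(logFC, gene[names(logFC)], mean)

print(names(discordant)[discordant])
print(mean_logFC)
```

Genes flagged this way are worth inspecting probe by probe before any per-gene summary is computed. In practice one would probably also require each probe to pass a significance or fold-change threshold, which this sketch omits.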
> Tobias Straub wrote:
>> key question regarding your problem is the confidence in the measurement
>> of a single agilent feature. in affy 3' expression arrays a robust
>> measurement is obtained by summarization of several features. for the modern
>> affy gene st arrays the gene-based expression measurement is also obtained
>> by feature summarization across exons (at least this is what the affy
>> expression console forces you to do).
>> hence, the most intuitive and biologically relevant procedure would be to
>> apply feature summarization accordingly for agilent arrays before doing the
>> statistics. the question how this summarization has to be done cannot easily
>> be answered without analysis of reference samples. my personal experience:
>> there is not a big difference between taking the median signal or just
>> taking the feature with the highest variance. if you are particularly
>> interested in categorizing responders, the variance method is probably more
>> On Oct 20, 2009, at 4:45 PM, Francois Pepin wrote:
>>> Hi Massimo,
>>> I don't know about Agi4x44PreProcess, but limma can do it with avereps().
>>> In the case of Agilent arrays, I would not recommend doing that from the
>>> start. The probes mapping to the same gene often do not measure the same
>>> thing: they can map to different splice variants, and some can be pretty
>>> far from the 3' end.
>>> So for differential analysis, I would suggest keeping them different. For
>>> other analyses that assume one probe per gene, such as gene ontology
>>> analysis, I would recommend an unbiased method to choose a representative
>>> probe per gene, for example the highest variance probe or the one closest to
>>> 3' end.
>>> If you search in the archives, you can find more advice as this is a
>>> common topic.
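A minimal base-R sketch of the "highest variance probe" choice suggested above, for analyses that expect one row per gene. The expression values, probe IDs and gene symbols are made up for illustration; with real data the matrix would hold the normalized log2 intensities and the mapping would come from the annotation package (for example hgug4112a.db).

```r
# Toy log2-expression matrix: rows are probes, columns are arrays.
expr <- matrix(
  c(7.1, 7.3, 7.2, 7.0,    # P1, GENE_A (nearly flat)
    5.0, 8.0, 5.5, 8.5,    # P2, GENE_A (variable)
    6.0, 6.1, 9.0, 9.2,    # P3, GENE_B
    4.0, 4.1, 4.0, 4.2),   # P4, GENE_C
  nrow = 4, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3", "P4"), paste0("array", 1:4))
)

# Hypothetical probe-to-gene mapping.
gene <- c(P1 = "GENE_A", P2 = "GENE_A", P3 = "GENE_B", P4 = "GENE_C")

# Variance of each probe across arrays.
probe_var <- apply(expr, 1, var)

# For each gene, keep the ID of the probe with the largest variance.
pick_top <- function(ids) ids[which.max(probe_var[ids])]
representative <- vapply(split(names(gene), gene), pick_top, character(1))

# One row per gene, with rows renamed to the gene symbol.
expr_by_gene <- expr[representative, , drop = FALSE]
rownames(expr_by_gene) <- names(representative)

print(representative)
```

The selection depends only on each probe's overall variance, not on the group labels, which is what makes it unbiased with respect to the comparison of interest. The full probe-level matrix stays available for the differential analysis itself.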
>>> Massimo Pinto wrote:
>>>> Greetings all,
>>>> I realised that I was carrying forward, in my analysis, multiple
>>>> measurements for the same gene that had been carried out using
>>>> independent probes. This is a feature of Agilent arrays, as I
>>>> understand. However, while it is clear to me that Agi4x44PreProcess
>>>> offers a function to summarize replicated probes, called
>>>> summarize.probe(), I cannot see a readily available function that
>>>> performs a similar treatment to replicated genes, i.e. Gene Sets, as
>>>> these are called in the Agi4x44 Package.
>>>> The result of calling
>>>>> genes.rpt.agi(dd, "hgug4112a.db", raw.data = TRUE, WRITE.html = TRUE,
>>>>> REPORT = TRUE)
>>>> is an html list of Gene Sets, but these are not summarized to a
>>>> 'virtual' measurement, like summarize.probe() does for replicated
>>>> probes.
>>>> Is there a reason why one would want to carry multiple probes for a
>>>> given gene through his/her subsequent analysis, including linear
>>>> modeling and gene ontology? If not, is there a function that computes
>>>> the median of such repeats?
>>>> Thank you in advance,
>>>> Massimo Pinto
>>>> R version 2.9.1 (2009-06-26)
>>>> attached base packages:
>>>>  grid stats graphics grDevices utils datasets
>>>> methods base
>>>> other attached packages:
>>>>  affy_1.22.0 gplots_2.7.0 caTools_1.9
>>>> bitops_1.0-4.1 gdata_2.4.2 gtools_2.5.0-1
>>>>  hgug4112a.db_2.2.11 RSQLite_0.7-1 DBI_0.2-4
>>>> Agi4x44PreProcess_1.4.0 genefilter_1.24.0 annotate_1.22.0
>>>>  AnnotationDbi_1.6.0 limma_2.18.0 Biobase_2.4.1
>>>> loaded via a namespace (and not attached):
>>>>  affyio_1.11.3 preprocessCore_1.5.3 splines_2.9.1
>>>> survival_2.35-4 xtable_1.5-5
>>>> Massimo Pinto
>>>> Post Doctoral Research Fellow
>>>> Enrico Fermi Centre and Italian Public Health Research Institute (ISS),
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> Search the archives:
>> Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D