[BioC] request for simple usage of probe level normalisations

Mon Oct 8 20:20:08 CEST 2007

On Oct 8, 2007, at 2:30 AM, Ido M. Tamir wrote:

> Dear All,
>
> a) I don't know whether sequence-based models like GCRMA (where, as I
> have read, "GC" actually stands for "GeneChip (tm)", not GC content)
> can be extended to other platforms.
> I am just looking at single-color Agilent chips, and there is a GC
> content bias:
> log2(intensity) ~ GC percentage:
> Coefficients:  Estimate Std. Error t value Pr(>|t|)
> (Intercept) 1.023391   0.105189   9.729   <2e-16 ***
> gcp         0.121826   0.002503  48.666   <2e-16 ***

As you probably know, GCRMA consists of three steps: background
correction, normalization and summarization. The last two steps are
the same as in RMA, and the third step (summarization) requires the
concept of a probeset, i.e. several probes targeting the same gene (or
transcript, or whatever you are trying to measure). It is not clear to
me that Agilent arrays have probesets, although that of course depends
on the design.
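
To make the probeset requirement concrete, here is a minimal sketch of a
summarization step. It assumes a hypothetical mapping from probe IDs to
probeset IDs and uses a plain per-array median of log2 intensities, which
is a deliberately simpler stand-in for the median polish that RMA and
GCRMA use. All names are made up for illustration.

```python
from collections import defaultdict
from math import log2
from statistics import median


def summarize_probesets(intensities, probe_to_set):
    """Collapse probe-level intensities to one value per probeset and array.

    intensities:  {probe_id: [intensity on array 1, array 2, ...]}
    probe_to_set: {probe_id: probeset_id}  (hypothetical mapping)

    Simplification: per-array median of log2 values, not median polish.
    """
    grouped = defaultdict(list)
    for probe_id, values in intensities.items():
        grouped[probe_to_set[probe_id]].append([log2(v) for v in values])

    summary = {}
    for set_id, rows in grouped.items():
        # zip(*rows) walks over arrays (columns) within one probeset
        summary[set_id] = [median(column) for column in zip(*rows)]
    return summary
```

If every probe on the array is its own "probeset", this step does nothing
useful, which is why the design of the array matters here.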

The background correction is the step where GCRMA uses probe
sequence information. What the authors of GCRMA have done is
estimate some parameters related to 25-mer oligos in a reference
experiment (to be precise, I remember it as a large pool of many
experiments). These parameters are then _postulated_ to be relevant
for all Affy chips (with some justification).

At a minimum, if you want to use GCRMA on another platform, you would
need to estimate these parameters yourself, especially if the other
platform uses oligos of a different length, as Agilent does
(although I guess you could get 25-mer arrays from Agilent).

Then you would need some kind of spike-in experiment to show
that it really helps you on this other platform.

With such reference data it would not be hard to use the GCRMA  
algorithm for another chip - at least the background correction part.
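
As a rough illustration of what "estimating parameters on your own
platform" could look like in its very simplest form, the sketch below
fits the same straight line as in the question (log2 intensity against GC
percentage) by ordinary least squares. This is not the GCRMA model, only
the one-covariate fit from the question, and the function names are
hypothetical.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in an oligo sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    sequence = sequence.upper()
    return 100.0 * sum(base in "GC" for base in sequence) / len(sequence)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gc_trend(sequences, log2_intensities):
    """Return (intercept, slope) of log2 intensity against GC percentage."""
    gc = [gc_percent(s) for s in sequences]
    return fit_line(gc, log2_intensities)
```

A fit like this only shows that a GC trend exists; whether correcting for
it helps would still have to be checked against spike-in data.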

> b) I know one should not request/ask open source developers for
> something, but:
> if GCRMA is applicable to other platforms, then it would be nice if it
> could be used in a simple way with these other platforms, and new
> platforms for oligo chips are getting more and more common.
>
> I read the information from the oligo package and about makePDpackage,
> which seems set to be superseded in the future by pdInfoBuilder.
>
> Would it be possible to make this simpler somehow? I don't know exactly
> what information is actually needed by the downstream analysis with
> GCRMA, but wouldn't it be sufficient if, to create a new environment, I
> needed just 2 simple tab-delimited text files*? Then one could simply
> write a script that converts one's own format (which is not .ndf or
> .cdf) to this _simple_ tab-delimited format, whose specification is
> clearly outlined in the package vignette.
>
> Maybe I am underestimating the complexity (ignoring spatial information
> on the chip), or it's already there (yes, .cdf etc. files can be faked).

The pdInfo path taken by oligo has (as far as I know) only been
developed for Affy and Nimblegen arrays.

The designers of that package have taken a comprehensive approach to  
their design where they construct data structures having a ton of  
information about the chip.

In principle you are right: most of the info (I am not certain about  
all, because I am not fully up to date) could be constructed as you  
say, but since they are only really trying to design for Affy and  
Nimblegen I assume they are using standard files from these  
manufacturers.
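
For what it is worth, reading the kind of file proposed in the question
is the easy part. The sketch below assumes the hypothetical three-column
layout from the footnote (oligo name, sequence, gene name, tab-separated,
no header line); it is not a format that oligo or any other package is
known to accept.

```python
import csv
import io
from collections import defaultdict


def read_probe_table(handle):
    """Group oligos by gene name from a tab-delimited probe table.

    Assumed columns (hypothetical): oligo name, sequence, gene name.
    An oligo without a gene name becomes a probeset of its own.
    """
    probesets = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        oligo, sequence, gene = (row + ["", ""])[:3]
        probesets[gene or oligo].append((oligo, sequence.upper()))
    return dict(probesets)


# Usage with an in-memory file standing in for "file1"
example = io.StringIO("o1\tacgt\tgeneA\no2\tGGCC\tgeneA\no3\tTTAA\t\n")
table = read_probe_table(example)
```

The harder part is deciding what else (for example, spatial layout) the
downstream code needs beyond this grouping.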

Kasper

> thank you very much,
> ido
>
> *eg.:
> file1:
> oligo name, sequence, gene name (for grouping multiple oligos)
> file2: annotation
> gene or oligo name (if not grouped), annotations....