[BioC] request for simple usage of probe level normalisations
Kasper Daniel Hansen
khansen at stat.Berkeley.EDU
Mon Oct 8 20:20:08 CEST 2007
On Oct 8, 2007, at 2:30 AM, Ido M. Tamir wrote:
> Dear All,
>
> a) I don't know if sequence-based models like GCRMA (which, as I read,
> actually stands for "GeneChip (tm)", not GC content) can be extended to
> other platforms.
> I am just looking at single-color Agilent chips and there is a GC
> content bias:
> log2(intensity) ~ gc percentage:
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.023391   0.105189   9.729   <2e-16 ***
> gcp          0.121826   0.002503  48.666   <2e-16 ***
As you should know, GCRMA consists of three steps: background
correction, normalization and summarization. The last two steps are
the same as in RMA, and the third step (summarization) requires the
concept of a probeset, i.e. several probes targeting the same gene (or
transcript, or whatever you are trying to measure). It is not clear to
me that Agilent arrays have probesets, although that of course depends
on the design.
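To make the probeset requirement concrete, here is a minimal sketch of a
median-polish style summarization, written in plain Python for
illustration. It is not the code used by the affy or gcrma packages; the
function names (median_polish, summarize_probeset) are made up for this
example, and the input is assumed to be a probes-by-arrays table of
already background-corrected, normalized log2 intensities for ONE
probeset.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish on a probes x arrays table of log2 intensities.

    Returns (overall, row_effects, col_effects). Illustrative only.
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(r) for r in matrix]  # work on a copy
    overall = 0.0
    row_eff = [0.0] * rows  # probe effects
    col_eff = [0.0] * cols  # array effects
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_med = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            for j in range(cols):
                resid[i][j] -= row_med[i]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep out column (array) medians.
        col_med = [median(resid[i][j] for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once a full sweep no longer changes anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff


def summarize_probeset(log2_probes_by_array):
    """One expression value per array: overall level plus array effect."""
    overall, _, col_eff = median_polish(log2_probes_by_array)
    return [overall + c for c in col_eff]
```

With a single probe per gene there is nothing to polish, which is why
this step only makes sense on designs that have several probes per
target.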
The background correction is the step where GCRMA uses probe sequence
information. What the authors of GCRMA have done is estimate some
parameters related to 25-mer oligos from a reference experiment (to be
precise, I remember it as a large pool of many experiments). These
parameters are then _postulated_ to be relevant for all Affy chips
(with some justification).
As a minimum, if you want to use GCRMA on another platform you would
need to re-estimate these parameters, especially if the other platform
uses oligos of a different length, as Agilent does (although I guess you
could get 25-mer arrays from Agilent).
Then you would need some kind of spike-in experiment to show that it
really helps on this other platform.
With such reference data it would not be hard to use the GCRMA
algorithm for another chip, at least the background correction part.
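As a rough illustration of what "estimating parameters from reference
data" could look like, the sketch below fits a straight line of log2
intensity against GC percentage on a set of reference probes, and then
predicts a sequence-dependent background level for any other probe. This
is a deliberately simplified stand-in, NOT the GCRMA model (which is more
elaborate than a single GC term). The function names and the assumption
that the reference probes mostly measure non-specific signal are
hypothetical.

```python
def gc_percent(seq):
    """GC content of a probe sequence in percent (works for any length)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all reference probes have the same GC content")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_background_model(reference_probes):
    """reference_probes: list of (sequence, log2_intensity) pairs.

    Assumption: these probes come from a reference experiment in which
    the signal is dominated by non-specific binding.
    """
    xs = [gc_percent(seq) for seq, _ in reference_probes]
    ys = [log2_int for _, log2_int in reference_probes]
    return fit_line(xs, ys)


def predicted_background(seq, model):
    """Predicted log2 background for a probe, given (intercept, slope)."""
    intercept, slope = model
    return intercept + slope * gc_percent(seq)
```

How the predicted background is then used to correct the observed
intensities is a separate modelling decision, and whether it helps at
all is exactly what a spike-in experiment would have to show.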
> b) I know one should not request/ask open source developers for
> something, but:
> if GCRMA is applicable to other platforms, then it would be nice if it
> could be used in a simple way with these other platforms, and new
> platforms for oligo chips are getting more and more common.
>
> I read the information from the oligo package and from makePDpackage,
> which it seems will be superseded in the future by pdInfoBuilder.
>
> Would it be possible to make this simpler somehow? I don't know exactly
> what information is actually needed by the downstream analysis with
> GCRMA, but wouldn't it be sufficient if, for the creation of a new
> environment, I needed just 2 simple tab-delimited text files*? Then one
> could simply write a script that converts one's own format (which is not
> .ndf or .cdf) to this _simple_ tab-delimited format, whose specification
> is clearly outlined in the package vignette.
>
> Maybe I am underestimating the complexity (ignoring spatial information
> on the chip) or it's already there (yes, .cdf etc. files can be faked).
As far as I know, the pdInfo path taken by oligo has only been developed
for Affy and NimbleGen arrays.
The designers of that package have taken a comprehensive approach,
constructing data structures that hold a ton of information about the
chip.
In principle you are right: most of the info (I am not certain about
all of it, because I am not fully up to date) could be constructed as
you say, but since they are only really designing for Affy and
NimbleGen, I assume they are using the standard files from these
manufacturers.
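For what it is worth, reading the kind of simple probe table proposed in
the footnote below is not much work. The sketch here is in Python and
assumes a hypothetical three-column, tab-delimited layout (oligo name,
sequence, gene name) with no header line; it has nothing to do with the
actual pdInfo or CDF formats, and the function name is made up.

```python
import csv
from collections import defaultdict


def read_probe_table(lines):
    """Group probes into probesets from tab-delimited text lines.

    Assumed columns (hypothetical format): oligo name, sequence, gene name.
    Probes with an empty gene name form a probeset of their own, keyed by
    the oligo name. Returns {probeset_name: [(oligo_name, sequence), ...]}.
    """
    probesets = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError("expected 3 tab-separated fields, got %r" % (row,))
        name, seq, gene = (field.strip() for field in row)
        probesets[gene or name].append((name, seq.upper()))
    return dict(probesets)
```

Usage would be something like read_probe_table(open("probes.txt")),
where probes.txt is your own file in that layout. The harder part is
everything downstream that expects the extra information carried by the
manufacturers' files.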
Kasper
> thank you very much,
> ido
>
> *e.g.:
> file1: probes
> oligo name, sequence, gene name (for grouping multiple oligos)
> file2: annotation
> gene or oligo name (if not grouped), annotations....