[BioC] GC-content sensitive normalization of Affymetrix tiling arrays for ChIP-chip

Fri Jul 18 11:32:42 CEST 2008

Dear Wolfgang,

Thank you for your reply!
My goal is to compare my own ChIP-chip data (Nimblegen tiling) with some
other ChIP-chip data (created on Affymetrix tiling). I normalized my data
with vsn and got some nice signal-to-noise ratios (visual inspection,
replicates show same trend). When I normalize with other algorithms (loess,
quantile, Tukey-biweight) I get a similar output (based on visual inspection
and correlation among them). 

Now, I normalized the Affymetrix data with vsn and got some terrible
signal-to-noise ratios. One possible explanation might be the shorter probe
sequence of the Affy probes compared to the Nimblegen probes. Fluorescence
signals of shorter probes are more sensitive to the underlying sequence (in
particular GC-content). Because vsn does not account for GC-content, I
wanted to adjust for it separately (hence the idea of using GCRMA).
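
One quick way to check whether GC-content really drives the raw signal is to
group probes by their G+C count and look at the mean log2 intensity per
group. The following is only a minimal sketch in Python (standard library
only); the function names and the toy probes are made up for illustration
and are not taken from any package mentioned in this thread.

```python
from collections import defaultdict
from math import log2
from statistics import mean


def gc_count(seq):
    """Number of G or C bases in a probe sequence (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def mean_log_intensity_by_gc(probes):
    """probes: iterable of (sequence, raw_intensity) pairs, intensity > 0.

    Returns a dict mapping each G+C count to the mean log2 intensity
    of the probes with that count.
    """
    groups = defaultdict(list)
    for seq, intensity in probes:
        groups[gc_count(seq)].append(log2(intensity))
    return {gc: mean(vals) for gc, vals in sorted(groups.items())}


# Toy data (hypothetical 10-mers; real Affymetrix tiling probes are longer).
toy_probes = [
    ("ATATATATAT", 64.0),
    ("ATATATATGC", 128.0),
    ("ATGCATGCAT", 256.0),
    ("GCGCATATAT", 256.0),
    ("GCGCGCGCGC", 1024.0),
]
gc_trend = mean_log_intensity_by_gc(toy_probes)
print(gc_trend)
```

If the per-group means rise steadily with the G+C count on the input
(control) arrays, that would support the suspicion that a sequence-dependent
effect is present.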

I will try to use the normalizeByReference function and report back when it
works.

Thanks again!

Best wishes, 
Christian

-----Original Message-----
From: Wolfgang Huber [mailto:huber at ebi.ac.uk] 
Sent: Friday, July 11, 2008 12:55 AM
To: Christian Feller
Cc: 'Sean Davis'; bourgon at ebi.ac.uk; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] GC-content sensitive normalization of Affymetrix tiling arrays for ChIP-chip

Dear Christian,

A few points:

- as far as I understand, the background correction method of GC-RMA does
not make use of probe sets; it works on individual probes. Probe sets only
come into play later, for the expression estimate. But getting it to work
for your use case may be a hard problem (has anyone on the list managed?)

- vsn2 does not do probe-sequence specific adjustments, so I am not 
sure why it was mentioned in this context.

- the choice of language should be secondary to these criteria: quality 
of the underlying science and of the implementation.

- you say "how can I take into accound (sic) the GC-effect of single 
probes", but would it make sense to take a step back and tell us why you 
want to do that and what you want to achieve? Perhaps your answer is 
somewhere else.

- the normalizeByReference function in the tilingArray package offers a 
method to do probe(sequence)-specific background correction for 
Affymetrix tiling array data, and is described in a paper [1], but I 
have only used it on RNA expression data, not on ChIP, so porting it to 
that application would need some care.

[1] http://bioinformatics.oxfordjournals.org/cgi/reprint/22/16/1963.pdf
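
To illustrate the general idea of a reference-based, probe-specific
correction, here is a rough sketch in Python. It is NOT the tilingArray
implementation of normalizeByReference; it only assumes the broad scheme of
(a) estimating a background level from probes believed to carry no specific
signal, stratified by their intensity on a reference (e.g. genomic DNA)
hybridization, and (b) dividing the background-corrected signal by the
reference signal of the same probe. The function name, the arguments and the
toy numbers are all hypothetical.

```python
from statistics import median


def normalize_by_reference_sketch(signal, reference, is_background, n_bins=2):
    """Illustrative helper, not the tilingArray code.

    signal, reference: per-probe intensities in the same probe order.
    is_background: per-probe booleans marking probes assumed to carry
    no specific signal.
    """
    n = len(signal)
    # Rank probes by reference intensity and cut the ranking into
    # n_bins bins of (roughly) equal size.
    order = sorted(range(n), key=lambda i: reference[i])
    bin_of = [0] * n
    for rank, i in enumerate(order):
        bin_of[i] = min(rank * n_bins // n, n_bins - 1)
    # Background per bin = median signal of the background probes in it.
    bg = {}
    for b in range(n_bins):
        vals = [signal[i] for i in range(n) if bin_of[i] == b and is_background[i]]
        # Assumption: a bin without background probes gets no correction.
        bg[b] = median(vals) if vals else 0.0
    # Subtract the bin background, then scale by the probe's reference signal.
    return [(signal[i] - bg[bin_of[i]]) / reference[i] for i in range(n)]


# Toy data: six probes, two reference-intensity strata.
signal = [110, 130, 500, 240, 260, 1200]
reference = [100, 100, 100, 200, 200, 200]
is_background = [True, True, False, True, True, False]
normalized = normalize_by_reference_sketch(signal, reference, is_background)
print(normalized)
```

For ChIP data the choice of "background" probes is the delicate part, since
there is no annotation-based set of unexpressed regions to fall back on.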

  Best wishes
	Wolfgang

Christian Feller wrote:
> Hi Sean,
> 
> Thank you for your quick response! We successfully used MAT under Python
> for a dataset with 3 control arrays (hybridized with input) and 3 IP arrays
> (all biological replicates). In comparison with vsn2, probe standardization
> via MAT significantly increased the signal-to-noise ratio. However, we
> still have some doubts about the reliability of those results since the raw
> data seem to be very noisy, and the correlation of the biological
> replicates is not very strong.
>  
> Thanks again!
> 
> Best
> Christian
> 
> -----Original Message-----
> From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of Sean Davis
> Sent: Wednesday, July 09, 2008 2:04 AM
> To: Christian Feller
> Cc: bioconductor at stat.math.ethz.ch; bourgon at ebi.ac.uk
> Subject: Re: [BioC] GC-content sensitive normalization of Affymetrix tiling arrays for ChIP-chip
> 
> On Tue, Jul 8, 2008 at 6:58 PM, Christian Feller
> <feller.christian at gmail.com> wrote:
>> Dear Richard Bourgon and list,
>>
>> I am a newbie in analyzing ChIP-chip Affymetrix tiling arrays (GeneChip
>> Drosophila Tiling 1.0R Array).
>> My question is how can I take into accound the GC-effect of single probes
>> if I do not have expression sets (due to the nature of a tiling array)? We
>> had the idea of taking a fixed window size, defining the probes within it
>> as a "probeset", and using GCRMA for background correction/normalization.
>> In addition, can we use this configuration (normalization via GCRMA) for
>> profiles with broad ChIP-enriched regions (as is the case for many
>> histone modifications)?
>>
>> If there is any additional advice, especially for the pre-processing
>> steps, I would be very happy!
>> Until now, we have done the normalization using vsn2.
> 
> Hi, Christian.  Do you have the input DNA from which you are going to
> form a ratio, or are you attempting to do a single-channel analysis?
> If the latter, then you might look at MAT from Shirley Liu's group.  I
> don't think it is available for R, but the algorithm could probably be
> coded in R relatively easily.  There are likely other solutions.
> 
> Sean
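
As a very reduced illustration of the kind of probe standardization
discussed above, the sketch below groups probes by G+C count and converts
each log2 intensity into a z-like score within its group. This is a drastic
simplification and should not be read as the MAT algorithm: as published,
MAT fits a richer sequence-based model and standardizes within bins of
predicted intensity. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from math import log2
from statistics import mean, pstdev


def standardize_by_gc(probes):
    """probes: list of (sequence, raw_intensity) pairs, intensity > 0.

    Returns one score per probe: (log2 intensity - group mean) / group sd,
    where groups are defined by the probe's G+C count. Groups with zero
    spread (e.g. a single probe) get a score of 0.0 by assumption.
    """
    logs = [log2(intensity) for _, intensity in probes]
    gcs = [sum(base in "GC" for base in seq.upper()) for seq, _ in probes]
    groups = defaultdict(list)
    for gc, value in zip(gcs, logs):
        groups[gc].append(value)
    stats = {gc: (mean(v), pstdev(v)) for gc, v in groups.items()}
    scores = []
    for gc, value in zip(gcs, logs):
        m, s = stats[gc]
        scores.append((value - m) / s if s > 0 else 0.0)
    return scores


# Toy data (hypothetical 10-mers and intensities).
toy = [
    ("ATATATATGC", 64.0),
    ("ATATATATCG", 256.0),
    ("ATGCATGCAT", 256.0),
    ("GCGCATATAT", 1024.0),
    ("ATATATATAT", 100.0),
]
scores = standardize_by_gc(toy)
print(scores)
```

In practice the group statistics would be estimated from many thousands of
probes, and the standardized IP and input values would then be compared
within genomic windows.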