[BioC] Using ReadAffy with custom CDFs on tiling array data

Mon Jul 28 16:52:20 CEST 2008

Hi Steve,

Steve Lianoglou wrote:
> Hi Naira,
>
> On Jul 28, 2008, at 9:57 AM, Naira Naouar wrote:
>
>> Personally, I have been working on Arabidopsis Thaliana 1.0R tiling 
>> array and I have produced my own CDF for this array. The way I did it 
>> is explained here: 
>> http://wiki.fhcrc.org/bioc/DetailedScheduleTentative?action=AttachFile&do=get&target=Lightning-Naouar.pdf 
>>
>
> Thanks for the slides! I've actually done something very similar for 
> the Drosophila 1.0R tiling array in terms of realigning probes and 
> annotating as exon/ambiguous-exon (ie, an exon in only 1 isoform of a 
> transcript)/intron/intergenic/etc. The only question I have is what 
> did you then use to get it back into the expected CDF format for 
> seamless use in Bioconductor?
>
> Sorry if I'm missing something, but I haven't come across a way to 
> create a custom cdf w/o affy's bpmap and cif files ... did you create 
> your own bpmap from your annotations (or something)?
>
Actually, I have created a CDF package for the Arabidopsis 1.0R tiling 
array, which is available here:
ftp://ftp.psb.ugent.be/pub/nanao/athtiling1.0rcdf.tar.gz

I built it as follows:

    1. For each probe that I selected for a specific gene (AGI code), I 
kept track of the PM and MM 'xy' position on the array (via the bpmap 
file provided by Affymetrix for the tiling array).

    2. I created 2 functions to convert the 'xy' positions on the array 
to the 'i' positions that will be on your CDF. The basic functions are 
the following:

## x and y are 0-based array coordinates; i is the 1-based probe index
xy2i = function(x,y) {y*DIM+x+1}
i2xy = function(i) {
  r = cbind((i-1)%%DIM, (i-1)%/%DIM)
  colnames(r) = c('x','y'); return(r)
}

where DIM is the dimension of the tiling array, i.e. the number of 
features per row.
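
As a quick sanity check, the two functions can be run as a round trip. This is only a small sketch: DIM = 2560 is assumed here for illustration (it matches the pm/mm offset of 2560 seen in the AT1G01020 example in this message), and the MM probe is assumed to sit one row below its PM probe, at (x, y + 1). Use the real dimension of your own array.

```r
## Assumption: 2560 features per row on the array.
DIM <- 2560

xy2i <- function(x, y) y * DIM + x + 1
i2xy <- function(i) {
  r <- cbind((i - 1) %% DIM, (i - 1) %/% DIM)
  colnames(r) <- c("x", "y")
  r
}

## A PM probe and its assumed MM partner one row below (y + 1)
pm_i <- xy2i(1463, 1556)
mm_i <- xy2i(1463, 1557)
print(c(pm = pm_i, mm = mm_i))   # pm = 3984824, mm = 3987384

## Converting back recovers the original 0-based coordinates
print(i2xy(c(pm_i, mm_i)))
```

Because xy2i adds 1 and i2xy subtracts it again, the 'xy' coordinates stay 0-based while the 'i' indexes are 1-based, which is what R expects when indexing the intensity matrix.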

    3. Then, I created an environment containing the PM and MM positions 
for each gene (like in a normal CDF). You should have an environment 
(e.g. athtiling1.0rcdf) where the keys are the gene names and the value 
for each gene is a matrix with the 'i' positions of its PM and MM probes 
(the 'i' positions are simply calculated with the xy2i function above).

Something like:

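## Create environment
myverynicecdf = new.env()

## For each gene, create a matrix with 2 columns (pm, mm) and one row per
## probe; the values are the 'i' positions on the array. pm_x, pm_y, mm_x
## and mm_y are placeholder names for the gene's probe coordinates.
matrix_PM_MM = cbind(pm = xy2i(pm_x, pm_y), mm = xy2i(mm_x, mm_y))

assign(GENE_NAME, matrix_PM_MM, envir = myverynicecdf)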

In the end, you should get something like the following:
## Example

 > get("AT1G01020",athtiling1.0rcdf)
           pm      mm
 [1,] 3984824 3987384
 [2,] 1022692 1025252
 ...
[10,]  511312  513872
[11,] 5051196 5053756
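
Step 3 can be sketched end to end as below. This is only an illustration under stated assumptions: the `probes` table, its column names and the gene names GENE_A/GENE_B are hypothetical, and DIM = 2560 is assumed. In practice the coordinates would come from the bpmap file.

```r
DIM <- 2560  # assumed number of features per row
xy2i <- function(x, y) y * DIM + x + 1

## Hypothetical probe table: one row per selected probe (0-based x/y)
probes <- data.frame(
  gene = c("GENE_A", "GENE_A", "GENE_B"),
  pm_x = c(10, 20, 30), pm_y = c(0, 2, 4),
  mm_x = c(10, 20, 30), mm_y = c(1, 3, 5),
  stringsAsFactors = FALSE
)

## One entry per gene: a matrix with columns 'pm' and 'mm'
myverynicecdf <- new.env()
for (g in unique(probes$gene)) {
  p <- probes[probes$gene == g, ]
  m <- cbind(pm = xy2i(p$pm_x, p$pm_y), mm = xy2i(p$mm_x, p$mm_y))
  assign(g, m, envir = myverynicecdf)
}

print(ls(myverynicecdf))
print(get("GENE_A", myverynicecdf))
```

Using cbind() with named arguments gives the 'pm' and 'mm' column names directly, and it still returns a matrix for genes that have only one probe.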

    4. I saved this environment as a package.

The code should resemble the following:

package.skeleton(name = "myverynicecdf", list = c("myverynicecdf", 
"xy2i", "i2xy"), path = "my/path/")

There are two or three changes to make in the generated package (folder 
and files); these are basically the usual steps for any R package. (If 
you have any questions about that, I can also help.)
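
The skeleton step can be tried out in a temporary directory before touching a real path. The sketch below uses stand-in objects (a one-gene environment, DIM = 2560, the hypothetical gene name GENE_A); only package.skeleton() itself and its `name`, `list`, `environment`, `path` and `force` arguments come from R's utils package.

```r
## Stand-in objects; in practice these are the real environment and helpers
DIM <- 2560
xy2i <- function(x, y) y * DIM + x + 1
i2xy <- function(i) cbind(x = (i - 1) %% DIM, y = (i - 1) %/% DIM)
myverynicecdf <- new.env()
assign("GENE_A", cbind(pm = 11, mm = 2571), envir = myverynicecdf)

## Write the skeleton into a temporary directory
out_dir <- tempdir()
package.skeleton(name = "myverynicecdf",
                 list = c("myverynicecdf", "xy2i", "i2xy"),
                 environment = environment(),
                 path = out_dir, force = TRUE)

## Inspect what was generated
pkg_dir <- file.path(out_dir, "myverynicecdf")
print(list.files(pkg_dir, recursive = TRUE))
```

The generated folder contains a Read-and-delete-me file that lists the remaining manual steps. Note that xy2i and i2xy refer to DIM, so DIM would also have to be defined inside the package (or hard-coded in the functions) for them to work after installation.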

Then you build your package, and you have your Drosophila CDF library, 
which can be used in R :)

I hope this is clear, but you can always ask me questions about it.

Naira

-- 
==================================================================
Naira Naouar 

Tel:+32 (0)9 331 38 63
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
nanao at psb.ugent.be                         http://www.psb.ugent.be