[BioC] Using ReadAffy with custom CDFs on tiling array data
Naira Naouar
nanao at psb.ugent.be
Mon Jul 28 16:52:20 CEST 2008
Hi Steve,
Steve Lianoglou wrote:
> Hi Naira,
>
> On Jul 28, 2008, at 9:57 AM, Naira Naouar wrote:
>
>> Personally, I have been working on Arabidopsis Thaliana 1.0R tiling
>> array and I have produced my own CDF for this array. The way I did it
>> is explained here:
>> http://wiki.fhcrc.org/bioc/DetailedScheduleTentative?action=AttachFile&do=get&target=Lightning-Naouar.pdf
>>
>
> Thanks for the slides! I've actually done something very similar for
> the Drosophila 1.0R tiling array in terms of realigning probes and
> annotating as exon/ambiguous-exon (ie, an exon in only 1 isoform of a
> transcript)/intron/intergenic/etc. The only question I have is what
> did you then use to get it back into the expected CDF format for
> seamless use in Bioconductor?
>
> Sorry if I'm missing something, but I haven't come across a way to
> create a custom cdf w/o affy's bpmap and cif files ... did you create
> your own bpmap from your annotations (or something)?
>
Actually, I have created a CDF package for the Arabidopsis 1.0R tiling
array. It is available here:
ftp://ftp.psb.ugent.be/pub/nanao/athtiling1.0rcdf.tar.gz
I built it as follows:
1. For each probe that I selected for a specific gene (AGI code), I
kept track of the PM and MM 'xy' position on the array (via the bpmap
file provided by Affymetrix for the tiling array).
2. I created two functions to convert between the 'xy' positions on the
array and the 'i' positions (indexes) used in the CDF. The basic functions
are the following:
xy2i = function(x, y) { y * DIM + x + 1 }
i2xy = function(i) {
  r = cbind((i - 1) %% DIM, (i - 1) %/% DIM)
  colnames(r) = c('x', 'y')
  return(r)
}
where DIM is the dimension of the tiling array, i.e. the number of
features per row.
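To make the numbering concrete, here is a small sketch of the two functions on a toy 4 x 4 grid. The value DIM = 4 is only an assumption for illustration (it is not the size of any real array); the functions themselves are the ones given above.

```r
# Toy example: DIM = 4 is a made-up grid size, chosen so the numbers are easy to follow.
DIM <- 4

# (x, y) are 0-based array coordinates; i is the 1-based index used in the CDF.
xy2i <- function(x, y) y * DIM + x + 1

i2xy <- function(i) {
  r <- cbind((i - 1) %% DIM, (i - 1) %/% DIM)
  colnames(r) <- c("x", "y")
  r
}

first_in_row0 <- xy2i(0, 0)   # first cell of the first row
last_in_row0  <- xy2i(3, 0)   # last cell of the first row
first_in_row1 <- xy2i(0, 1)   # first cell of the second row

# Round trip: converting every index to (x, y) and back should give the same index.
all_i <- 1:(DIM * DIM)
xy <- i2xy(all_i)
round_trip_ok <- all(xy2i(xy[, "x"], xy[, "y"]) == all_i)
```

The indexes run row by row, so moving one row down (y + 1) always adds DIM to 'i'.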
3. Then, I created an environment containing the PM and MM positions
for each gene (as in a normal CDF). You should have an environment
(e.g. athtiling1.0rcdf) where the labels are the names of the genes and
the content of each is a matrix with the 'i' positions of its PM and MM
probes (the 'i' positions are simply calculated with the previous xy2i
function).
Something like:
## Create the environment
myverynicecdf = new.env()
## For each gene, create a matrix of 2 columns (pm, mm) and x rows
## (corresponding to the 'i' positions on the array) and store it
## under the gene name
matrix_PM_MM = blabla   ## placeholder: build the matrix for this gene here
assign(GENE_NAME, matrix_PM_MM, envir = myverynicecdf)
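A slightly fuller sketch of that loop, in case it helps. Everything about the input here is an assumption: the 'probes' data frame, its column names, the gene labels and the coordinates are invented for illustration, and DIM = 2560 is only a guess at the array dimension. Only new.env(), assign() and the xy2i function come from the description above.

```r
# Assumed array dimension; replace with the real value for your array.
DIM <- 2560
xy2i <- function(x, y) y * DIM + x + 1

# Hypothetical probe table: one row per PM/MM pair, already assigned to a gene.
probes <- data.frame(
  gene = c("GENE_A", "GENE_A", "GENE_B"),
  pm_x = c(10, 20, 30), pm_y = c(5, 6, 7),
  mm_x = c(10, 20, 30), mm_y = c(6, 7, 8),
  stringsAsFactors = FALSE
)

myverynicecdf <- new.env()

for (gene in unique(probes$gene)) {
  rows <- probes[probes$gene == gene, ]
  # Two columns (pm, mm), one row per probe pair, holding the 'i' positions.
  matrix_PM_MM <- cbind(
    pm = xy2i(rows$pm_x, rows$pm_y),
    mm = xy2i(rows$mm_x, rows$mm_y)
  )
  assign(gene, matrix_PM_MM, envir = myverynicecdf)
}

genes_in_cdf <- sort(ls(myverynicecdf))
gene_a <- get("GENE_A", myverynicecdf)
```

Using cbind() keeps the result a matrix even for a gene with a single probe pair.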
In the end, you should end up with the following:
## Example
> get("AT1G01020", athtiling1.0rcdf)
           pm      mm
 [1,] 3984824 3987384
 [2,] 1022692 1025252
 ...
[10,]  511312  513872
[11,] 5051196 5053756
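One way to sanity-check such a matrix is to convert the indexes back to 'xy' positions. The sketch below uses the four rows printed above and assumes DIM = 2560; that value is an inference from the fact that each mm index is exactly 2560 larger than its pm index, not something stated by Affymetrix here.

```r
# Assumption: DIM = 2560, inferred from the constant pm/mm offset in the rows above.
DIM <- 2560

i2xy <- function(i) {
  r <- cbind((i - 1) %% DIM, (i - 1) %/% DIM)
  colnames(r) <- c("x", "y")
  r
}

# The four rows shown in the example output for AT1G01020.
m <- cbind(
  pm = c(3984824, 1022692, 511312, 5051196),
  mm = c(3987384, 1025252, 513872, 5053756)
)

pm_xy <- i2xy(m[, "pm"])
mm_xy <- i2xy(m[, "mm"])

# Under the DIM assumption, each MM should share its PM's column (x)
# and sit exactly one row further down (y + 1).
same_column   <- all(pm_xy[, "x"] == mm_xy[, "x"])
one_row_below <- all(mm_xy[, "y"] - pm_xy[, "y"] == 1)
```

If either check fails on your own data, the DIM value or the xy-to-i conversion is probably off.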
4. I saved this environment as a package.
The code should resemble the following:
package.skeleton(name = "myverynicecdf", list = c("myverynicecdf",
"xy2i", "i2xy"), path = "my/path/")
There are two or three changes to make in the generated package (folder
and files); these are basically the usual steps for any R package. (If
you have any questions about that, I can also help.)
Then you build your package, and you have your Drosophila CDF
library that can be used in R :)
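For reference, a minimal sketch of step 4 that writes the skeleton into a temporary folder. This is not the exact procedure used for athtiling1.0rcdf: the environment content is a two-row stand-in, DIM = 2560 is an assumption, and the sketch also exports DIM (which the call above does not list) because xy2i and i2xy refer to it.

```r
# Stand-in objects; in practice these come from steps 2 and 3.
DIM <- 2560  # assumed array dimension
xy2i <- function(x, y) y * DIM + x + 1
i2xy <- function(i) {
  r <- cbind((i - 1) %% DIM, (i - 1) %/% DIM)
  colnames(r) <- c("x", "y")
  r
}
myverynicecdf <- new.env()
assign("AT1G01020",
       cbind(pm = c(3984824, 1022692), mm = c(3987384, 1025252)),
       envir = myverynicecdf)

# Write the skeleton into a fresh temporary directory instead of "my/path/".
out_dir <- tempfile("cdfpkg_")
dir.create(out_dir)

package.skeleton(
  name = "myverynicecdf",
  list = c("myverynicecdf", "xy2i", "i2xy", "DIM"),
  environment = environment(),  # look the objects up where they were defined
  path = out_dir
)

pkg_dir <- file.path(out_dir, "myverynicecdf")
created <- list.files(pkg_dir, recursive = TRUE)
```

The functions end up under R/ and the other objects under data/. After the manual edits (DESCRIPTION, help files), the usual next step would be R CMD build followed by R CMD INSTALL on the generated folder.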
I hope I am clear but you can always ask me questions about this.
Naira
--
==================================================================
Naira Naouar
Tel:+32 (0)9 331 38 63
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
nanao at psb.ugent.be http://www.psb.ugent.be