[BioC] General question about library files

John O. Woods bamboowarrior at gmail.com
Fri Aug 15 22:12:08 CEST 2008


Hi everyone,

This is more of a general question. I'm fairly new to array analysis
(jumping right into the deep end here, looking at whole-genome tiling
arrays), and I'm having trouble sorting out in my head exactly what
data is stored in each Affy filetype.

It seems obvious that the CEL files contain the raw intensities from
the arrays themselves. Still, I'm not sure how these CELs are
organized--is it one CEL per chip? How do I know which metadata files
match with a specific CEL?

I also see that the BPMAP files contain design information for the
arrays. What I'm less clear on is why these have genome builds in the
names. For example, I got NCBIv36 bpmaps from Harvard, but Affy makes
an earlier build available (v34, I think). The probes themselves are,
of course, the same (right?). So does it matter to Bioconductor which
build I'm using?

I'm much less clear on CIFs and CDFs. How do these differ, and what
information do they contain? Affymetrix provides only very vague
descriptions on its website: "The CDF file describes the layout for an
Affymetrix GeneChip array." Gee, thanks. How does that differ from a
BPMAP? Why do the makePdInfoBuilder code samples use CIFs instead of
CDFs?

I've been looking for a good resource to help me get a handle on these
file formats. I see lots of tutorials for analyzing microarrays, but
little for tiling arrays (yay cutting edge). Does anyone have any
pointers?

Thanks so much for the help.

Cheers,
John Woods


