[BioC] converting Affy indices to x,y coordinates
Mounts, William
Bill.Mounts at pfizer.com
Wed Feb 16 03:21:56 CET 2011
From the Affymetrix documentation, the following are available for each cell (probe) in the CDF file.
Cell information, repeated for each cell in the block:
Atom number - integer
X coordinate - unsigned short
Y coordinate - unsigned short
Index position (relative to sequence for CustomSeq, Genotyping, Copy Number, Polymorphic Marker, and Multichannel Marker units; for Expression units this value is the atom number) - integer
Base of probe at substitution position - char
Base of target at interrogation position - char
Length of probe sequence - unsigned short (only available in version 2 and 3)
Physical grouping of probe - unsigned short (only available in version 2 and 3)
Index position is provided, and examination of various CDF files shows that index = K*y + x. Below, in point 5, you mention that "In R it is more convenient to use one-based indices instead of zero-based indices. This is taken care of by affxparser." Is this where the "+ 1" in the implementation comes from, that is, to move the index values from zero-based to one-based?
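As a rough illustration of the two conventions, here is a small Python sketch (not affxparser code; the function names are made up for this example). It applies both formulas to the probe from Todd's original message quoted further down: index 1354890 on a chip with K = 1164 columns.

```python
K = 1164  # number of columns on the chip, taken from the original question


def xy_from_zero_based(index, ncol):
    """Invert index = ncol * y + x, i.e. a zero-based linear index."""
    y, x = divmod(index, ncol)
    return x, y


def xy_from_one_based(index, ncol):
    """Invert index = ncol * y + x + 1, i.e. a one-based linear index."""
    y, x = divmod(index - 1, ncol)
    return x, y


index = 1354890
print(xy_from_zero_based(index, K))  # (1158, 1163)
print(xy_from_one_based(index, K))   # (1157, 1163)
```

Reading the index as zero-based reproduces the (x,y) listed in the CDF file, while reading the same number as one-based gives the x = 1157 that Todd calculated. Under these assumptions the whole discrepancy is the offset of 1 between the two index conventions.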
On Mon, Feb 14, 2011 at 10:24 AM, Mounts, William <Bill.Mounts at pfizer.com> wrote:
> Todd,
>
> It would appear that there is an error in affxparser. Testing a
> number of cdf files, it appears that index = K * y + x.
I doubt that. Could you please provide complete examples illustrating the problem? Unless proven wrong, I stand firm on the claim that both the implementation and the documentation are correct. As Kasper pointed out, it may be that the documentation is confusing or ambiguous, but that is not to say it's wrong. I am happy to take suggestions on how to improve the documentation.
CLARIFICATIONS:
1. The spatial (x,y) cell coordinates are zero-based [1]. This is at least the case if you access them via the Affymetrix Fusion SDK, that is, via affxparser. I cannot claim that all CDF files in history have had zero-based (x,y) coordinates, but it does not matter because through the Fusion SDK they are returned as such. (Anecdotal evidence: Browsing through several of my (ASCII and binary) CDFs, their (x,y) coordinates are indeed zero-based.)
2. A CDF file references the cells (probes) by their (x,y) coordinates only [2].
3. It is more convenient to access cells by their linear indices, which is why they are provided.
4. BTW, note also the last comment on that help page [1]: If you use the affxparser methods, you don't have to worry about (x,y) indices; everything is by default done using cell (probe) indices.
5. In R it is more convenient to use one-based indices instead of zero-based indices. This is taken care of by affxparser.
6. The affxparser documentation [1] clearly says that spatial (x,y) cell coordinates are zero-based indices and the linear cell indices are one-based.
7. Do not confuse (Bioconductor) CDF annotation packages/environments with (Affymetrix) CDF *files*; affxparser deals with the latter only.
I think Clarification (4) is one of the most important ones. If you stick with affxparser, you are given well-defined, self-contained, and consistent access to the content of CEL and CDF files (and some other Affymetrix file types too).
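The conversion formulas from the help page [1], as quoted in Todd's original message further down, can be sketched in a few lines. This is plain Python for illustration only, not affxparser code; the names cell_index and cell_xy and the toy 4-by-3 chip are made up here.

```python
def cell_index(x, y, ncol):
    """One-based linear cell index from zero-based (x, y) coordinates."""
    return ncol * y + x + 1


def cell_xy(index, ncol):
    """Zero-based (x, y) coordinates from a one-based linear cell index."""
    y = (index - 1) // ncol
    x = (index - 1) - ncol * y
    return x, y


# Round trip over every cell of a toy chip with 4 columns and 3 rows.
ncol, nrow = 4, 3
for y in range(nrow):
    for x in range(ncol):
        assert cell_xy(cell_index(x, y, ncol), ncol) == (x, y)

print(cell_index(0, 0, ncol))                # 1, the first cell
print(cell_index(ncol - 1, nrow - 1, ncol))  # 12, the last cell
```

With these definitions the first cell (0,0) gets index 1 and the last cell gets index ncol * nrow, which is the one-based convention described in Clarifications (5) and (6).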
REFERENCES:
[1] help("2. Cell coordinates and cell indices", package="affxparser")
[2] Section 'Affymetrix CDF Data File Format' part of 'File Formats Documentation', Affymetrix, October 2009
(http://www.affymetrix.com/partners_programs/programs/developer/fusion/index.affx?terms=no)
/Henrik
(wrote most of [1])
>
> Bill
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Todd Allen
> Sent: Monday, February 14, 2011 11:19 AM
> To: bioconductor at r-project.org
> Subject: [BioC] converting Affy indices to x,y coordinates
>
> Hello all,
>
> I have been reading the documentation portion of a package called
> "affxparser." In the documentation there is a description of the
> formulas needed to seamlessly convert between Affymetrix probe indices
> and the corresponding (x,y) coordinates of individual probes.
>
> Copying from the package documentation, the following information is
> most relevant:
>
> 1. index = K * y + x + 1; where K is the number of columns on the chip
> 2. y = floor((index - 1) / K)
> 3. x = (index - 1) - K * y
>
> In my own work, I am processing a HGU133Plus 2 CDF file. The array
> dimensions are (1164, 1164) and if I take the index of a specific
> probe listed as 1354890, I calculate the coordinates as x = 1157 and y
> = 1163 using the formulas above.
>
> The (x,y) coordinate reported from Affy's own CDF file for this probe
> is actually x = 1158 (not 1157) and y = 1163.
>
> I am struggling to understand this discrepancy between the affxparser
> documentation and the verbatim output from Affy's own CDF file. Has
> anyone run into this situation before? Do you see any obvious problem or
> explanation as to what is happening?
>
> Thank you!
> Todd A
> genesplicer28 at yahoo.com
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>