[BioC] Using aCGH library on Affymetrix Cytogenetics 2.7M microarray data
Ryan Goosen [guest]
guest at bioconductor.org
Mon Nov 19 11:33:09 CET 2012
Dear Bioconductor mailing list,
I am trying to use the R/Bioconductor "aCGH" package to process my copy number data.
In particular, I have copy number data (log2 ratios) from Affymetrix Cytogenetics 2.7M arrays (http://media.affymetrix.com/support/technical/datasheets/cytogenetics_research_solution.pdf), which have ~2 million copy number probes and ~400,000 SNP probes for detecting LOH.
These data are generated with apt-copynumber-cyto (part of Affymetrix Power Tools), which produces .CYCHP.txt files. I have written an R script to extract the log2 ratios for the ~2 million copy number probes from these files; I have determined that they start at line 549 and continue for 2,141,465 rows.
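For illustration, a minimal sketch of reading such a fixed block of rows with base R. It assumes the rows are tab-separated, that the given line is the first data row (not a column header), and that the columns are probe set name, chromosome, position and log2 ratio; the helper name read_cn_block() and this column layout are hypothetical, so check them against the actual .CYCHP.txt header. Declaring colClasses up front keeps Chromosome as text rather than a factor and can reduce memory use when reading ~2 million rows.

```r
# Hypothetical helper: read a fixed block of rows from a tab-delimited text file.
# Assumes the block has no header line of its own; real .CYCHP.txt column
# layouts may differ, so adjust col_names and col_classes to match.
read_cn_block <- function(path, first_line, n_rows,
                          col_names = c("ProbeSetName", "Chromosome",
                                        "Position", "Log2Ratio"),
                          col_classes = c("character", "character",
                                          "integer", "numeric")) {
  read.delim(path,
             header = FALSE,
             skip = first_line - 1,     # skip every line before the block
             nrows = n_rows,            # stop after the block
             col.names = col_names,
             colClasses = col_classes)  # Chromosome stays text, not a factor
}

# Tiny self-contained demo on made-up rows: two header lines, three data rows.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("#%header line",
             "ProbeSetName\tChromosome\tPosition\tLog2Ratio",
             "C-00IGZ\t1\t712577\t-0.268",
             "C-00IHI\t1\t713263\t-0.324",
             "C-DEMO1\tX\t714145\t0.486"),
           demo_file)
cn <- read_cn_block(demo_file, first_line = 3, n_rows = 3)
str(cn)
```

On the real files this would be called with first_line = 549 and n_rows = 2141465, once per sample, with the log2 ratio columns then combined into one data frame.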
Original object after parsing:
> str(cnData)
'data.frame': 2141465 obs. of 23 variables:
$ ProbeSetName: Factor w/ 2141465 levels "C-00IGZ","C-00IHI",..: 580623 580624 580625 580626 580627 580628 580629 1967674 580630 580631 ...
$ Chromosome : Factor w/ 24 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Position : int 712577 713263 714145 714635 718604 750062 752757 754192 755354 760401 ...
$ 10T : num -0.268 -0.324 0.486 -0.672 -0.191 ...
$ 13T : num -1.032 -0.522 0.414 -0.552 -0.901 ...
$ 14T : num -0.917 -0.698 0.723 -1.475 -0.771 ...
$ 15T : num -0.541 -0.161 0.248 -0.529 -0.859 ...
$ 16T : num -0.469 -0.43 0.129 -0.317 -1.051 ...
$ 23T : num -0.0257 0.0107 0.2888 0.3228 0.1635 ...
$ 33T : num 0.071 0.959 0.422 -0.019 0.35 ...
$ 34T : num -0.846 -0.471 0.48 -1.141 -0.466 ...
$ 37T : num -1.014 -0.279 0.327 -0.796 -0.485 ...
$ 3T : num -0.46 -0.221 0.117 0.423 -0.266 ...
$ 41T : num -2.021 -0.7997 0.4713 -0.0937 -1.1054 ...
$ 44T : num -0.7501 -0.2017 0.0135 -1.1092 -0.356 ...
$ 4T : num 0.00255 -0.05183 0.09327 -0.2049 -0.07572 ...
$ 55T : num -0.2161 -0.5777 0.1861 -0.0936 -0.1689 ...
$ 56T : num 0.0622 0.1612 0.2907 0.3115 0.2649 ...
$ 60T : num 0.0222 0.0937 -0.1307 0.3206 -0.0847 ...
$ 61T : num 0.1707 -0.0255 0.2095 -0.0505 -0.1473 ...
$ 63T : num 0.00136 -0.01699 -0.15279 -0.2546 0.06513 ...
$ 8T : num -0.146 -0.101 0.389 -0.465 -0.357 ...
$ IT : num -0.2524 0.2645 0.7298 -0.563 -0.0915 ...
Regarding the aCGH package, I subset my data to create an aCGH object with the create.aCGH() function, which appears to have worked.
The R statements I used were as follows:
aCGH.object = create.aCGH(log2.ratios = cnData[4:23], clones.info = cnData[1:3])
colnames(aCGH.object$clones.info)[1] = "Clone"
colnames(aCGH.object$clones.info)[2] = "Chrom"
colnames(aCGH.object$clones.info)[3] = "kb"
aCGH.object$clones.info$Chrom = as.integer(aCGH.object$clones.info$Chrom)
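A note on the last statement above: in R, as.integer() applied to a factor returns the factor's internal level codes, not its labels. Because the Chromosome levels are sorted as text ("1", "10", "11", ...), chromosome "10" becomes 2, and so on. The sketch below converts the labels instead. It assumes the labels are "1" to "22" plus "X" and "Y", coded as 23 and 24 (which I believe is the human coding the aCGH documentation uses), and that the kb column should hold positions in kilobases, as its name suggests, whereas Position here is in base pairs. The function name make_clones_info() is hypothetical.

```r
# Hypothetical helper: build an aCGH-style clones.info data frame from the
# probe names, chromosome labels and base-pair positions of the parsed data.
# Assumptions: labels are "1".."22", "X", "Y"; X -> 23, Y -> 24;
# Position is in base pairs and kb should be in kilobases.
make_clones_info <- function(probe, chrom, position_bp) {
  chrom_chr <- as.character(chrom)      # use the labels, not the factor codes
  chrom_chr[chrom_chr == "X"] <- "23"
  chrom_chr[chrom_chr == "Y"] <- "24"
  data.frame(Clone = as.character(probe),
             Chrom = as.integer(chrom_chr),  # other labels become NA, with a warning
             kb = position_bp / 1000,        # base pairs -> kilobases
             stringsAsFactors = FALSE)
}

# Demo on a factor whose levels sort as text, as in the str() output above.
chrom <- factor(c("1", "10", "2", "X", "Y"))
levels(chrom)       # "1" "10" "2" "X" "Y"
as.integer(chrom)   # 1 2 3 4 5: level codes, not chromosome numbers
info <- make_clones_info(paste0("probe", 1:5), chrom,
                         c(712577, 713263, 714145, 714635, 718604))
info$Chrom          # 1 10 2 23 24
```

The result could then be passed as clones.info to create.aCGH() in place of the first three columns of cnData.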
The resulting object is shown below (each column of the $log2.ratios data frame is a separate sample):
> str(aCGH.object)
List of 4
$ log2.ratios :'data.frame': 2141465 obs. of 20 variables:
..$ 10T: num [1:2141465] -0.268 -0.324 0.486 -0.672 -0.191 ...
..$ 13T: num [1:2141465] -1.032 -0.522 0.414 -0.552 -0.901 ...
..$ 14T: num [1:2141465] -0.917 -0.698 0.723 -1.475 -0.771 ...
..$ 15T: num [1:2141465] -0.541 -0.161 0.248 -0.529 -0.859 ...
..$ 16T: num [1:2141465] -0.469 -0.43 0.129 -0.317 -1.051 ...
..$ 23T: num [1:2141465] -0.0257 0.0107 0.2888 0.3228 0.1635 ...
..$ 33T: num [1:2141465] 0.071 0.959 0.422 -0.019 0.35 ...
..$ 34T: num [1:2141465] -0.846 -0.471 0.48 -1.141 -0.466 ...
..$ 37T: num [1:2141465] -1.014 -0.279 0.327 -0.796 -0.485 ...
..$ 3T : num [1:2141465] -0.46 -0.221 0.117 0.423 -0.266 ...
..$ 41T: num [1:2141465] -2.021 -0.7997 0.4713 -0.0937 -1.1054 ...
..$ 44T: num [1:2141465] -0.7501 -0.2017 0.0135 -1.1092 -0.356 ...
..$ 4T : num [1:2141465] 0.00255 -0.05183 0.09327 -0.2049 -0.07572 ...
..$ 55T: num [1:2141465] -0.2161 -0.5777 0.1861 -0.0936 -0.1689 ...
..$ 56T: num [1:2141465] 0.0622 0.1612 0.2907 0.3115 0.2649 ...
..$ 60T: num [1:2141465] 0.0222 0.0937 -0.1307 0.3206 -0.0847 ...
..$ 61T: num [1:2141465] 0.1707 -0.0255 0.2095 -0.0505 -0.1473 ...
..$ 63T: num [1:2141465] 0.00136 -0.01699 -0.15279 -0.2546 0.06513 ...
..$ 8T : num [1:2141465] -0.146 -0.101 0.389 -0.465 -0.357 ...
..$ IT : num [1:2141465] -0.2524 0.2645 0.7298 -0.563 -0.0915 ...
$ clones.info :'data.frame': 2141465 obs. of 3 variables:
..$ Clone: Factor w/ 2141465 levels "C-00IGZ","C-00IHI",..: 580623 580624 580625 580626 580627 580628 580629 1967674 580630 580631 ...
..$ Chrom: int [1:2141465] 1 1 1 1 1 1 1 1 1 1 ...
..$ kb : int [1:2141465] 712577 713263 714145 714635 718604 750062 752757 754192 755354 760401 ...
$ phenotype : NULL
I have tried to make this object match the structure and data types of the example data sets in the aCGH vignette. The only column missing is aCGH.object$clones.info$Target, and I am unsure what it is meant to contain.
When I generate basic plots of my data with plot(aCGH.object), plotGenome(aCGH.object) or plotFreqStat(aCGH.object), the graphs look overly noisy, and the chromosome markers do not seem to be linked to the data correctly: they are all bunched towards the left-hand side of the graphs. Copies of the graphs are here:
https://www.dropbox.com/s/9z3n70arvxir2iu/aCGH.cn.plot.png
https://www.dropbox.com/s/uzoont4mvratqsz/aCGH.cn.plotFreqStats.png
https://www.dropbox.com/s/12fn3lwnfosaqpc/aCGH.cn.plotGenome.png
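One way to check whether the positions in clones.info are on the scale the plotting functions expect is to look at the largest position recorded on each chromosome. Human chromosome 1 is roughly 250 million base pairs long, so a maximum near 250,000,000 suggests base pairs and one near 250,000 suggests kilobases. The helper max_position_by_chrom() is hypothetical and only uses base R's tapply(); the demo positions are made up.

```r
# Hypothetical helper: largest position observed on each chromosome.
# Useful for checking whether positions are in base pairs or kilobases.
max_position_by_chrom <- function(chrom, position) {
  tapply(position, chrom, max)
}

# Demo with made-up positions (in base pairs) on two chromosomes.
chrom <- c(1, 1, 1, 2, 2)
position <- c(712577, 150000000, 249000000, 10000, 243000000)
res <- max_position_by_chrom(chrom, position)
res
```

On the real object this would be max_position_by_chrom(aCGH.object$clones.info$Chrom, aCGH.object$clones.info$kb).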
So my question is: have I created the aCGH object correctly, or am I missing something?
Many thanks for your time and assistance.
Yours sincerely,
Ryan
-- output of sessionInfo():
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Sent via the guest posting facility at bioconductor.org.