[BioC] how to import Agilent CGH custom design xml file into limma

Wed Oct 20 17:26:24 CEST 2010

Hello,

I am trying to analyse my two-color mouse aCGH data on Agilent platform. I have custom-designed 1M probe CGH arrays for which I have a xml file that defines the arrays' design (layout etc.). My goal (for now) is to identify CNVs in the samples. I am trying to use snapCGH to analyse the samples, but I stumble upon some basic preprocessing steps.

I can read in the Feature Extraction files with no problem (following almost literally snapCGH user guide, just for single sample):

> RG1<-read.maimages(file="sample1.txt",source="agilent")
Read sample1.txt
> str(RG1)
Formal class 'RGList' [package "limma"] with 1 slots
  ..@ .Data:List of 8
  .. ..$ : num [1:974016, 1] 297 33.6 35.2 145.7 333.7 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr "sample1"
  .. ..$ : num [1:974016, 1] 25 25 25 25 25 25 25 25 25 25 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr "sample1"
  .. ..$ : num [1:974016, 1] 333.8 78.2 54.7 111.8 170.5 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr "sample1"
  .. ..$ : num [1:974016, 1] 52 52 52 53 53 53 53 53 53 52 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr "sample1"
  .. ..$ :'data.frame':	1 obs. of  1 variable:
  .. .. ..$ FileName: chr "sample1"
  .. ..$ :'data.frame':	974016 obs. of  9 variables:
  .. .. ..$ Row           : int [1:974016] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ Col           : int [1:974016] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..$ Start         : int [1:974016] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. ..$ Sequence      : chr [1:974016] "" "" "" "CCCATTCACTAATCTACATATTATCTCCATCCAACAAAAATTTCTTTCAGTAAGGTGTGG" ...
  .. .. ..$ ProbeUID      : int [1:974016] 0 1 1 3 5 7 9 11 13 15 ...
  .. .. ..$ ControlType   : int [1:974016] 1 1 1 0 0 0 0 0 0 0 ...
  .. .. ..$ ProbeName     : chr [1:974016] "MmCGHBrightCorner" "DarkCorner" "DarkCorner" "A_67_P21917383" ...
  .. .. ..$ GeneName      : chr [1:974016] "MmCGHBrightCorner" "DarkCorner" "DarkCorner" "Msn" ...
  .. .. ..$ SystematicName: chr [1:974016] "MmCGHBrightCorner" "DarkCorner" "DarkCorner" "chrX:93323379-93323438" ...
  .. ..$ : chr "agilent"
  .. ..$ :List of 4
  .. .. ..$ ngrid.r: num 1
  .. .. ..$ ngrid.c: num 1
  .. .. ..$ nspot.r: int 1068
  .. .. ..$ nspot.c: int 912
> RG1$design<- -1 # I have flipped data where red is the reference
> RG2<-backgroundCorrect(RG1,method="normexp")
Array 1 corrected
Array 1 corrected
> MA<-normalizeWithinArrays(RG2,method="loess")

And after this command I think I need to supply the design file to limma/snapCGH so that it knows what to do with processCGH or genomePlot, because this happens during the next steps:

> MA2<-processCGH(MA, method.of.averaging=mean,ID="ProbeUID")
Error in order(MA$genes$Chr, MA$genes$Position) : 
  argument 1 is not a vector

> genomePlot(MA,array=1)
Error in order(datainfo$Chr, datainfo$Position) : 
  argument 1 is not a vector

I am shooting in the dark with the ProbeUID field here, as this should be the unique probe identifier, but it's not where the problem is. I need position of genes in the genome, and this is in the design file. I think I should use the readPositionalInfo to relay this information to limma/snapCGH, but it only accepts objects of class RGList or MAList, and my design is an xml file - I don't know how to parse the file...

I would be very grateful for any clarification on how to proceed with the analysis.

cheers
jarek

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] snapCGH_1.20.0  DNAcopy_1.24.0  limma_3.6.0     aCGH_1.28.0     multtest_2.6.0  Biobase_2.10.0 
[7] survival_2.35-8 cluster_1.13.1  GLAD_2.12.0    

loaded via a namespace (and not attached):
 [1] affy_1.28.0           affyio_1.18.0         annotate_1.28.0       AnnotationDbi_1.12.0 
 [5] DBI_0.2-5             genefilter_1.32.0     grid_2.12.0           lattice_0.19-13      
 [9] MASS_7.3-8            preprocessCore_1.12.0 RColorBrewer_1.0-2    RSQLite_0.9-2        
[13] strucchange_1.4-2     tilingArray_1.28.0    tools_2.12.0          vsn_3.18.0           
[17] xtable_1.5-6  

-- 
  Jarek Bryk | www.evolbio.mpg.de/~bryk
  Max Planck Institute for Evolutionary Biology 
  August Thienemann Str. 2 | 24306 Plön, Germany 
  tel. +49 4522 763 287 | bryk at evolbio.mpg.de