[BioC] makePdInfoPackage for Primeview arrays
Benilton Carvalho
beniltoncarvalho at gmail.com
Tue Jun 18 20:05:52 CEST 2013
Unfortunately, I cannot provide a quick fix for this. For expression
arrays, pdInfoBuilder assumes that each probe belongs to exactly one
probeset, and this does not hold for PrimeView chips. That is why the
insert into the pmfeature table fails with "PRIMARY KEY must be unique".
For example, the probe with chip coordinates X=135 and Y=147 is shared
by two probesets (11715100_at and 11715102_x_at), and the same happens
for thousands of other probesets.
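
To make the problem concrete, here is a minimal sketch in base R of how
shared probes can be spotted in a probe-to-probeset table. The data frame
and its column names (probeset, x, y, key) are made up for illustration;
only the shared probe at X=135, Y=147 and the two probeset names come from
the real chip. This is not what pdInfoBuilder does internally.

```r
# Toy probe-to-probeset table. All coordinates are hypothetical except
# the shared probe at X=135, Y=147 mentioned above.
probes <- data.frame(
  probeset = c("11715100_at", "11715100_at",
               "11715102_x_at", "11715102_x_at"),
  x = c(135L, 200L, 135L, 310L),
  y = c(147L, 12L, 147L, 45L),
  stringsAsFactors = FALSE
)

# Build one key per physical probe from its chip coordinates.
probes$key <- paste(probes$x, probes$y, sep = ":")

# Count how many distinct probesets claim each probe.
n_sets <- tapply(probes$probeset, probes$key,
                 function(p) length(unique(p)))

# A probe claimed by more than one probeset appears on several rows,
# so a table keyed on the probe alone cannot hold all of them.
shared <- names(n_sets)[n_sets > 1]
shared_rows <- probes[probes$key %in% shared, ]
print(shared_rows)
```

With a real chip, the same check would need the probe coordinates and
probeset names extracted from the CDF first; that step is left out here.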
Before changing the code, I want to make sure I fully understand the
background for this chip and why the duplication happens, and this may
take a while.
Will get back to the list once I have news on this front,
b
2013/6/18 Max Kauer <maximilian.kauer at ccri.at>:
> Hi,
>
> I am trying to make a pd.info package for the Affy Primeview array, but I
> get an error.
>
> Thanks for any help!
>
> Cheers,
> Max
>
> This is my code:
>
> library(pdInfoBuilder)
> cdf <- list.files( pathAnnotPr, pattern = ".cdf", full.names = TRUE )
> cel <- list.files( pathC, pattern = ".CEL", full.names = TRUE )[1] # take first array
> tab <- list.files(pathAnnotPr, pattern = "_tab", full.names = TRUE)
>
> seed <- new("AffyExpressionPDInfoPkgSeed",
>             cdfFile = cdf, celFile = cel,
>             tabSeqFile = tab, author = "xx",
>             email = "xx",
>             biocViews = "AnnotationData",
>             genomebuild = "hg19",
>             organism = "Human", species = "Homo Sapiens",
>             url = "xx"
>             )
> makePdInfoPackage( seed, destDir = "." )
>
> Which produces this output/error (although a pd.primeview directory is
> created):
>
> ================================================================================
> Building annotation package for Affymetrix Expression array
> CDF...............: PrimeView.cdf
> CEL...............: MJ_05042013_TAS_10_PrimeView.CEL
> Sequence TAB-Delim: PrimeView.probe_tab
> ================================================================================
> Parsing file: PrimeView.cdf... OK
> Parsing file: MJ_05042013_TAS_10_PrimeView.CEL... OK
> Parsing file: PrimeView.probe_tab... OK
> Getting information for featureSet table... OK
> Getting information for pm/mm feature tables... OK
> Combining probe information with sequence information... OK
> Getting PM probes and sequences... OK
> Done parsing.
> Creating package in ./pd.primeview
> Inserting 49395 rows into table featureSet... OK
> Inserting 609663 rows into table pmfeature... Error in sqliteExecStatement(con, statement, bind.data) :
>   RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
> In addition: Warning messages:
> 1: In parseCdfCelProbe(object@cdfFile, object@celFile, object@tabSeqFile, :
>   Probe sequences were not found for all PM probes. These probes will be removed from the pmSequence object.
> 2: In parseCdfCelProbe(object@cdfFile, object@celFile, object@tabSeqFile, :
>   Probe sequences were not found for all MM probes. These probes will be removed from the mmSequence object.
>
> > sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0
> [4] affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
> [7] Biobase_2.20.0 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 BiocInstaller_1.10.2 Biostrings_2.28.0
> [4] bit_1.1-10 codetools_0.2-8 ff_2.2-11
> [7] foreach_1.4.1 GenomicRanges_1.12.4 IRanges_1.18.1
> [10] iterators_1.0.6 preprocessCore_1.22.0 splines_3.0.0
> [13] stats4_3.0.0 zlibbioc_1.6.0
>
> Max Kauer
>
> CHILDREN'S CANCER RESEARCH INSTITUTE