[BioC] pdInfoBuilder fails on Affy's GeneChip Human Transcriptome Array 2.0
James W. MacDonald
jmacdon at uw.edu
Thu Jan 23 20:00:54 CET 2014
This is a known problem. See the email sent to the list just a few
hours ago, asking for an update on progress:
https://stat.ethz.ch/pipermail/bioconductor/attachments/20140123/4847d433/attachment.pl
Best,
Jim
On 1/23/2014 1:34 PM, Guilherme Rocha wrote:
> Dear all,
>
> I am trying to create the pdInfoBuilder packages for Affy's GeneChip Human
> Transcriptome Array 2.0.
>
> I am using the "original" pgf, clf, mps, and probeset.csv files from the
> library files from Affy's website (
> http://www.affymetrix.com/Auth/analysis/downloads/lf/hta/HTA-2_0/AGCC_library_installer_HTA-2_0.zip
> ).
>
> I was able to read the probeset.csv file with a plain read.csv call, so
> the solution given for a similar problem with the Arabidopsis chips
> probably does not apply ("pdInfoBuilder fails on the new
> Arabidopsis Gene ST 1.0 & 1.1 arrays",
> https://stat.ethz.ch/pipermail/bioconductor/2012-March/044231.html).
>
> Details are shown below.
>
> Any help greatly appreciated.
>
> Regards,
>
> Guilherme Rocha
>
>
> ------------------------------------------------------------------------------------------------------------
> R Code and output:
>
>> library(pdInfoBuilder)
> Loading required package: Biobase
> Loading required package: BiocGenerics
> Loading required package: parallel
>
> Attaching package: 'BiocGenerics'
>
> The following objects are masked from 'package:parallel':
>
> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
> clusterExport, clusterMap, parApply, parCapply, parLapply,
> parLapplyLB, parRapply, parSapply, parSapplyLB
>
> The following object is masked from 'package:stats':
>
> xtabs
>
> The following objects are masked from 'package:base':
>
> Filter, Find, Map, Position, Reduce, anyDuplicated, append,
> as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq,
> get, intersect, is.unsorted, lapply, mapply, match, mget, order,
> paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int,
> rownames, sapply, setdiff, sort, table, tapply, union, unique,
> unlist
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material; view with
> 'browseVignettes()'. To cite Bioconductor, see
> 'citation("Biobase")', and for packages 'citation("pkgname")'.
>
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: affxparser
> Loading required package: oligo
> Loading required package: oligoClasses
> Welcome to oligoClasses version 1.24.0
> ================================================================================
> Welcome to oligo version 1.26.0
> ================================================================================
>
> Attaching package: 'oligo'
>
> The following object is masked from 'package:BiocGenerics':
>
> normalize
>
>> base_dir = "./"
>>
>> pgf = paste(base_dir, "/HTA-2_0.r1.pgf", sep="")
>> clf = paste(base_dir, "/HTA-2_0.r1.clf", sep="")
>> prob = paste(base_dir, "/HTA-2_0.na33.hg19.probeset.csv", sep="")
>> core_mps = paste(base_dir, "/HTA-2_0.r1.Psrs.mps", sep="")
>> extended_mps = paste(base_dir, "/HTA-2_0.r1.Psrs.mps", sep="")
>> full_mps = paste(base_dir, "/HTA-2_0.r1.Psrs.mps", sep="")
>>
>> test_csv = read.csv(paste(base_dir,
> "/HTA-2_0.na33.hg19.probeset.csv", sep=""), skip=14, header=T)
>> seed = new("AffyExonPDInfoPkgSeed",
> + pgfFile = pgf,
> + clfFile = clf,
> + probeFile = prob,
> + coreMps = core_mps,
> + extendedMps = extended_mps,
> + fullMps = full_mps,
> + author = "GR",
> + email = "anemailadress at gmail.com",
> + biocViews = "AnnotationData",
> + genomebuild = "GRCh37",
> + organism = "Human",
> + species = "Homo sapiens",
> + url = "")
>> makePdInfoPackage(seed, destDir=base_dir);
> ================================================================================
> Building annotation package for Affymetrix Exon ST Array
> PGF.........: HTA-2_0.r1.pgf
> CLF.........: HTA-2_0.r1.clf
> Probeset....: HTA-2_0.na33.hg19.probeset.csv
> Transcript..: TheTranscriptFile
> Core MPS....: HTA-2_0.r1.Psrs.mps
> Full MPS....: HTA-2_0.r1.Psrs.mps
> Extended MPS: HTA-2_0.r1.Psrs.mps
> ================================================================================
> Parsing file: HTA-2_0.r1.pgf... OK
> Parsing file: HTA-2_0.r1.clf... OK
> Creating initial table for probes... OK
> Creating dictionaries... OK
> Parsing file: HTA-2_0.na33.hg19.probeset.csv... OK
> Parsing file: HTA-2_0.r1.Psrs.mps... OK
> Parsing file: HTA-2_0.r1.Psrs.mps... OK
> Parsing file: HTA-2_0.r1.Psrs.mps... OK
> Creating package in .//pd.hta.2.0
> Inserting 850 rows into table chrom_dict... OK
> Inserting 5 rows into table level_dict... OK
> Inserting 11 rows into table type_dict... OK
> Inserting 577432 rows into table core_mps... OK
> Inserting 577432 rows into table full_mps... OK
> Inserting 577432 rows into table extended_mps... OK
> Inserting 1839617 rows into table featureSet... Error in
> sqliteExecStatement(con, statement, bind.data) :
> RS-DBI driver: (RS_SQLite_exec: could not execute: datatype mismatch)
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] pdInfoBuilder_1.26.0 oligo_1.26.0 oligoClasses_1.24.0
> [4] affxparser_1.34.0 RSQLite_0.11.4 DBI_0.2-7
> [7] Biobase_2.22.0 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] BiocInstaller_1.12.0 Biostrings_2.30.0 GenomicRanges_1.14.1
> [4] IRanges_1.20.0 XVector_0.2.0 affyio_1.30.0
> [7] bit_1.1-10 codetools_0.2-8 ff_2.2-12
> [10] foreach_1.4.1 iterators_1.0.6 preprocessCore_1.24.0
> [13] splines_3.0.2 stats4_3.0.2 zlibbioc_1.8.0
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099