[BioC] Filtering pmSequence based on probe target level for HTA 2.0 arrays
James W. MacDonald
jmacdon at uw.edu
Wed Aug 20 18:48:26 CEST 2014
Hi Steve,
It looks like pmSequence() for HTAFeatureSet objects dispatches on the
FeatureSet class:
> showMethods(pmSequence, class="FeatureSet", includeDefs = TRUE)
Function: pmSequence (package oligo)
object="FeatureSet"
function (object, ...)
{
    .local <- function (object)
    {
        pmSequence(getPlatformDesign(object))
    }
    .local(object, ...)
}
which doesn't accept a target argument. I haven't looked closely to see
why the dispatch is off, but it appears it should use the method for the
stArrayDBPDInfo class:
> showMethods(pmSequence)
Function: pmSequence (package oligo)
object="AffyGenePDInfo"
object="AffyHTAPDInfo"
    (inherited from: object="stArrayDBPDInfo")
object="AffySNPPDInfo"
object="DBPDInfo"
object="ExonFeatureSet"
object="FeatureSet"
object="GeneFeatureSet"
object="HTAFeatureSet"
    (inherited from: object="FeatureSet")
object="stArrayDBPDInfo"
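As a rough illustration of why the FeatureSet method rejects 'target', here
is a minimal, self-contained S4 sketch. The classes ToyPDInfo and
ToyFeatureSet and the generic toySequence() are made up for this example and
are not part of oligo. They only mimic the shape of the two kinds of method
listed above, under the assumption that the container method forwards to the
design-info object without declaring a 'target' parameter.

```r
library(methods)

# Hypothetical toy classes; none of these names exist in oligo.
setClass("ToyPDInfo", representation(seqs = "character", core = "logical"))
setClass("ToyFeatureSet", representation(pd = "ToyPDInfo"))

setGeneric("toySequence", function(object, ...) standardGeneric("toySequence"))

# Design-info method: accepts 'target'.
setMethod("toySequence", "ToyPDInfo", function(object, target = "core") {
  target <- match.arg(target, c("core", "probeset"))
  if (target == "core") object@seqs[object@core] else object@seqs
})

# Container method: forwards to the design info but declares no 'target',
# so any extra argument is rejected before it can be forwarded.
setMethod("toySequence", "ToyFeatureSet", function(object) {
  toySequence(object@pd)
})

pd <- new("ToyPDInfo", seqs = c("AAA", "CCC", "GGG"),
          core = c(TRUE, FALSE, TRUE))
fs <- new("ToyFeatureSet", pd = pd)

# Through the container: fails with an "unused argument" error.
via_container <- tryCatch(toySequence(fs, target = "probeset"),
                          error = function(e) conditionMessage(e))

# On the design-info object directly: 'target' is honoured.
via_pd <- toySequence(pd, target = "probeset")
```

In this toy setup, calling the generic on the design-info object directly is
what lets 'target' through, which is the same idea as going through the
stArrayDBPDInfo method in the real listing.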
We can force the stArrayDBPDInfo method by calling pmSequence() on the
platform design object directly, with something like
    z <- pmSequence(getPD(dat), target="probeset")
where 'dat' is an HTAFeatureSet. But we still get more probe sequences than
I would expect:
> pmid1 <- pmindex(dat, target="core")
> pmid2 <- pmindex(dat, target="probeset")
> length(pmid1)
[1] 6058440
> length(pmid2)
[1] 7576209
But since both pmid1 and pmid2 are ordered, I think you should be able to
get the pmSequences for just the probes that will be summarized at the
'core' level by subsetting:
> z.core <- z[pmid2 %in% pmid1,]
> z.core
A DNAStringSet instance of length 6056075
          width seq
      [1]    25 GATTAATCTTAAATCAGGATGATCC
      [2]    25 CAAAATCTAAACCCGGACTGTACCT
      [3]    25 CACACTATTCACACCCGCACCGAAG
      [4]    25 CCGTACCTTTCAAGGTCGGCCAAGC
      [5]    25 ACCCCTTGACTAAGGACGGTTGTTG
      ...   ... ...
[6056071]    25 TCACCGTGTGTCGACGCCGGACACA
[6056072]    25 AGGTTCCTGGGACCTCGTGAGTACA
[6056073]    25 GACCCAGAGTGTAGCTCGACGACCT
[6056074]    25 ACCACAGGTACGACACTACTAAGGA
[6056075]    25 TGGCCTTCCGTGCATATCTGCACCT
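Here is a small base-R sketch of the subsetting idea above, using made-up
index and sequence vectors (pmid_probeset, pmid_core and seqs_probeset are
hypothetical stand-ins for pmid2, pmid1 and z). It assumes, as the subsetting
does, that the sequences come back in the same order and number as the
'probeset' indices, and it adds a check for 'core' indices that have no
match. Unmatched or repeated 'core' indices would be possible reasons for the
subset above (6056075 sequences) being shorter than length(pmid1) (6058440),
though that is not verified here.

```r
# Made-up probe indices at the two levels (stand-ins for pmid2 and pmid1).
pmid_probeset <- c(2L, 5L, 7L, 11L, 13L, 17L)
pmid_core     <- c(5L, 11L, 17L, 19L)  # 19 deliberately has no match

# Made-up sequences, one per 'probeset' index (stand-in for z).
seqs_probeset <- c("ACGT", "CCGA", "TTAG", "GGCA", "ATAT", "CGCG")

# The subsetting only makes sense if sequences and indices are parallel.
stopifnot(length(seqs_probeset) == length(pmid_probeset))

# Keep the sequences whose index also appears at the 'core' level.
keep <- pmid_probeset %in% pmid_core
seqs_core <- seqs_probeset[keep]
names(seqs_core) <- pmid_probeset[keep]

# 'core' indices with no counterpart at the 'probeset' level.
missing_core <- setdiff(pmid_core, pmid_probeset)
```

On real data the same two checks (equal lengths, and an empty or non-empty
missing_core) would show whether the subset lines up one-to-one with the
'core' intensities.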
Best,
Jim
On Wed, Aug 20, 2014 at 10:55 AM, Steve Piccolo
<stephen.piccolo at hsc.utah.edu> wrote:
> List members,
>
> I am working with some Affymetrix HTA 2.0 arrays. I have installed the
> draft annotation package described here:
> http://grokbase.com/t/r/bioconductor/1428394w2d/bioc-draft-support-for-hta-2-0-with-oligo
>
> I am using the following commands from the oligo package to extract
> intensity values and PM sequences. However, I am running into a problem
> because the oligo::pmSequence function doesn't allow me to specify a
> target probe type for these arrays. By default oligo::pm() uses the
> "core" probes, whereas oligo::pmSequence only allows me to use the
> "probeset" probes. For the ST arrays, in contrast, I am able to specify
> the target.
>
> affyExpressionFS <- read.celfiles(celFilePath)
> pint = oligo::pm(affyExpressionFS, target="core")
>
> pmSeq = oligo::pmSequence(affyExpressionFS, target="core")
>
> Below is the error message I get.
>
> Loading required package: pd.hta.2.0
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : testInputData/HTA2.CEL.gz
> Error in { : task 1 failed - "unused argument (target = "probeset")"
>
> Below is my session info. Any help would be appreciated.
>
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel methods stats graphics grDevices utils datasets
> [8] base
>
> other attached packages:
> [1] pd.hta.2.0_3.8.0 RSQLite_0.11.4 DBI_0.2-7
> [4] GEOquery_2.30.1 sva_3.10.0 mgcv_1.8-2
> [7] nlme_3.1-117 corpcor_1.6.6 foreach_1.4.2
> [10] oligo_1.28.2 Biostrings_2.32.1 XVector_0.4.0
> [13] IRanges_1.22.10 Biobase_2.24.0 oligoClasses_1.26.0
> [16] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2
> [4] bit_1.1-12 codetools_0.2-8 compiler_3.1.0
> [7] ff_2.2-13 GenomeInfoDb_1.0.2 GenomicRanges_1.16.4
> [10] grid_3.1.0 iterators_1.0.7 lattice_0.20-29
> [13] Matrix_1.1-4 preprocessCore_1.26.1 RCurl_1.95-4.3
> [16] splines_3.1.0 stats4_3.1.0 XML_3.98-1.1
> [19] zlibbioc_1.10.0
>
>
> Regards,
> -Steve
>
> -----------------------------------
> Stephen Piccolo, Ph.D.
> Postdoctoral Research Associate
>
> Affiliations:
> Department of Pharmacology and Toxicology, University of Utah
> Division of Computational Biomedicine, Boston University School of
> Medicine
> -----------------------------------
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099