[BioC] matchPDict with mismatches allowed appears to drop names
Ian Henry
henry at mpi-cbg.de
Tue Aug 2 11:24:57 CEST 2011
Hi,
I have a question regarding the inheritance of the names attribute
when using matchPDict.
If I use matchPDict as follows:
#Get transcript information
> hg19txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene")
> hg19_tx <- extractTranscriptsFromGenome(Hsapiens, hg19txdb)
#Create DNAStringSet with names associated with each probe
> probeset <- DNAStringSet(probelist$sequence)
> names(probeset)<-probelist$probenames
#Create PDict object and match against human transcript 14
#(I know it should match)
> ps_pdict<-PDict(probeset)
> txmatches <- matchPDict(ps_pdict, hg19_tx[[14]])
This compares the probes in ps_pdict to transcript 14 in hg19 and gives:
> unlist(txmatches)
    start end width names
[1]   749 773    25  HW:6
[2]   569 593    25 HW:16
[3]   804 828    25 HW:26
[4]   757 781    25 HW:36
which works :)
However, if I search allowing for mismatches then the names appear to
be lost:
> ps_pdict1<-PDict(probeset, max.mismatch=1)
> txmatches1 <- matchPDict(ps_pdict1, hg19_tx[[14]], max.mismatch=1, min.mismatch=0)
> unlist(txmatches1)
IRanges of length 4
    start end width
[1]   749 773    25
[2]   569 593    25
[3]   804 828    25
[4]   757 781    25
The result of matchPDict is an MIndex object, which I named txmatches
for the exact matches and txmatches1 for the search allowing 1 mismatch:
> names(txmatches)    # gives a character vector containing the probe names
> names(txmatches1)   # returns NULL
So it appears the names are not inherited. I tried to add them
manually to my MIndex object:
> names(txmatches1) <- names(probeset)
but I get the error:
attempt to modify the names of a ByPos_MIndex instance
Therefore I'm not sure how to keep my probe names associated with the
Transcript match, which is important for inexact matching.
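A minimal sketch of one possible workaround, under two assumptions: that
the MIndex keeps one entry per probe in the same order as the input
probeset, and that the per-probe start and end positions can be pulled out
as plain lists (for instance with startIndex() and endIndex() from
Biostrings). The helper label_matches() is hypothetical, and the input
lists below are illustrative values copied from the output above, not a
fresh matchPDict run.

```r
# Hypothetical helper (not part of Biostrings): attach probe names to a
# flattened table of matches. `starts` and `ends` are lists holding one
# integer vector per probe, in the same order as `probe_names`.
label_matches <- function(starts, ends, probe_names) {
  stopifnot(length(starts) == length(probe_names),
            length(ends) == length(probe_names))
  # Number of hits per probe; probes without hits contribute zero rows.
  n_hits <- vapply(starts, length, integer(1))
  data.frame(
    names = rep(probe_names, n_hits),
    start = as.integer(unlist(starts, use.names = FALSE)),
    end   = as.integer(unlist(ends, use.names = FALSE)),
    stringsAsFactors = FALSE
  )
}

# Illustrative input mirroring the four hits shown above.
probe_names <- c("HW:6", "HW:16", "HW:26", "HW:36")
starts <- list(749L, 569L, 804L, 757L)
ends   <- list(773L, 593L, 828L, 781L)

hits <- label_matches(starts, ends, probe_names)
hits$width <- hits$end - hits$start + 1L
print(hits)
```

With real data, the two lists would presumably come from something like
startIndex(txmatches1) and endIndex(txmatches1), with probe_names taken
from names(probeset). This keeps the names in an ordinary data frame
rather than on the MIndex itself.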
Any help would be greatly appreciated,
Thanks,
Ian
>sessionInfo()
R version 2.13.0 beta (2011-03-31 r55221)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.5.2                         BSgenome.Hsapiens.UCSC.hg19_1.3.17
[3] BSgenome_1.19.5                    Biostrings_2.19.17
[5] GenomicFeatures_1.3.15             GenomicRanges_1.3.31
[7] IRanges_1.9.28
loaded via a namespace (and not attached):
[1] Biobase_2.11.10 DBI_0.2-5 RCurl_1.5-0
[4] RSQLite_0.9-4 XML_3.2-0 biomaRt_2.7.1
[7] rtracklayer_1.11.12 tools_2.13.0