[BioC] matchPDict with mismatches allowed appears to drop names
Harris A. Jaffee
hj at jhu.edu
Tue Aug 2 17:45:31 CEST 2011
The short answer for the fuzzy matching case appears to be that, even
though your names are maintained through most of the function
matchPDict.R:.match.MTB_PDict,
and in fact they are still present in the list 'ans_compons', they get
dropped in this call:
ans <- ByPos_MIndex.combine(ans_compons)
By contrast, for exact matching, the function
matchPDict.R:.match.TB_PDict
returns your names here:
new("ByPos_MIndex", width0=width(pdict), NAMES=names(pdict),
ends=C_ans)
So my naive guess is that
MIndex-class.R:ByPos_MIndex.combine
should do that also, something like
ans_names <- mi_list[[1]]@NAMES
new("ByPos_MIndex", width0=ans_width0, ends=ans_ends, NAMES=ans_names)
On Aug 2, 2011, at 5:24 AM, Ian Henry wrote:
> Hi,
>
> I have a question regarding the inheritance of the names attribute
> when using matchPDict.
>
> If I use matchPDict as follows:
>
> #Get transcript information
> > hg19txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename =
> "refGene")
> > hg19_tx <- extractTranscriptsFromGenome(Hsapiens, hg19txdb)
>
> #Create DNAStringSet with names associated with each probe
> > probeset <- DNAStringSet(probelist$sequence)
> > names(probeset)<-probelist$probenames
>
> #Create PDict object and match against human transcript 14 (I know
> it should match)
> > ps_pdict<-PDict(probeset)
> > txmatches <- matchPDict(ps_pdict, hg19_tx[[14]])
>
> this compares the probes in ps_pdict to transcript 14 in hg19 and
> gives:
> >unlist(txmatches):
>
> start end width names
> [1] 749 773 25 HW:6
> [2] 569 593 25 HW:16
> [3] 804 828 25 HW:26
> [4] 757 781 25 HW:36
>
> which works :)
>
> However, if I search allowing for mismatches then the names appear
> to be lost:
>
> > ps_pdict1<-PDict(probeset, max.mismatch=1)
> > txmatches1 <- matchPDict(ps_pdict1, hg19_tx[[14]],
> max.mismatch=1, min.mismatch=0)
> > unlist(txmatches1)
>
> IRanges of length 4
> start end width
> [1] 749 773 25
> [2] 569 593 25
> [3] 804 828 25
> [4] 757 781 25
>
> The result of matchPDict is a MIndex object that I named txmatches
> with exact matches, and txmatches1 with 1 mismatch
> > names(txmatches) #gives character vector
> containing probe names
> > names(txmatches1) #returns NULL
>
> So it appears the names are not inherited. I tried to added them
> manually to my MIndex object
> >names(txmatches1)<-names(probeset)
>
> but I get Error:
> attempt to modify the names of a ByPos_MIndex instance
>
> Therefore I'm not sure how to keep my probe names associated with
> the Transcript match, which is important for inexact matching.
>
> Any help would be greatly appreciated,
>
> Thanks,
>
> Ian
>
>
> >sessionInfo()
>
> R version 2.13.0 beta (2011-03-31 r55221)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] C/UTF-8/C/C/C/C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] plyr_1.5.2
> BSgenome.Hsapiens.UCSC.hg19_1.3.17
> [3] BSgenome_1.19.5 Biostrings_2.19.17
> [5] GenomicFeatures_1.3.15 GenomicRanges_1.3.31
> [7] IRanges_1.9.28
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.11.10 DBI_0.2-5 RCurl_1.5-0
> [4] RSQLite_0.9-4 XML_3.2-0 biomaRt_2.7.1
> [7] rtracklayer_1.11.12 tools_2.13.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/
> gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list