[BioC] matchPDict with mismatches allowed appears to drop names

Tue Aug 2 17:45:31 CEST 2011

The short answer for the fuzzy matching case appears to be that, even
though your names are maintained through most of the function

	matchPDict.R:.match.MTB_PDict,

and in fact they are still present in the list 'ans_compons', they get
dropped in this call:

	ans <- ByPos_MIndex.combine(ans_compons)

By contrast, for exact matching, the function

	matchPDict.R:.match.TB_PDict

returns your names here:

	new("ByPos_MIndex", width0=width(pdict), NAMES=names(pdict), ends=C_ans)

So my naive guess is that

	MIndex-class.R:ByPos_MIndex.combine

should do that also, something like

	ans_names <-  mi_list[[1]]@NAMES
	new("ByPos_MIndex", width0=ans_width0, ends=ans_ends, NAMES=ans_names)
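
To make the idea concrete without touching Biostrings internals, here is a
small self-contained sketch. Everything in it is a toy stand-in that I made
up for illustration: the class 'ToyMIndex', the helper 'merge_ends', and
the two combine functions are assumptions, not the real ByPos_MIndex code.
It only shows that a combine step which never passes NAMES to new() ends up
with empty names, while taking them from the first component keeps them.

```r
library(methods)

# Toy stand-in for ByPos_MIndex, with only the slots named in this thread.
# In this toy, an empty character vector plays the role of "no names".
setClass("ToyMIndex",
         representation(width0 = "integer", NAMES = "character", ends = "list"))

# Hypothetical helper: for each pattern, pool the match ends found in
# every component and sort them.
merge_ends <- function(mi_list) {
    ends_list <- lapply(mi_list, function(mi) mi@ends)
    unname(do.call(Map, c(list(f = function(...) sort(c(...))), ends_list)))
}

# Combine that never passes NAMES to new(), so the names are lost.
combine_dropping_names <- function(mi_list) {
    new("ToyMIndex", width0 = mi_list[[1]]@width0, ends = merge_ends(mi_list))
}

# Combine that takes NAMES from the first component, as suggested above.
# This assumes all components carry the same names in the same order.
combine_keeping_names <- function(mi_list) {
    ans_names <- mi_list[[1]]@NAMES
    new("ToyMIndex", width0 = mi_list[[1]]@width0,
        ends = merge_ends(mi_list), NAMES = ans_names)
}

# Two components covering the same two named patterns.
a <- new("ToyMIndex", width0 = c(25L, 25L), NAMES = c("HW:6", "HW:16"),
         ends = list(773L, integer(0)))
b <- new("ToyMIndex", width0 = c(25L, 25L), NAMES = c("HW:6", "HW:16"),
         ends = list(integer(0), 593L))

dropped <- combine_dropping_names(list(a, b))
kept    <- combine_keeping_names(list(a, b))
```

Comparing dropped@NAMES with kept@NAMES shows the difference. Whether the
real fix is this simple depends on how ByPos_MIndex.combine is used
elsewhere, which I have not checked.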

On Aug 2, 2011, at 5:24 AM, Ian Henry wrote:

> Hi,
>
> I have a question regarding the inheritance of the names attribute  
> when using matchPDict.
>
> If I use matchPDict as follows:
>
> #Get transcript information
> > hg19txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene")
> > hg19_tx <- extractTranscriptsFromGenome(Hsapiens, hg19txdb)
>
> #Create DNAStringSet with names associated with each probe
> > probeset <- DNAStringSet(probelist$sequence)
> > names(probeset)<-probelist$probenames
>
> #Create PDict object and match against human transcript 14 (I know it should match)
> > ps_pdict<-PDict(probeset)
> > txmatches <- matchPDict(ps_pdict, hg19_tx[[14]])
>
> This compares the probes in ps_pdict to transcript 14 in hg19 and  
> gives:
> > unlist(txmatches)
>
>     start end width           names
> [1]   749 773    25  HW:6
> [2]   569 593    25 HW:16
> [3]   804 828    25 HW:26
> [4]   757 781    25 HW:36
>
> which works :)
>
> However, if I search allowing for mismatches then the names appear  
> to be lost:
>
> > ps_pdict1<-PDict(probeset, max.mismatch=1)
> > txmatches1 <- matchPDict(ps_pdict1, hg19_tx[[14]], max.mismatch=1, min.mismatch=0)
> > unlist(txmatches1)
>
> IRanges of length 4
>     start end width
> [1]   749 773    25
> [2]   569 593    25
> [3]   804 828    25
> [4]   757 781    25
>
> The result of matchPDict is a MIndex object that I named txmatches  
> with exact matches, and txmatches1 with 1 mismatch
> > names(txmatches)                #gives character vector containing probe names
> > names(txmatches1)              #returns NULL
>
> So it appears the names are not inherited.  I tried to add them  
> manually to my MIndex object
> > names(txmatches1) <- names(probeset)
>
> but I get Error:
> attempt to modify the names of a ByPos_MIndex instance
>
> Therefore I'm not sure how to keep my probe names associated with  
> the Transcript match, which is important for inexact matching.
>
> Any help would be greatly appreciated,
>
> Thanks,
>
> Ian
>
>
> >sessionInfo()
>
> R version 2.13.0 beta (2011-03-31 r55221)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] C/UTF-8/C/C/C/C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] plyr_1.5.2                         BSgenome.Hsapiens.UCSC.hg19_1.3.17
> [3] BSgenome_1.19.5                    Biostrings_2.19.17
> [5] GenomicFeatures_1.3.15             GenomicRanges_1.3.31
> [7] IRanges_1.9.28
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.11.10     DBI_0.2-5           RCurl_1.5-0
> [4] RSQLite_0.9-4       XML_3.2-0           biomaRt_2.7.1
> [7] rtracklayer_1.11.12 tools_2.13.0