[BioC] BioStrings questions

Patrick Aboyoun paboyoun at fhcrc.org
Fri Oct 17 12:33:38 CEST 2008


Raffaele,
The pairwiseAlignment function uses an O(nm) dynamic programming 
algorithm (n and m being the lengths of the two sequences being aligned) 
that is designed to find an optimal alignment and, as you have 
discovered, isn't intended for use with a long reference sequence. Do 
your PSR sequences map nearly exactly to a location on your reference 
sequence, and are these sequences of equal length? If so, see the 
matchPDict function. It matches a pattern dictionary consisting of 
equal-length fragments to a reference sequence. The pseudocode looks 
something like:

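library(Biostrings)
# PSRDNAStringSet: DNAStringSet of equal-width PSRs; refseq: a single DNAString
psrPDict <- PDict(PSRDNAStringSet)
matchPDict(psrPDict, refseq)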
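
As a rough illustration of the call above, here is a toy run. The 
sequences and the variable names (toyRef, toyPSR, toyPDict, hits) are 
made up for the example and are not your data; it assumes a Biostrings 
version that provides countPDict and startIndex.

```r
library(Biostrings)

# Hypothetical toy data: one reference and three equal-width patterns
toyRef <- DNAString("ACGTACGTTTGACCAGT")
toyPSR <- DNAStringSet(c("ACGT", "TTGA", "GGGG"))

# Preprocess the patterns once, then match all of them exactly
toyPDict <- PDict(toyPSR)
hits <- matchPDict(toyPDict, toyRef)

# Number of exact hits per pattern, in the order of toyPSR
counts <- countPDict(toyPDict, toyRef)

# Start positions of the hits for each pattern (NULL when there is none)
starts <- startIndex(hits)
```

With several reference sequences you would presumably loop over them 
(for example with lapply) and call matchPDict once per reference, 
reusing the same PDict object.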

To answer your second question, the append function should get you what 
you want:

 > append(DNAStringSet(c("AAA", "GA")), DNAStringSet(c("ACTG", "TTTACCC")))
  A DNAStringSet instance of length 4
    width seq
[1]     3 AAA
[2]     2 GA
[3]     4 ACTG
[4]     7 TTTACCC


Patrick


rcaloger wrote:
> Hi,
> In my onechannelGUI package I am developing a section related to 
> Affymetrix exon array analysis, creating a few functions that allow the 
> association of exon-level Probe Selection Regions (PSRs) to refseq.
>
> 1st question:
> I have implemented a function that BLASTs a list of PSR sequences 
> against all refseq.
> However, I would like to know if there is any way of doing something 
> similar using the Biostrings package.
> I tried the pairwiseAlignment function but it is quite slow compared 
> to BLAST.
>
> 2nd question:
> Is there any way of merging two DNAStringSets?
>
> Cheers
> Raffaele
>
>
>


