[BioC] Biostrings - pairwiseAlignment trimmed pattern returned

Valerie Obenchain vobencha at fhcrc.org
Tue Feb 21 06:53:29 CET 2012


Hi Erik,

This behavior is driven by the gap penalties, which you can see by
altering the gapOpening and gapExtension arguments. You may want to look
up "affine gap penalty" and look at the paper referenced on the
?pairwiseAlignment man page.

By default gapOpening=-10 and gapExtension=-4. Spelling out the defaults
reproduces your example below,

    pairwiseAlignment("ACTGACTGACTGACTG", "AAGAAGAGTTATGGGAGTAACTGACC",
                      gapOpening=-10, gapExtension=-4)

If we don't penalize the opening or extension of gaps, we get an
alignment that includes enough gaps to match up the sequences. Leaving
gaps unpenalized is not recommended; this is only for the sake of example.

    pairwiseAlignment("ACTGACTGACTGACTG", "AAGAAGAGTTATGGGAGTAACTGACC",
                      gapOpening=0, gapExtension=0)
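
To make the effect of these two arguments concrete, here is a minimal
base R sketch of the affine gap arithmetic. It assumes the convention
that a gap of n residues costs gapOpening + n * gapExtension, with
penalties given as negative numbers as in this thread. The helper
affine_gap_cost is hypothetical (it is not part of Biostrings), and the
penalty settings are only illustrative.

```r
# Hypothetical helper, not a Biostrings function: cost of a single gap of
# `len` residues under an affine penalty (assumed convention: the opening
# cost is paid once and the extension cost is paid for every gapped residue).
affine_gap_cost <- function(len, gapOpening, gapExtension) {
  gapOpening + len * gapExtension
}

# The pattern in the quoted alignment contains one run of 11 "-" characters.
gap_len <- 11

# Illustrative penalty settings, from strict to none.
settings <- data.frame(
  gapOpening   = c(-10, -10, 0),
  gapExtension = c(-4, -2, 0)
)

settings$cost <- affine_gap_cost(gap_len,
                                 settings$gapOpening,
                                 settings$gapExtension)

print(settings)
# cost column: -54, -32, 0
```

With both penalties at 0 the gap is free, which is why the second call
above can insert as many gaps as it takes to line up the matches.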


The same can be done with your global-local example to get an idea of
how these parameters affect the outcome. Additionally, global-local
aligns the whole pattern against a consecutive subsequence of the
subject, so subject residues outside that stretch are left out of the
alignment. For the case where one sequence is downstream of the other
and they partially overlap, type="overlap" aligns the whole sequences
without penalizing end gaps. The global algorithm tries to align every
residue in each sequence. You'll want to choose the algorithm that is
most appropriate for your data, then modify the arguments if necessary.


Valerie



On 02/15/2012 02:25 PM, Erik Wright wrote:
> Hello,
>
> I have been using the Biostrings function "pairwiseAlignment" lately to perform sequence alignments.  I have discovered that with difficult alignments it sometimes returns incomplete patterns.  For example:
>
> pairwiseAlignment("ACTGACTGACTGACTG","AAGAAGAGTTATGGGAGTAACTGACC")
> Global PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] ACT-----------GACTGACTGACT 
> subject: [1] AAGAAGAGTTATGGGAGTAACTGACC 
> score: -77.67886 
>
> As you can see, the last character of the pattern ("G") has been removed.  If I set type="global-local" this does not happen, but it does in this instance:
>
> pairwiseAlignment("ACTGACTGACTGACTG","CTGAGAGGGTGATCGGCCACATTGGG",type="global-local")
> Global-Local PairwiseAlignedFixedSubject (1 of 1)
> pattern: [2] CTGACTGACTGACTG 
> subject: [1] CTGAGAGGGTGATCG 
> score: -31.55991 
>
> In this case the first character of the pattern ("A") was removed in the alignment.
>
> Please help me to understand why this is happening, and if there might be a solution.
>
> Thanks in advance,
> Erik
>
>
> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
> [1] RSQLite_0.10.0    DBI_0.2-5         Biostrings_2.22.0  IRanges_1.12.2   
>
> loaded via a namespace (and not attached):
> [1] tools_2.14.0


