[BioC] difference results by needle program and pairwiseAlignment
Hervé Pagès
hpages at fhcrc.org
Thu Dec 6 23:06:49 CET 2012
Hi,
On 12/06/2012 01:15 PM, Wang Peter wrote:
> i used EMBOSS package needle program got the score 6
>
> pairwiseAlignment(seq1,seq2, type = "global", substitutionMatrix =
> "BLOSUM62", gapOpening = -10, gapExtension = -0.5)
>
> both use the same parameter
>
> # Matrix: EBLOSUM62
> # Gap_penalty: -10.0
> # Extend_penalty: -0.5
2 things:
1. pairwiseAlignment() interprets gapOpening/gapExtension
not really the standard way (which is kind of unfortunate).
See ?writePairwiseAlignments for how to translate between
gapOpening/gapExtension and Gap_penalty/Extend_penalty.
So to effectively achieve EMBOSS Gap_penalty and Extend_penalty
of -10 and -0.5, respectively, you would need to set 'gapOpening'
and 'gapExtension' to 9.5 and 0.5, respectively.
2. However, another kind of strange feature of pairwiseAlignment()
(that I was not aware of until now), is that it will only accept
non-positive values for gapOpening and gapExtension. More precisely,
you can pass positive values, but, in that case, the opposite values
will effectively (and silently) be used and stored in the returned
object:
pa <- pairwiseAlignment(seq1, seq2, type="global",
substitutionMatrix="BLOSUM62",
gapOpening=9.5, gapExtension=0.5)
> pa at gapOpening
[1] -9.5
> pa at gapExtension
[1] -0.5
So there is no way you can effectively set the 'gapOpening' and
'gapExtension' args to achieve EMBOSS Gap_penalty and Extend_penalty
to -10 and -0.5.
BUT THE MOST IMPORTANT THING IS THIS: Gap_penalty and Extend_penalty
are really aimed to be positive! By setting them to negative values,
you encourage gaps i.e. the more gaps the better the final score will
be. That means you can align pretty much anything with anything. With
your settings, even sequences that have almost nothing in common
(like your 2 sequences) get a positive score!
I guess this is why the original author of pairwiseAlignment() didn't
want to allow positive gapOpening and/or gapExtension values. Maybe
we should allow them though, just so that people can experiment and
compare results with other tools. But if not, pairwiseAlignment()
should raise an error or at least issue a warning that the opposite
values are effectively used. What do people think?
Thanks,
H.
>
> sequences are
>> AAG19119.1
> mgitkqvsirsdnfipghrvfderpHILHvcdikvnpednmwefeyqkekdeissrhtkdvylymeldekqhskltsffl
> eekprdssHSFHvdisgdiiqqvtkrpgagtsdkeknkytpspirfealsdrvvietfagdnpiipyyevthhdtpqdrh
> nvsriesfiqsmqdggstpgleplptelvselqsaewgeevrHHLHegdecyrnsllhpalssyihaiewalisflkeke
> dvdiiqqekngdlyyfasgnsilgevqdtgelsqksisrikslnraerrwmghhksgeatkeeldgmrarltqileelfg
> tdsark
>> AAG19157.1
> msvaddstddHGHHlpavedwpkgfgeaswwpfitaigaagfyigaalyvlgrgesalvgpmvgpgvfiastfaflagly
> gwvyhafvaaywsndgggstalrwgmigflgseiatfgagfayyffiragtqwstaasaipdgflgslvvvntailvvss
> ftlHFAHvalrkgnrsrflallvstlvlgvvfiggqvyeyyefiahegftlsgglfesaffgltglHGLHvtmgavligi
> imvrglrgqysadrhtsvstvsmywhfvdivwiflvvvlyagsva
>
> sionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
> [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
> [4] LC_NUMERIC=C
> [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5
> [4] Rsamtools_1.8.6 lattice_0.20-6 Biostrings_2.24.1
> [7] GenomicRanges_1.8.13 IRanges_1.14.4 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-5 grid_2.15.1 hwriter_1.3
> [5] stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0
>
> thank you very much
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list