[Bioc-devel] Best object structure for representing a pairwise genome alignment ?

Vincent Carey stvjc sending from channing.harvard.edu
Fri Sep 18 11:41:47 CEST 2020


Starting from

PairwiseAlignments-class      package:Biostrings       R Documentation

PairwiseAlignments, PairwiseAlignmentsSingleSubject, and
PairwiseAlignmentsSingleSubjectSummary objects

Description:

     The ‘PairwiseAlignments’ class is a container for storing a set of
     pairwise alignments.

     The ‘PairwiseAlignmentsSingleSubject’ class is a container for
     storing a set of pairwise alignments with a single subject.

     The ‘PairwiseAlignmentsSingleSubjectSummary’ class is a container
     for storing the summary of a set of pairwise alignments.

Usage:

     ## Constructors:
     ## When subject is missing, pattern must be of length 2
     ## S4 method for signature 'XString,XString'
     PairwiseAlignments(pattern, subject,
       type = "global", substitutionMatrix = NULL, gapOpening = 0,
       gapExtension = 1)
     ## S4 method for signature 'XStringSet,missing'
     PairwiseAlignments(pattern, subject,
       type = "global", substitutionMatrix = NULL, gapOpening = 0,
       gapExtension = 1)
     ## S4 method for signature 'character,character'
     PairwiseAlignments(pattern, subject,
       type = "global", substitutionMatrix = NULL, gapOpening = 0,
       gapExtension = 1,
       baseClass = "BString")

...

My question is whether this is a relevant starting place. Clearly
the focus is not on coordinates, but perhaps a structure that maintains
genomic content and coordinates together would be of use?
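
For comparison, here is a minimal sketch of the coordinate-centred
representation described in the quoted message below: a GRanges object on the
first genome with the matching blocks of the second genome stored in a metadata
column. The object names (`aln`, `best`) and the score cutoff are illustrative
assumptions; the coordinates and scores are copied from the first two rows of
the quoted example. It does not use PairwiseAlignments and carries no sequence
content.

```r
library(GenomicRanges)

# Alignment blocks on the first genome (first two rows of the quoted example).
aln <- GRanges(c("S1:162-550:+", "S1:833-3738:+"))

# Scores and the matching blocks on the second genome are stored as
# metadata columns; the `query` column is itself a GRanges object.
aln$score <- c(861, 7238)
aln$query <- GRanges(c("XSR:909374-909853", "XSR:910181-913291"))

# Subsetting keeps both sides of each alignment block together.
# The cutoff of 1000 is arbitrary and only for illustration.
best <- aln[aln$score > 1000]

width(best)        # block length on the first genome
width(best$query)  # block length on the second genome
```

Because the second genome's ranges travel as a metadata column, ordinary
GRanges subsetting keeps the pairs aligned, but range operations such as
overlaps only see the first genome unless `aln$query` is extracted explicitly.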


On Fri, Sep 18, 2020 at 2:49 AM Charles Plessy <charles.plessy using oist.jp>
wrote:

> Dear Bioc developers,
>
> I am currently analysing pairwise genome alignments with Bioconductor,
> and I represent them as a GRanges object on the first genome, with one
> element per alignment block, storing the coordinates in the other genome
> in a metadata column that holds another GRanges object.
>
> Something like this.
>
> GRanges object with 36582 ranges and 2 metadata columns:
>            seqnames      ranges strand |     score                query
>               <Rle>   <IRanges>  <Rle> | <numeric>            <GRanges>
>        [1]       S1     162-550      + |       861    XSR:909374-909853
>        [2]       S1    833-3738      + |      7238    XSR:910181-913291
>        [3]       S1   3769-4212      + |      1165    XSR:913510-913953
>        [4]       S1   4246-4381      + |       359    XSR:914134-914275
>        [5]       S1   4532-5990      + |      2977 chr2:6694031-6695569
>        ...      ...         ...    ... .       ...                  ...
>    [36578]      S99 17228-17759      - |       793 chr1:2375870-2376379
>    [36579]      S99 16417-16935      - |       632 chr1:2376612-2377077
>    [36580]      S99 12370-12759      - |       773 chr1:2379949-2380343
>    [36581]      S99   5270-5384      - |       295   chr1:843397-843511
>    [36582]      S99   1949-3053      - |      2105   chr1:845358-846326
>    -------
>
> Using "Pairwise genome alignment" as a keyword in a search engine, I
> found that the CNEr package does something similar, although it
> uses a dedicated "GRangePairs" object for the purpose.
>
> Before I invest time in either direction, I wanted to check on
> this mailing list whether other solutions already exist, in
> particular closer to the core packages.
>
> Have a nice day,
>
> Charles
>
> --
> Charles Plessy - - ~ ~ ~ ~ ~ ~~~~ ~ ~ ~ ~ ~ - - charles.plessy using oist.jp
> Okinawa  Institute  of  Science  and  Technology  Graduate  University
> Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
> Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
>
