[Bioc-devel] GAlignments Sorting Causes C Stack Error

Hervé Pagès hpages at fredhutch.org
Tue Jan 10 08:35:37 CET 2017


Hi,

No ordering was defined until now for GAlignments. Starting with
GAlignments 1.11.7, the elements of a GAlignments object 'x' are
ordered based on the order of the elements in 'granges(x)', that
is, by chromosome, then by strand, then by start, then by end.

Note that I chose a simple ordering definition for GAlignments objects
but it has the following important caveat: == does NOT look at the
cigar. That means that 2 elements in a GAlignments object are considered
equal even if their cigars are different:

 > gal <- GAlignments(Rle(factor("chr1"), 2), c(5L, 5L), c("10M", 
"5M2I5M"), Rle(strand("+"), 2))

 > gal
GAlignments object with 2 alignments and 0 metadata columns:
       seqnames strand       cigar    qwidth     start       end     width
          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
   [1]     chr1      +         10M        10         5        14        10
   [2]     chr1      +      5M2I5M        12         5        14        10
           njunc
       <integer>
   [1]         0
   [2]         0
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

 > gal[1] == gal
[1] TRUE TRUE

Another ordering definition would be an ordering that looks at
chromosome/strand/start/end/cigar instead of just 
chromosome/strand/start/end (i.e. the cigar is used to break ties when 
looking at
chromosome/strand/start/end only leads to a tie). That would address
the above caveat but it feels weird to use lexicographic ordering of
the cigars in order to break ties. So I'm kind of hesitant to do this.

Anyway now <=, <, ==, !=, pcompare(), order(), sort(), rank(), etc...
work.

Still no match(), selfmatch(), duplicated(), or unique() for GAlignments
objects. Note that these things should adhere to the same notion of
equality as == so they would also ignore the cigar right now if we
wanted to have them. Maybe that's another argument in favor of an
ordering based on chromosome/strand/start/end/cigar.

H.


On 01/08/2017 05:00 PM, Dario Strbenac wrote:
> Good day,
>
> When sort is used on a GAlignments object, a stack error is shown, no matter how small the object is.
>
>> testAlignments
> GAlignments object with 3 alignments and 0 metadata columns:
>                                           seqnames strand       cigar    qwidth     start       end     width     njunc
>                                              <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer>
>   700666F:126:C8768ANXX:3:2204:3175:99484    chr14      +      71S27M        98  18386040  18386066        27         0
>   700666F:126:C8768ANXX:1:1107:8115:31928    chr14      +      40S60M       100  18915005  18915064        60         0
>   700666F:126:C8768ANXX:1:2206:7564:34686    chr14      +      40S50M        90  18915005  18915054        50         0
>   -------
>   seqinfo: 23 sequences from an unspecified genome
>
>> sort(testAlignments)
> Error: C stack usage  7970544 is too close to the limit
>
> I use up-to-date packages.
>
>> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 8 (jessie)
>
> locale:
>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8
>  [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C
> [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] GenomicAlignments_1.10.0   Rsamtools_1.26.1           Biostrings_2.42.1          XVector_0.14.0
>  [5] SummarizedExperiment_1.4.0 Biobase_2.34.0             GenomicRanges_1.26.2       GenomeInfoDb_1.10.1
>  [9] IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0
>
> loaded via a namespace (and not attached):
> [1] lattice_0.20-34    bitops_1.0-6       grid_3.3.2         zlibbioc_1.20.0    Matrix_1.2-7.1     BiocParallel_1.8.1
> [7] tools_3.3.2
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list