[BioC] [devteam-bioc] GenomicRanges seqlengths problem

Miss Agnieszka Aleksandra Golicz agnieszka.golicz at uq.net.au
Fri May 2 02:01:40 CEST 2014


Hello,

Thank you very much for your help.

I am facing one more problem with GRanges.
I am trying to emulate this object:
data("CRC", package = "biovizBase")
object: mut.gr (SNP annotations for human chromosomes 1-22)

When I call show() on mut.gr:
show(mut.gr)
GRanges with 60 ranges and 10 metadata columns:
       seqnames                 ranges strand | Hugo_Symbol Entrez_Gene_Id   Center NCBI_Build   Strand Variant_Classification Variant_Type Reference_Allele
          <Rle>              <IRanges>  <Rle> |    <factor>      <integer> <factor>  <integer> <factor>               <factor>     <factor>         <factor>
   [1]        1 [ 11003085,  11003085]      + |      TARDBP          23435    Broad         36        +               Missense          SNP                G
   [2]        1 [ 62352395,  62352395]      + |       INADL          10207    Broad         36        +               Missense          SNP                T
   [3]        1 [194960885, 194960885]      + |         CFH           3075    Broad         36        +               Missense          SNP                G
   [4]        2 [ 10116508,  10116508]      - |        CYS1         192668    Broad         36        -               Missense          SNP                C
   [5]        2 [ 33617747,  33617747]      + |     RASGRP3          25780    Broad         36        +               Missense          SNP                C
   [6]        2 [ 73894280,  73894280]      + |     C2orf78         388960    Broad         36        +               Missense          SNP                T
   [7]        2 [ 96732769,  96732769]      + |      FER1L5          90342    Broad         36        +               Missense          SNP                T
   [8]        2 [179160267, 179160267]      - |         TTN           7273    Broad         36        -               Missense          SNP                C
   [9]        2 [217251189, 217251189]      - |      IGFBP5           3488    Broad         36        -               Missense          SNP                C
  [10]        3 [ 12620699,  12620699]      - |        RAF1           5894    Broad         36        -               Missense          SNP                G
  [11]        3 [ 46472880,  46472880]      - |         LTF           4057    Broad         36        -               Missense          SNP                C
  [12]        3 [137203130, 137203130]      + |     PPP2R3A           5523    Broad         36        +               Missense          SNP                A
  [13]        3 [137457429, 137457429]      + |        PCCB           5096    Broad         36        +               Missense          SNP                C
  [14]        3 [184708629, 184708629]      - |       KLHL6          89857    Broad         36        -               Missense          SNP                G
  [15]        4 [147434591, 147434591]      - |     SLC10A7          84068    Broad         36        -               Missense          SNP                G
  [16]        4 [185915412, 185915412]      - |       ACSL1           2180    Broad         36        -               Missense          SNP                A
  [17]        5 [ 79070342,  79070342]      + |       CMYA5         202333    Broad         36        +               Missense          SNP                C
  [18]        5 [ 94775579,  94775579]      + |      FAM81B         153643    Broad         36        +               Missense          SNP                G
  [19]        5 [140838266, 140838266]      + |     PCDHGC3           5098    Broad         36        +               Missense          SNP                T
  [20]        6 [109992724, 109992724]      - |        AKD1         221264    Broad         36        -               Missense          SNP                C
  [21]        6 [118993492, 118993492]      - |    C6orf204         387119    Broad         36        -               Missense          SNP                G
  [22]        7 [124286401, 124286401]      - |        POT1          25913    Broad         36        -               Missense          SNP                A
  [23]        7 [125960456, 125960456]      - |        GRM8           2918    Broad         36        -               Missense          SNP                A
  [24]        9 [ 35097658,  35097658]      - |    KIAA1539          80256    Broad         36        -               Missense          SNP                A
  [25]        9 [103164773, 103164773]      - |        BAAT            570    Broad         36        -               Missense          SNP                G
  [26]        9 [132745783, 132745783]      + |        ABL1             25    Broad         36        +               Missense          SNP                C
  [27]        9 [138481305, 138481305]      - |      SEC16A           9919    Broad         36        -               Missense          SNP                C
  [28]       10 [  7661761,   7661761]      - |       ITIH5          80760    Broad         36        -               Missense          SNP                C
  [29]       10 [106064683, 106064683]      - |      ITPRIP          85450    Broad         36        -               Missense          SNP                T
  [30]       11 [   603430,    603430]      - |        IRF7           3665    Broad         36        -               Missense          SNP                G
  [31]       11 [  5610148,   5610148]      + |      TRIM34          53840    Broad         36        +               Missense          SNP                A
  [32]       11 [  9420281,   9420281]      + |        IPO7          10527    Broad         36        +               Missense          SNP                G
  [33]       11 [ 26495005,  26495005]      + |        ANO3          63982    Broad         36        +               Missense          SNP                C
  [34]       11 [ 55629818,  55629818]      + |       OR8H2         390151    Broad         36        +               Missense          SNP                G
  [35]       11 [116737834, 116737834]      + |      CEP164          22897    Broad         36        +               Missense          SNP                C
  [36]       12 [ 25165980,  25165980]      - |       CASC1          55259    Broad         36        -               Missense          SNP                G
  [37]       12 [ 25289548,  25289548]      - |        KRAS           3845    Broad         36        -               Missense          SNP                C
  [38]       12 [ 31142142,  31142142]      + |       DDX11           1663    Broad         36        +               Missense          SNP                G
  [39]       12 [ 42434771,  42434771]      - |       PUS7L          83448    Broad         36        -               Missense          SNP                T
  [40]       12 [ 55006974,  55006974]      - |        PAN2           9924    Broad         36        -               Missense          SNP                T
  [41]       13 [102199348, 102199348]      - |   LOC643677         643677    Broad         36        -               Missense          SNP                G
  [42]       15 [ 40503502,  40503502]      - |      ZFP106          64397    Broad         36        -               Missense          SNP                C
  [43]       15 [ 70816269,  70816269]      + |        BBS4            585    Broad         36        +               Missense          SNP                A
  [44]       16 [ 23481033,  23481033]      + |       UBFD1          56061    Broad         36        +               Missense          SNP                C
  [45]       17 [ 21149010,  21149010]      + |      MAP2K3           5606    Broad         36        +               Missense          SNP                G
  [46]       17 [ 35812691,  35812691]      - |       TOP2A           7153    Broad         36        -               Missense          SNP                T
  [47]       18 [ 16788810,  16788810]      - |       ROCK1           6093    Broad         36        -               Missense          SNP                C
  [48]       18 [ 75322338,  75322338]      + |      NFATC1           4772    Broad         36        +               Missense          SNP                G
  [49]       19 [ 15713495,  15713495]      + |      OR10H3          26532    Broad         36        +               Missense          SNP                G
  [50]       19 [ 40730140,  40730140]      + |     TMEM147          10430    Broad         36        +               Missense          SNP                C
  [51]       19 [ 52338664,  52338664]      + |        SAE1          10055    Broad         36        +               Missense          SNP                G
  [52]       19 [ 57407795,  57407795]      + |     PPP2R1A           5518    Broad         36        +               Missense          SNP                G
  [53]       20 [ 23298287,  23298287]      + |        GZF1          64412    Broad         36        +               Missense          SNP                A
  [54]       20 [ 31012946,  31012946]      + |      EFCAB8         388795    Broad         36        +               Missense          SNP                C
  [55]       20 [ 40223536,  40223536]      - |       PTPRT          11122    Broad         36        -               Missense          SNP                C
  [56]       20 [ 54467136,  54467136]      + |       CASS4          57091    Broad         36        +               Missense          SNP                G
  [57]       20 [ 60201983,  60201983]      + |      GTPBP5          26164    Broad         36        +               Missense          SNP                C
  [58]       21 [ 36688774,  36688774]      + |      CHAF1B           8208    Broad         36        +               Missense          SNP                C
  [59]       21 [ 39699770,  39699770]      - |       LCA5L         150082    Broad         36        -               Missense          SNP                T
  [60]       22 [ 27437953,  27437953]      - |       CHEK2          11200    Broad         36        -               Missense          SNP                C

Chromosomes 8 and 14 are missing from the seqnames column, which makes sense: there are no SNPs on them.

However, when I do:
head(seqlengths(mut.gr), 25)  

       1         2         3         4         5         6         7         8         9        10        11        12        13        14        15 
249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 135006516 133851895 115169878 107349540 102531392 
       16        17        18        19        20        21        22 
 90354753  81195210  78077248  59128983  63025520  48129895  51304566 

All 22 chromosomes are there.

How is that possible?

I believe I need to do something similar for my data (I'm trying to create a multilayer circular plot in ggbio). Although I have SNPs on only a few chromosomes, I think I need to include lengths for all of them.
How do I do that?

Best wishes,
Agnieszka
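
The seqlevels of a GRanges may include sequences that none of its ranges sit on, and the lengths are stored per seqlevel; that is why seqlengths(mut.gr) lists all 22 chromosomes while seqnames(mut.gr) only shows 20. One way to get the same layout for your own data is to add the SNP-free chromosomes as extra seqlevels and then assign the lengths. This is a minimal sketch, assuming GenomicRanges is installed; the chromosome names, positions and lengths are made up for illustration.

```r
## A minimal sketch, assuming GenomicRanges is installed.
## Chromosome names, positions and lengths are made up for illustration.
suppressPackageStartupMessages(library(GenomicRanges))

## SNPs on chromosomes "1" and "3" only
snps <- GRanges(seqnames = c("1", "3"),
                ranges = IRanges(start = c(100, 250), width = 1))

## Hypothetical lengths for every chromosome, including "2", which has no SNP
all_lengths <- c("1" = 1000L, "2" = 800L, "3" = 600L)

## Extend the seqlevels to all chromosomes; the existing ranges are kept
seqlevels(snps) <- names(all_lengths)

## The names of 'all_lengths' now match seqlevels(snps), in the same order
seqlengths(snps) <- all_lengths

seqlengths(snps)
```

With real data, all_lengths would hold the length of every chromosome in your assembly; the SNP ranges are untouched, and seqinfo(snps) then lists every chromosome, as it does for mut.gr.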
________________________________________
From: Hervé Pagès <hpages at fhcrc.org>
Sent: 02 May 2014 03:27
To: vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org; Miss Agnieszka Aleksandra Golicz
Cc: GenomicRanges Maintainer
Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem

On 05/01/2014 10:21 AM, Maintainer wrote:
> Hi Sonia, Val,
>
> On 05/01/2014 09:52 AM, Maintainer wrote:
>> Hi,
>>
>> The seqnames are in a different order in the 'chrs' and 'sl' objects.
>> Looking at the first 3 names in each,
>>
>>    > head(seqlengths(chrs), 3)
>>     lm_SuperContig_0_v2  lm_SuperContig_1_v2 lm_SuperContig_10_v2
>>                      NA                   NA                   NA
>>
>>    > head(sl, 3)
>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2
>>                4258568             3378610             2939989
>>
>> When a GRanges is created, seqnames are sorted in ASCII order.
>
> Just to clarify, the seqlevels are sorted, not the seqnames:
>
>     > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10))
>     > gr
>     GRanges with 3 ranges and 0 metadata columns:
>           seqnames    ranges strand
>              <Rle> <IRanges>  <Rle>
>       [1]        b   [1, 10]      *
>       [2]        a   [2, 10]      *
>       [3]        b   [3, 10]      *
>       ---
>       seqlengths:
>         a  b
>        NA NA
>
> Not sorted (i.e. user-supplied order is preserved):
>
>     > seqnames(gr)
>     factor-Rle of length 3 with 3 runs
>       Lengths: 1 1 1
>       Values : b a b
>     Levels(2): a b
>
> Sorted:
>
>     > seqlevels(gr)
>     [1] "a" "b"
>
> This sorting of the seqlevels is not good anyway (people with different
> locales will get different results). I just fixed this, so the
> GRanges() constructor now preserves the seqlevels in the order supplied
> by the user:
>
>     > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10))
>     > seqlevels(gr)
>     [1] "b" "a"
>
> More precisely, the seqlevels are obtained by doing unique() on the
> seqnames.
>
> The fix will propagate to the public repos and become available through
> biocLite() in the next 24 hours.
>
> Then your original code should just work, Sonia.

Or maybe not :) You might still need to use 'stringsAsFactors=FALSE'
when calling read.table(). Please let us know if that still doesn't
work.

Thanks,
H.
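
The ordering issue can be illustrated with base R alone. factor() sorts its levels, and whether lm_SuperContig_1_v2 sorts before or after lm_SuperContig_10_v2 depends on the locale, whereas unique() keeps the order of first appearance, which is what the fixed GRanges() constructor uses. Indexing the named lengths by name avoids depending on either order. The rows below are copied from chrs.txt; reading them with text= rather than from the file is only to keep the sketch self-contained. Note that since R 4.0.0, stringsAsFactors = FALSE is already the default for read.table().

```r
## Base R only: a few rows copied from chrs.txt, in file order
txt <- "chr,start,end,len
lm_SuperContig_0_v2,1,4258568,4258568
lm_SuperContig_1_v2,1,3378610,3378610
lm_SuperContig_2_v2,1,2939989,2939989
lm_SuperContig_10_v2,1,1758670,1758670"

## Keep 'chr' as character so no sorted factor levels are created on reading
dataChr <- read.table(text = txt, header = TRUE, sep = ",",
                      stringsAsFactors = FALSE)

## Order of first appearance (what unique() on the seqnames gives)
file_order <- unique(dataChr$chr)

## Sorted order from factor(); the exact result depends on the locale
sorted_order <- levels(factor(dataChr$chr))

## Named lengths selected by name, so the order of 'sl' itself does not matter
sl <- setNames(dataChr$len, dataChr$chr)
sl[sorted_order]
```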

>
> Cheers,
> H.
>
>
>> You can
>> see the full list with seqinfo():
>>
>>    > seqinfo(chrs)
>> Seqinfo of length 34
>> seqnames             seqlengths isCircular genome
>> lm_SuperContig_0_v2        <NA>       <NA>   <NA>
>> lm_SuperContig_1_v2        <NA>       <NA>   <NA>
>> lm_SuperContig_10_v2       <NA>       <NA>   <NA>
>> lm_SuperContig_11_v2       <NA>       <NA>   <NA>
>> lm_SuperContig_12_v2       <NA>       <NA>   <NA>
>> ...                         ...        ...    ...
>>
>> The seqnames in the replacement vector must match the order in the
>> 'Seqinfo' object.
>>
>> To re-order:
>>
>> sl_new <- sl[match(levels(factor(names(sl))), names(sl))]
>>
>> Then add the new lengths:
>>
>>>> seqlengths(chrs) <- sl_new
>>>> chrs
>>> GRanges with 34 ranges and 0 metadata columns:
>>>                      seqnames       ranges strand
>>>                         <Rle>    <IRanges>  <Rle>
>>>      [1]  lm_SuperContig_0_v2 [1, 4258568]      *
>>>      [2]  lm_SuperContig_1_v2 [1, 3378610]      *
>>>      [3]  lm_SuperContig_2_v2 [1, 2939989]      *
>>>      [4]  lm_SuperContig_3_v2 [1, 2348246]      *
>>>      [5]  lm_SuperContig_4_v2 [1, 1918205]      *
>>>      ...                  ...          ...    ...
>>>     [30] lm_SuperContig_29_v2  [1, 200940]      *
>>>     [31] lm_SuperContig_30_v2  [1, 154863]      *
>>>     [32] lm_SuperContig_31_v2  [1, 143268]      *
>>>     [33] lm_SuperContig_32_v2  [1,  87679]      *
>>>     [34] lm_SuperContig_34_v2  [1,  58596]      *
>>>     ---
>>>     seqlengths:
>>>       lm_SuperContig_0_v2  lm_SuperContig_1_v2 ...  lm_SuperContig_9_v2
>>>                   4258568              3378610 ...              1772623
>>
>>
>> Valerie
>>
>>
>> On 05/01/2014 07:20 AM, Maintainer wrote:
>>>
>>> Hello,
>>>
>>> I have a problem with supplying a GRanges object with seqlengths.
>>> I have a file:
>>> chrs.txt - contains information about chromosomes
>>>
>>> chrs.txt
>>> chr,start,end,len
>>> lm_SuperContig_0_v2,1,4258568,4258568
>>> lm_SuperContig_1_v2,1,3378610,3378610
>>> lm_SuperContig_2_v2,1,2939989,2939989
>>> lm_SuperContig_3_v2,1,2348246,2348246
>>> lm_SuperContig_4_v2,1,1918205,1918205
>>> lm_SuperContig_6_v2,1,1888674,1888674
>>> lm_SuperContig_5_v2,1,1869450,1869450
>>> lm_SuperContig_8_v2,1,1809296,1809296
>>> lm_SuperContig_9_v2,1,1772623,1772623
>>> lm_SuperContig_7_v2,1,1769547,1769547
>>> lm_SuperContig_10_v2,1,1758670,1758670
>>> lm_SuperContig_13_v2,1,1634580,1634580
>>> lm_SuperContig_12_v2,1,1631710,1631710
>>> lm_SuperContig_11_v2,1,1590160,1590160
>>> lm_SuperContig_15_v2,1,1560629,1560629
>>> lm_SuperContig_14_v2,1,1533332,1533332
>>> lm_SuperContig_17_v2,1,1445693,1445693
>>> lm_SuperContig_16_v2,1,1397653,1397653
>>> lm_SuperContig_18_v2,1,1351976,1351976
>>> lm_SuperContig_19_v2,1,1186800,1186800
>>> lm_SuperContig_20_v2,1,1087932,1087932
>>> lm_SuperContig_21_v2,1,1020521,1020521
>>> lm_SuperContig_22_v2,1,731443,731443
>>> lm_SuperContig_23_v2,1,521426,521426
>>> lm_SuperContig_24_v2,1,475869,475869
>>> lm_SuperContig_25_v2,1,318058,318058
>>> lm_SuperContig_26_v2,1,261540,261540
>>> lm_SuperContig_27_v2,1,250629,250629
>>> lm_SuperContig_28_v2,1,236098,236098
>>> lm_SuperContig_29_v2,1,200940,200940
>>> lm_SuperContig_30_v2,1,154863,154863
>>> lm_SuperContig_31_v2,1,143268,143268
>>> lm_SuperContig_32_v2,1,87679,87679
>>> lm_SuperContig_34_v2,1,58596,58596
>>>
>>> I try to do the following:
>>> # creating GRanges object for chromosomes
>>> dataChr <- read.table("chrs.txt",header=T,sep=",")
>>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end)))
>>> sl <- setNames(dataChr$len, as.character(dataChr$chr))
>>> seqlengths(chrs) <- sl
>>>
>>> And I get the following error:
>>>
>>> Error in .normargSeqlengths(value, seqnames(x)) :
>>>      when the supplied 'seqlengths' vector is named, the names must match the seqnames
>>>
>>> Any chance you could help me with what is going on?
>>>
>>> Best wishes,
>>> Agnieszka
>>>
>>>
>>>     -- output of sessionInfo():
>>>
>>> R version 3.1.0 (2014-04-10)
>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
>>> [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] XVector_0.4.0        ggbio_1.12.3         ggplot2_0.9.3.1      GenomicRanges_1.16.2 GenomeInfoDb_1.0.2   IRanges_1.22.4       BiocGenerics_0.10.0
>>> [8] BiocInstaller_1.14.2
>>>
>>> loaded via a namespace (and not attached):
>>>     [1] AnnotationDbi_1.26.0     BatchJobs_1.2            BBmisc_1.6               Biobase_2.24.0           BiocParallel_0.6.0       biomaRt_2.20.0
>>>     [7] Biostrings_2.32.0        biovizBase_1.12.1        bitops_1.0-6             brew_1.0-6               BSgenome_1.32.0          cluster_1.15.2
>>> [13] codetools_0.2-8          colorspace_1.2-4         DBI_0.2-7                dichromat_2.0-0          digest_0.6.4             fail_1.2
>>> [19] foreach_1.4.2            Formula_1.1-1            GenomicAlignments_1.0.0  GenomicFeatures_1.16.0   grid_3.1.0               gridExtra_0.9.1
>>> [25] gtable_0.1.2             Hmisc_3.14-4             iterators_1.0.7          labeling_0.2             lattice_0.20-29          latticeExtra_0.6-26
>>> [31] MASS_7.3-31              munsell_0.4.2            plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5       Rcpp_0.11.1
>>> [37] RCurl_1.95-4.1           reshape2_1.4             Rsamtools_1.16.0         RSQLite_0.11.4           rtracklayer_1.24.0       scales_0.2.4
>>> [43] sendmailR_1.1-2          splines_3.1.0            stats4_3.1.0             stringr_0.6.2            survival_2.37-7          tools_3.1.0
>>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1             zlibbioc_1.10.0
>>>
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>
>>
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
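
Putting the suggestions in this thread together, the original code can be made independent of the seqlevel order by indexing the named lengths with seqlevels(chrs) instead of reordering them by hand. sl[seqlevels(chrs)] selects the same elements as Valerie's match() expression when the seqlevels are in sorted order, and it also works when the constructor keeps the file order. This is a sketch assuming GenomicRanges is installed; the data frame stands in for a few rows of chrs.txt, deliberately not in sorted order.

```r
## A sketch, assuming GenomicRanges is installed; 'dataChr' stands in for
## a few rows of chrs.txt, deliberately not in sorted order.
suppressPackageStartupMessages(library(GenomicRanges))

dataChr <- data.frame(
  chr   = c("lm_SuperContig_0_v2", "lm_SuperContig_6_v2",
            "lm_SuperContig_5_v2", "lm_SuperContig_10_v2"),
  start = 1L,
  end   = c(4258568L, 1888674L, 1869450L, 1758670L),
  len   = c(4258568L, 1888674L, 1869450L, 1758670L),
  stringsAsFactors = FALSE
)

chrs <- with(dataChr, GRanges(chr, IRanges(start, end)))
sl <- setNames(dataChr$len, dataChr$chr)

## Select the lengths by name, in whatever order the seqlevels are in,
## so the names of the replacement vector always match seqlevels(chrs)
seqlengths(chrs) <- sl[seqlevels(chrs)]

seqinfo(chrs)
```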

