[BioC] [devteam-bioc] GenomicRanges seqlengths problem
Valerie Obenchain
vobencha at fhcrc.org
Thu May 1 18:52:45 CEST 2014
Hi,
The seqnames are in a different orders in the 'chrs' and 'sl' objects.
Looking at the first 3 names in each,
> head(seqlengths(chrs), 3)
lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2
NA NA NA
> head(sl, 3)
lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2
4258568 3378610 2939989
When a GRanges is created, seqnames are sorted in ascii order. You can
see the full list with seqinfo():
> seqinfo(chrs)
Seqinfo of length 34
seqnames seqlengths isCircular genome
lm_SuperContig_0_v2 <NA> <NA> <NA>
lm_SuperContig_1_v2 <NA> <NA> <NA>
lm_SuperContig_10_v2 <NA> <NA> <NA>
lm_SuperContig_11_v2 <NA> <NA> <NA>
lm_SuperContig_12_v2 <NA> <NA> <NA>
... ... ... ...
The seqnames in the replacement vector must match the order in the
'Seqinfo' object.
To re-order:
sl_new <- sl[match(levels(factor(names(sl))), names(sl))]
Then add the new lengths:
>> seqlengths(chrs) <- sl_new
>> chrs
> GRanges with 34 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] lm_SuperContig_0_v2 [1, 4258568] *
> [2] lm_SuperContig_1_v2 [1, 3378610] *
> [3] lm_SuperContig_2_v2 [1, 2939989] *
> [4] lm_SuperContig_3_v2 [1, 2348246] *
> [5] lm_SuperContig_4_v2 [1, 1918205] *
> ... ... ... ...
> [30] lm_SuperContig_29_v2 [1, 200940] *
> [31] lm_SuperContig_30_v2 [1, 154863] *
> [32] lm_SuperContig_31_v2 [1, 143268] *
> [33] lm_SuperContig_32_v2 [1, 87679] *
> [34] lm_SuperContig_34_v2 [1, 58596] *
> ---
> seqlengths:
> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2
> 4258568 3378610 ... 1772623
Valerie
On 05/01/2014 07:20 AM, Maintainer wrote:
>
> Hello,
>
> I have a problem with supplying GRanges object with seqlengths.
> I have a files.
> chrs.txt - contains information about chromosomes
>
> chrs.txt
> chr,start,end,len
> lm_SuperContig_0_v2,1,4258568,4258568
> lm_SuperContig_1_v2,1,3378610,3378610
> lm_SuperContig_2_v2,1,2939989,2939989
> lm_SuperContig_3_v2,1,2348246,2348246
> lm_SuperContig_4_v2,1,1918205,1918205
> lm_SuperContig_6_v2,1,1888674,1888674
> lm_SuperContig_5_v2,1,1869450,1869450
> lm_SuperContig_8_v2,1,1809296,1809296
> lm_SuperContig_9_v2,1,1772623,1772623
> lm_SuperContig_7_v2,1,1769547,1769547
> lm_SuperContig_10_v2,1,1758670,1758670
> lm_SuperContig_13_v2,1,1634580,1634580
> lm_SuperContig_12_v2,1,1631710,1631710
> lm_SuperContig_11_v2,1,1590160,1590160
> lm_SuperContig_15_v2,1,1560629,1560629
> lm_SuperContig_14_v2,1,1533332,1533332
> lm_SuperContig_17_v2,1,1445693,1445693
> lm_SuperContig_16_v2,1,1397653,1397653
> lm_SuperContig_18_v2,1,1351976,1351976
> lm_SuperContig_19_v2,1,1186800,1186800
> lm_SuperContig_20_v2,1,1087932,1087932
> lm_SuperContig_21_v2,1,1020521,1020521
> lm_SuperContig_22_v2,1,731443,731443
> lm_SuperContig_23_v2,1,521426,521426
> lm_SuperContig_24_v2,1,475869,475869
> lm_SuperContig_25_v2,1,318058,318058
> lm_SuperContig_26_v2,1,261540,261540
> lm_SuperContig_27_v2,1,250629,250629
> lm_SuperContig_28_v2,1,236098,236098
> lm_SuperContig_29_v2,1,200940,200940
> lm_SuperContig_30_v2,1,154863,154863
> lm_SuperContig_31_v2,1,143268,143268
> lm_SuperContig_32_v2,1,87679,87679
> lm_SuperContig_34_v2,1,58596,58596
>
> I try to do the following:
> # creating GRanges object for chromosomes
> dataChr <- read.table("chrs.txt",header=T,sep=",")
> chrs <- with(dataChr, GRanges(chr, IRanges(start, end)))
> sl <- setNames(dataChr$len, as.character(dataChr$chr))
> seqlengths(chrs) <- sl
>
> And I get the following error:
>
> Error in .normargSeqlengths(value, seqnames(x)) :
> when the supplied 'seqlengths' vector is named, the names must match the seqnames
>
> Any chance you could help me with what is going on?
>
> Best wishes,
> Agnieszka
>
>
> -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0
> [8] BiocInstaller_1.14.2
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0
> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2
> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2
> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1
> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26
> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1
> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4
> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0
> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> ________________________________________________________________________
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158
More information about the Bioconductor
mailing list