[Bioc-devel] request: high-level seqlevel utilities

Julian Gehring julian.gehring at embl.de
Mon Dec 16 14:00:20 CET 2013


Hi Michael,

I would second your request.  In a package I'll be submitting soon, I have 
a work-around for this: a set of functions like 'hsAutosomes', 
'hsAllosomes' etc. that return the respective sets of human chromosome 
names.  Perhaps one could incorporate this into the 'Seqinfo' class as 
additional columns, similar to 'isCircular'.  One would still need an 
additional data source for this, since the information about which 
chromosome is primary, an autosome, etc. is not contained in a standard 
reference file.
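A minimal base-R sketch of what such helpers might look like. The names 'hsAutosomes' and 'hsAllosomes' are taken from the description above, but their bodies, the 'prefix' argument and the extra 'hsNuclear' helper are assumptions for illustration only, not the package's actual code or an existing Bioconductor API.

#+BEGIN_SRC R

## Hypothetical helpers returning sets of human chromosome names.
## 'prefix' switches between UCSC-style ("chr1") and NCBI/Ensembl-style
## ("1") naming; both the argument and the bodies are illustrative only.
hsAutosomes <- function(prefix = "chr") {
    paste0(prefix, 1:22)
}

hsAllosomes <- function(prefix = "chr") {
    paste0(prefix, c("X", "Y"))
}

## Autosomes plus sex chromosomes. The mitochondrial genome is left out
## on purpose, since its name differs between styles ("chrM" vs "MT").
hsNuclear <- function(prefix = "chr") {
    c(hsAutosomes(prefix), hsAllosomes(prefix))
}

hsAllosomes()              ## "chrX" "chrY"
head(hsAutosomes(""), 3)   ## "1" "2" "3"

#+END_SRC

Such fixed vectors only cover one organism and naming style, which is why a shared data source in the infrastructure would be preferable.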

> We've found that analysts often need to restrict seqlevels to certain
> pre-defined sets of chromosomes. Given the variability across organisms, it
> would be nice to have an abstraction.
>
> We often see this in code:
>
> keepSeqlevels(seqinfo, as.character(1:22))
> keepSeqlevels(seqinfo, c(1:22, "X", "Y"))
>
> Perhaps instead we could use the more abstract and arguably more readable:
>
> keepAutosomes(seqinfo)
> keepPrimaryChromosomes(seqinfo)
>
> Not sure of the best term for the latter. It refers to the set of
> chromosomes that are not assembly fragments but are generally in the
> nucleus (when there is one).
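One way such an abstraction could work is a per-organism lookup table of named chromosome sets. The base-R sketch below assumes a hypothetical table 'chromosomeSets' and, to stay runnable without Bioconductor, operates on a plain named vector of sequence lengths rather than on a Seqinfo object. The function names follow the proposal above but are not an existing API, and the lengths in the usage example are toy values.

#+BEGIN_SRC R

## Hypothetical lookup table of chromosome sets per organism
## (Ensembl-style names; contents are an illustrative assumption).
chromosomeSets <- list(
    human = list(autosomes = as.character(1:22),
                 primary   = c(as.character(1:22), "X", "Y")),
    fly   = list(autosomes = c("2L", "2R", "3L", "3R", "4"),
                 primary   = c("2L", "2R", "3L", "3R", "4", "X", "Y"))
)

## Keep only the entries whose names belong to the requested set,
## preserving the input order.
keepSet <- function(seqlengths, organism, set) {
    wanted <- chromosomeSets[[organism]][[set]]
    if (is.null(wanted)) stop("unknown organism or chromosome set")
    seqlengths[names(seqlengths) %in% wanted]
}

keepAutosomes <- function(seqlengths, organism = "human")
    keepSet(seqlengths, organism, "autosomes")

keepPrimaryChromosomes <- function(seqlengths, organism = "human")
    keepSet(seqlengths, organism, "primary")

## Toy lengths, including a mitochondrial genome and an assembly fragment
lens <- c("1" = 1000, "2" = 900, "X" = 800, "MT" = 16, "GL000192.1" = 5)
names(keepAutosomes(lens))           ## "1" "2"
names(keepPrimaryChromosomes(lens))  ## "1" "2" "X"

#+END_SRC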


Does the current 'sortSeqlevels' function address this? E.g.

#+BEGIN_SRC R

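library(GenomicRanges)
## Mock Seqinfo with three unsorted chromosomes
seqinfo <- Seqinfo(seqnames = paste0("chr", c(10, 1, 3)),
                   seqlengths = c(10000, 1000, 3000),
                   isCircular = NA, genome = "mock1")
seqinfo                 ## seqlevels: 'chr10', 'chr1', 'chr3'
sortSeqlevels(seqinfo)  ## now sorted: 'chr1', 'chr3', 'chr10'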

#+END_SRC
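To illustrate the kind of "natural" ordering meant here without any Bioconductor dependency, a minimal base-R sketch follows. The helpers 'naturalChromOrder' and 'sortChromNames' are hypothetical, and the chosen ranking (numbered chromosomes numerically, then X, Y and the mitochondrial genome, then anything else alphabetically) is an assumption; it is not how sortSeqlevels() is implemented and may differ from it on unusual names.

#+BEGIN_SRC R

## Hypothetical natural ordering of chromosome names such as "chr10",
## "chr1", "chrX": numeric names first (numerically), then X/Y/M/MT in
## that order, then all remaining names alphabetically.
naturalChromOrder <- function(x) {
    core <- sub("^chr", "", x, ignore.case = TRUE)
    num <- suppressWarnings(as.integer(core))        # NA for non-numeric
    special <- match(toupper(core), c("X", "Y", "M", "MT"))
    ## Group 1: numeric names; group 2: X/Y/M/MT; group 3: anything else
    group <- ifelse(!is.na(num), 1L, ifelse(!is.na(special), 2L, 3L))
    order(group, num, special, core)
}

sortChromNames <- function(x) x[naturalChromOrder(x)]

sortChromNames(c("chr10", "chr1", "chr3", "chrM", "chrX", "chrUn_gl000220"))
## "chr1" "chr3" "chr10" "chrX" "chrM" "chrUn_gl000220"

#+END_SRC

Grouping first and only then sorting within each group avoids the lexical trap where "chr10" sorts before "chr3".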

> It would also be nice to have a sort,Seqinfo method that sorts by the
> natural ordering of the chromosomes, if there is one. Maybe the function
> needs its own name, but either way, this is something that really needs to
> be in the infrastructure.
>
> I think the existing SeqnameStyle infrastructure should be able to support
> this.

Best wishes
Julian
