[Bioc-devel] Problem with seqnames of TwoBitFile from AnnotationHub

Hervé Pagès hpages at fredhutch.org
Fri Jan 8 20:04:43 CET 2016


Hi Jo, Michael,

What about implementing a seqlevels() setter for TwoBitFile objects? All
you need for this is an extra slot for storing the user-supplied
seqlevels. Note that in general the seqlevels() setter allows more than
renaming the seqlevels. It also allows dropping, adding, and shuffling
them. But you don't need to support all that. Supporting renaming would
already go a long way. See selectMethod("seqlevels<-", "TxDb") in
GenomicFeatures for an example of a restricted "seqlevels<-" method.
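
To make the idea concrete, here is a minimal, self-contained sketch of a rename-only setter. It uses only the methods package, not rtracklayer: the class RenamableSeqs, its two slots, the helper toFileSeqlevels(), and the locally defined seqlevels generics are all hypothetical stand-ins (in Bioconductor the generics come from GenomeInfoDb, and a real method would be written for TwoBitFile). The two sequence names are copied from Jo's output below.

```r
library(methods)

# Hypothetical stand-in for a TwoBitFile with an extra slot for user names.
setClass("RenamableSeqs",
         representation(file_seqlevels = "character",   # names stored in the file
                        user_seqlevels = "character"))  # names shown to the user

# Hypothetical local generics, defined here only to keep the sketch standalone.
setGeneric("seqlevels", function(x) standardGeneric("seqlevels"))
setGeneric("seqlevels<-", function(x, value) standardGeneric("seqlevels<-"))

setMethod("seqlevels", "RenamableSeqs", function(x) x@user_seqlevels)

# Restricted setter: one-to-one renaming only, no dropping/adding/shuffling.
setReplaceMethod("seqlevels", "RenamableSeqs", function(x, value) {
    value <- as.character(value)
    if (length(value) != length(x@file_seqlevels) || anyDuplicated(value))
        stop("only one-to-one renaming of the seqlevels is supported")
    x@user_seqlevels <- value
    x
})

# Translate user-facing names back to the names stored in the file,
# which is what a getSeq()-like lookup would need internally.
toFileSeqlevels <- function(x, names) {
    idx <- match(names, x@user_seqlevels)
    if (anyNA(idx))
        stop("unknown seqlevels")
    x@file_seqlevels[idx]
}

full <- c("1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
          "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF")
obj <- new("RenamableSeqs", file_seqlevels = full, user_seqlevels = full)

# Keep only the first whitespace-delimited piece of each name.
seqlevels(obj) <- sub("[[:space:]].*$", "", seqlevels(obj))
seqlevels(obj)
toFileSeqlevels(obj, "10")
```

The file itself is never touched; only the user-facing names change, and the original names stay available for lookups against the file.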

H.

On 01/08/2016 09:50 AM, Rainer Johannes wrote:
> I agree, I would not modify the file content. At present, however, it is not possible to use e.g. getSeq on these TwoBitFiles, since the chromosome names in the submitted GRanges (e.g. 1) do not match the seqnames/seqinfo of the TwoBitFile. I don’t know if a seqnames or seqinfo method stripping off all but the first part of the name would help here...
>
> jo
>
>> On 08 Jan 2016, at 15:18, Sean Davis <seandavi at gmail.com> wrote:
>>
>> I will make the small editorial comment to guard against modifying file content in transit into the hub object. On the client side (after getting such an object), I think a “fix” would be to have a quick seqnames method that strips off all but the first whitespace-delimited piece.
>>
>> Sean
>>
>>> On Jan 8, 2016, at 8:40 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
>>>
>>> This is perhaps something that could be handled when populating the
>>> hub, but I'm not sure how rtracklayer could automatically derive the
>>> chromosome names.
>>>
>>> On Fri, Jan 8, 2016 at 2:37 AM, Rainer Johannes
>>> <Johannes.Rainer at eurac.edu> wrote:
>>>> dear all,
>>>>
>>>> I just ran into a problem with a TwoBitFile I fetched from AnnotationHub. I was fetching a TwoBitFile with the genomic DNA sequence, as provided by Ensembl:
>>>>
>>>>> library(AnnotationHub)
>>>>> ah <- AnnotationHub()
>>>>> tbf <- ah[["AH50068"]]
>>>>
>>>>> head(seqnames(seqinfo(tbf)))
>>>> [1] "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF"
>>>> [2] "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
>>>> [3] "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
>>>> [4] "12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF"
>>>> [5] "13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF"
>>>> [6] "14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF"
>>>>
>>>> It would be nice if the seqnames were just the chromosome names and not the whole string from the FASTA file header. Is there a way I could fix the file myself, or is this something that should be fixed in the rtracklayer or AnnotationHub package when the TwoBitFile is created?
>>>>
>>>> thanks, jo
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


