[Bioc-devel] Problem with seqnames of TwoBitFile from AnnotationHub

Hervé Pagès hpages at fredhutch.org
Fri Jan 8 22:49:46 CET 2016


On 01/08/2016 01:09 PM, Michael Lawrence wrote:
> That is one solution. But everyone using that genome would need to
> reset the seqlevels to the "standard" ones. In this specific case, is
> there any reason not to just use the BSgenome for GRCh38?

I agree. Maybe we don't need seqlevels<-,TwoBitFile for that particular
use case. Just wanted to mention that the ability to rename the
sequences in a TwoBitFile, FastaFile, or other file-based object that
supports seqinfo() would be useful in general.
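
To make the idea concrete, here is a minimal sketch of a rename-only setter in plain R (methods package only). Everything in it is an assumption made for illustration: the ToySeqFile class, the toySeqlevels() generics, and the toFileNames() helper are hypothetical and are not rtracklayer or GenomeInfoDb code. It keeps the names stored in the file in one slot and the user-supplied names in an extra slot, and it refuses anything other than a one-to-one rename.

```r
library(methods)

# Toy stand-in for a file-based object; NOT rtracklayer's TwoBitFile class.
setClass("ToySeqFile", representation(
  file_seqlevels = "character",  # names as stored in the file
  user_seqlevels = "character"   # user-supplied names, same order
))

# Hypothetical constructor: the user-facing names start out as the file names.
ToySeqFile <- function(file_seqlevels) {
  new("ToySeqFile",
      file_seqlevels = file_seqlevels,
      user_seqlevels = file_seqlevels)
}

# Hypothetical generics, named so they cannot clash with seqlevels().
setGeneric("toySeqlevels", function(x) standardGeneric("toySeqlevels"))
setGeneric("toySeqlevels<-",
           function(x, value) standardGeneric("toySeqlevels<-"))

setMethod("toySeqlevels", "ToySeqFile", function(x) x@user_seqlevels)

# Restricted setter: renaming only; no dropping, adding, or reordering.
setReplaceMethod("toySeqlevels", "ToySeqFile", function(x, value) {
  value <- as.character(value)
  if (length(value) != length(x@file_seqlevels))
    stop("renaming only: supply exactly one name per sequence in the file")
  if (anyNA(value) || anyDuplicated(value) > 0)
    stop("new seqlevels must be unique and non-NA")
  x@user_seqlevels <- value
  x
})

# Translate user-facing names back to the names stored in the file, which is
# what a sequence lookup would have to do before reading any data.
toFileNames <- function(x, names) {
  idx <- match(names, x@user_seqlevels)
  if (anyNA(idx)) stop("unknown sequence name(s)")
  x@file_seqlevels[idx]
}

# Long names as reported earlier in this thread (first three only).
long_names <- c(
  "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
  "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF",
  "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
)

f <- ToySeqFile(long_names)

# Rename: keep only the first whitespace-delimited piece of each name.
toySeqlevels(f) <- sub("[[:space:]].*$", "", toySeqlevels(f))

print(toySeqlevels(f))
print(toFileNames(f, "10"))
```

The file itself is never modified; only the user-facing names change, and the original names remain available for the actual lookup.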

H.

>
> On Fri, Jan 8, 2016 at 11:04 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>> Hi Jo, Michael,
>>
>> What about implementing a seqlevels() setter for TwoBitFile objects? All
>> you need for this is an extra slot for storing the user-supplied
>> seqlevels. Note that in general the seqlevels() setter allows more than
>> renaming the seqlevels. It also allows dropping, adding, and shuffling
>> them. But you don't need to support all that. Supporting renaming would
>> already go a long way. See selectMethod("seqlevels<-", "TxDb") in
>> GenomicFeatures for an example of a restricted "seqlevels<-" method.
>>
>> H.
>>
>>
>> On 01/08/2016 09:50 AM, Rainer Johannes wrote:
>>>
>>> I agree, I would not modify the file content. At present it is however not
>>> possible to use e.g. getSeq on these TwoBitFiles, since the chromosome names
>>> in the submitted GRanges (e.g. 1) do not match the seqnames/seqinfo of the
>>> TwoBitFile. I don’t know if a seqnames or seqinfo method stripping off all
>>> but the first name-part would help here...
>>>
>>> jo
>>>
>>>> On 08 Jan 2016, at 15:18, Sean Davis <seandavi at gmail.com> wrote:
>>>>
>>>> I will make the small editorial comment to guard against modifying file
>>>> content in transit into the hub object. On the client side (after getting
>>>> such an object) I think a “fix” would be to have a quick seqnames method to
>>>> strip off all but the first whitespace-delimited piece.
>>>>
>>>> Sean
>>>>
>>>>> On Jan 8, 2016, at 8:40 AM, Michael Lawrence <lawrence.michael at gene.com>
>>>>> wrote:
>>>>>
>>>>> This is perhaps something that could be handled when populating the
>>>>> hub, but I'm not sure how rtracklayer could automatically derive the
>>>>> chromosome names.
>>>>>
>>>>> On Fri, Jan 8, 2016 at 2:37 AM, Rainer Johannes
>>>>> <Johannes.Rainer at eurac.edu> wrote:
>>>>>>
>>>>>> dear all,
>>>>>>
>>>>>> I just ran into a problem with a TwoBitFile I fetched from
>>>>>> AnnotationHub. I was fetching a TwoBitFile with the genomic DNA sequence, as
>>>>>> provided by Ensembl:
>>>>>>
>>>>>>> library(AnnotationHub)
>>>>>>> ah <- AnnotationHub()
>>>>>>> tbf <- ah[["AH50068"]]
>>>>>>
>>>>>>
>>>>>>> head(seqnames(seqinfo(tbf)))
>>>>>>
>>>>>> [1] "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF"
>>>>>> [2] "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
>>>>>> [3] "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
>>>>>> [4] "12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF"
>>>>>> [5] "13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF"
>>>>>> [6] "14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF"
>>>>>>
>>>>>> It would be nice if the seqnames were just the chromosome names
>>>>>> and not the whole string from the FA file header. Is there a way I could fix
>>>>>> the file myself or is this something that should be fixed in the rtracklayer
>>>>>> or AnnotationHub package when the TwoBitFile is created?
>>>>>>
>>>>>> thanks, jo
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>>
>>>
>>>
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


