[Bioc-devel] Problem with seqnames of TwoBitFile from AnnotationHub
Rainer Johannes
Johannes.Rainer at eurac.edu
Sat Jan 9 17:01:33 CET 2016
Yes, using BSgenome would help in this case.
In the long run I think it might be important to have this fixed, not necessarily for human, but for other species/genome builds for which there might not be a BSgenome package available; through AnnotationHub all GTF files and fasta files would be available. Note also that the FaFiles from Ensembl do have the “correct” chromosome names, although I assume they were built from the same Ensembl fasta files as the TwoBitFiles.
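As a stopgap, the client-side workaround suggested further down the thread (keeping only the first whitespace-delimited piece of each name) can be sketched in base R. This is only an illustration on plain character vectors: strip_header(), name_map and to_file_names() are hypothetical names made up for this example and are not part of rtracklayer or AnnotationHub.

```r
# Seqnames as reported for the TwoBitFile in the original post (first three).
full_names <- c(
  "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
  "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF",
  "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
)

# Hypothetical helper: drop everything from the first whitespace character on.
strip_header <- function(x) sub("[[:space:]].*$", "", x)

short_names <- strip_header(full_names)

# Lookup table from the short chromosome name to the full name in the file.
name_map <- setNames(full_names, short_names)

# Hypothetical helper: translate short names used in a query into the names
# stored in the file, failing loudly on names that are not in the file.
to_file_names <- function(chr) {
  chr <- as.character(chr)
  out <- unname(name_map[chr])
  if (anyNA(out)) {
    stop("unknown chromosome name(s): ",
         paste(chr[is.na(out)], collapse = ", "))
  }
  out
}

print(short_names)
print(to_file_names("10"))
```

With the real object, full_names would presumably come from seqnames(seqinfo(tbf)), and the translated names are what the query ranges would have to carry to match the file; that part is an assumption and is not tested here.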
jo
> On 08 Jan 2016, at 22:49, Hervé Pagès <hpages at fredhutch.org> wrote:
>
> On 01/08/2016 01:09 PM, Michael Lawrence wrote:
>> That is one solution. But everyone using that genome would need to
>> reset the seqlevels to the "standard" ones. In this specific case, is
>> there any reason not to just use the BSgenome for GRCh38?
>
> I agree. Maybe we don't need seqlevels<-,TwoBitFile for that particular
> use case. Just wanted to mention that the ability to rename the
> sequences in a TwoBitFile, FastaFile, or other file-based object that
> supports seqinfo() would be useful in general.
>
> H.
>
>>
>> On Fri, Jan 8, 2016 at 11:04 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>> Hi Jo, Michael,
>>>
>>> What about implementing a seqlevels() setter for TwoBitFile objects? All
>>> you need for this is an extra slot for storing the user-supplied
>>> seqlevels. Note that in general the seqlevels() setter allows more than
>>> renaming the seqlevels. It also allows dropping, adding, and shuffling
>>> them. But you don't need to support all that. Supporting renaming would
>>> already go a long way. See selectMethod("seqlevels<-", "TxDb") in
>>> GenomicFeatures for an example of a restricted "seqlevels<-" method.
>>>
>>> H.
>>>
>>>
>>> On 01/08/2016 09:50 AM, Rainer Johannes wrote:
>>>>
>>>> I agree, I would not modify the file content. At present it is however not
>>>> possible to use e.g. getSeq on these TwoBitFiles, since the chromosome names
>>>> in the submitted GRanges (e.g. 1) do not match the seqnames/seqinfo of the
>>>> TwoBitFile. I don’t know if a seqnames or seqinfo method stripping off all
>>>> but the first part of the name would help here...
>>>>
>>>> jo
>>>>
>>>>> On 08 Jan 2016, at 15:18, Sean Davis <seandavi at gmail.com> wrote:
>>>>>
>>>>> I will make the small editorial comment to guard against modifying file
>>>>> content on transit into the hub object. On the client side (after getting
>>>>> such an object) I think a “fix” would be to have a quick seqnames method to
>>>>> strip off all but the first whitespace delimited piece.
>>>>>
>>>>> Sean
>>>>>
>>>>>> On Jan 8, 2016, at 8:40 AM, Michael Lawrence <lawrence.michael at gene.com>
>>>>>> wrote:
>>>>>>
>>>>>> This is perhaps something that could be handled when populating the
>>>>>> hub, but I'm not sure how rtracklayer could automatically derive the
>>>>>> chromosome names.
>>>>>>
>>>>>> On Fri, Jan 8, 2016 at 2:37 AM, Rainer Johannes
>>>>>> <Johannes.Rainer at eurac.edu> wrote:
>>>>>>>
>>>>>>> dear all,
>>>>>>>
>>>>>>> I just ran into a problem with a TwoBitFile I fetched from
>>>>>>> AnnotationHub. I was fetching a TwoBitFile with the genomic DNA sequence, as
>>>>>>> provided by Ensembl:
>>>>>>>
>>>>>>>> library(AnnotationHub)
>>>>>>>> ah <- AnnotationHub()
>>>>>>>> tbf <- ah[["AH50068"]]
>>>>>>>
>>>>>>>
>>>>>>>> head(seqnames(seqinfo(tbf)))
>>>>>>>
>>>>>>> [1] "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF"
>>>>>>> [2] "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
>>>>>>> [3] "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
>>>>>>> [4] "12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF"
>>>>>>> [5] "13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF"
>>>>>>> [6] "14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF"
>>>>>>>
>>>>>>> It would be nice if the seqnames were really just the chromosome names
>>>>>>> and not the whole string from the FA file header. Is there a way I could fix
>>>>>>> the file myself, or is this something that should be fixed in the rtracklayer
>>>>>>> or AnnotationHub package when the TwoBitFile is created?
>>>>>>>
>>>>>>> thanks, jo
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319