[BioC] Create transcriptDb using gff3 files? - library GenomicFeatures and rtracklayer

Cook, Malcolm MEC at stowers.org
Thu Apr 5 17:41:16 CEST 2012


> Hi all,
> 
> Sorry I haven't read the whole thread, still I have a few comments that might
> be off the main topic then.
> 
> On 5 Apr 2012, at 17:01, Cook, Malcolm wrote:
> 
> > Supporting both Ensembl's GTF and GFF3 would be ideal.
> >
> > Ensembl GTF would open up many genomes, including those in:
> > 	ftp://ftp.ensembl.org/pub/release-66/gtf/
> > 	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-13/gtf/
> > 	ftp://ftp.ensemblgenomes.org/pub/fungi/release-13/gtf/
> > 	ftp://ftp.ensemblgenomes.org/pub/protists/release-13/gtf/
> > 	ftp://ftp.ensemblgenomes.org/pub/plants/release-13/gtf/
> >
> >
> > Supporting Ensembl GTF would make it easy to distribute/archive the
> elements of a transcriptome analysis alongside a project/analysis in a
> generally useful format (i.e. IGV and other tools can work with it more or less
> directly)
> 
> In my package easyRNASeq, I already load Ensembl GTF files and convert
> them into GRanges / RangedData objects. It's pretty straightforward. I guess
> that adapting the code to create a transcriptDb should be doable.
> 
> >
> > Related note: I have learned that the BioMarts produced for
> Ensembl Genomes are NOT ARCHIVED, whereas it seems that historic GTF IS
> available. Upshot: you'd best not depend upon being able to reproduce
> today's TranscriptDbFromBiomart tomorrow.
> 
> I don't know where you learned that and how you meant it exactly, but using
> biomaRt, you can still access Ensembl versions as far back as March 2009; see
> http://mar2009.archive.ensembl.org/index.html.

I learned it via an email exchange with Ensembl Genomes support:

	Hello Malcolm,
	No, I am afraid that for Ensembl Genomes we don't make older versions available through an Archive! site, like we do for Ensembl.
	-- 
	With kind regards,
	Bert Overduin, Ph.D.
	(Ensembl Helpdesk)

I realize this refers to the Ensembl Genomes web site, not the BioMart per se; however, I'm pretty sure the same policy extends to the BioMarts.

Note, EnsemblGenomes sites do NOT have the same archive policy as the main Ensembl site.

I would like to be able to refer to this distinction more clearly via an online policy document or some such, and would welcome a reference if there is one to be had.

> It's not straightforward to
> figure out, but on the main Ensembl webpage, you can get the full list by
> clicking the "view in archive site" link at the bottom left of the page. It
> redirects to this URL: http://www.ensembl.org/Help/ArchiveList.
> Then, to use biomaRt on a given archive, you need to change the host
> argument of useMart to the URL of the corresponding Ensembl archive, as in:
> useMart("ENSEMBL_MART_ENSEMBL", host="mar2009.archive.ensembl.org").
> I reckon that the biomaRt archive argument does not work for that. I need
> to post something about this on the mailing list.
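
For reference, a minimal sketch of the archive-host approach described above. It assumes the biomaRt package is installed and that the archive host is still reachable; neither is checked here. connect_ensembl_archive is a hypothetical helper name of my own. The connection is wrapped in a function so that nothing touches the network until it is called. Note that this only applies to the main Ensembl archives, not to Ensembl Genomes.

```r
# Hypothetical helper: connect to the Ensembl BioMart on a given archive host.
# Assumes biomaRt is installed; the package is only looked up when the
# function is actually called.
connect_ensembl_archive <- function(host = "mar2009.archive.ensembl.org") {
  # Pass the archive host explicitly instead of relying on the
  # 'archive' argument of useMart().
  biomaRt::useMart("ENSEMBL_MART_ENSEMBL", host = host)
}

# Usage (needs network access and a live archive site):
# mart <- connect_ensembl_archive()
# head(biomaRt::listDatasets(mart))
```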



> 
> >
> > re: "typical gff3 files"...
> > Flybase makes gff3 extracts and, if my understanding is correct, has been
> diligent in "getting it right"
> 
> I believe so too. Again, in easyRNASeq, I do parse Flybase gff3 files and
> convert them to GRanges/RangedData objects, but all the merit goes to the
> readGff3 function from the genomeIntervals package. Reading a gff3 file
> with this function is extremely quick, as is accessing the gffAttributes
> (performed at the C layer).
> 
> Cheers,
> 
> Nico
> 
> >
> > Also, NCBI historically has tried to provide GFFx extracts, with oodles of
> caveats.
> > But last month they announced progress on improving their GFF3
> offerings:  http://bio.perl.org/pipermail/bioperl-l/2012-March/036387.html
> > Example: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/
> > YMMV.
> >
> > I too once hoped to find makeTranscriptDbFromGFF3 capability so as to
> allow easy tracking of the head of Flybase's offerings, i.e.
> ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.44_FB2012_02/gff/
> - alas I too have not followed up.
> >
> > ~Malcolm
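
To make the missing piece concrete, here is a rough base-R sketch of how GFF3 records might be turned into the two tables that makeTranscriptDb asks for (transcripts and splicings). Everything in it is an assumption for illustration: the GFF3 lines are made up, get_attr is a hypothetical helper, transcripts are taken to be "mRNA" features with exons pointing at them through a single Parent attribute, and the column names follow my reading of the makeTranscriptDb documentation and should be checked against it. Real files (multi-parent exons, CDS features, escaped characters) will need more care.

```r
# Made-up GFF3 fragment: two transcripts with two exons each.
gff_lines <- c(
  "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=ex1;Parent=tx1",
  "chr1\tdemo\texon\t500\t900\t.\t+\t.\tID=ex2;Parent=tx1",
  "chr1\tdemo\tmRNA\t2000\t2600\t.\t-\t.\tID=tx2;Parent=gene2",
  "chr1\tdemo\texon\t2000\t2200\t.\t-\t.\tID=ex3;Parent=tx2",
  "chr1\tdemo\texon\t2400\t2600\t.\t-\t.\tID=ex4;Parent=tx2"
)

# Read the nine standard GFF3 columns.
gff <- read.delim(
  text = gff_lines, header = FALSE, quote = "", comment.char = "#",
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"),
  colClasses = c("character", "character", "character", "integer",
                 "integer", "character", "character", "character",
                 "character")
)

# Hypothetical helper: pull one key out of the attributes column
# (NA where the key is absent).
get_attr <- function(attributes, key) {
  pattern <- paste0("^(?:.*;)?", key, "=([^;]*).*$")
  value <- sub(pattern, "\\1", attributes, perl = TRUE)
  value[!grepl(pattern, attributes, perl = TRUE)] <- NA_character_
  value
}

# One row per transcript, with an internal integer tx_id.
tx <- gff[gff$type == "mRNA", ]
transcripts <- data.frame(
  tx_id     = seq_len(nrow(tx)),
  tx_name   = get_attr(tx$attributes, "ID"),
  tx_chrom  = tx$seqid,
  tx_strand = tx$strand,
  tx_start  = tx$start,
  tx_end    = tx$end,
  stringsAsFactors = FALSE
)

# One row per exon, linked to its transcript via the Parent attribute.
ex <- gff[gff$type == "exon", ]
ex$tx_id <- match(get_attr(ex$attributes, "Parent"), transcripts$tx_name)

# Order exons 5' to 3': ascending start on "+", descending start on "-".
ex <- ex[order(ex$tx_id, ifelse(ex$strand == "-", -ex$start, ex$start)), ]

splicings <- data.frame(
  tx_id      = ex$tx_id,
  exon_rank  = ave(seq_along(ex$tx_id), ex$tx_id, FUN = seq_along),
  exon_start = ex$start,
  exon_end   = ex$end
)
```

With GenomicFeatures loaded, the two data frames would then be handed to makeTranscriptDb(transcripts, splicings); I have not run that last step here.
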
> >
> >
> >> -----Original Message-----
> >> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
> >> bounces at r-project.org] On Behalf Of Marc Carlson
> >> Sent: Wednesday, April 04, 2012 7:44 PM
> >> To: bioconductor at r-project.org
> >> Subject: Re: [BioC] Create transcriptDb using gff3 files? - library
> >> GenomicFeatures and rtracklayer
> >>
> >> I was looking at this during the course, and this is on my TODO list for
> >> the next release cycle.  I think it is long overdue and I don't think
> >> that the community is going to get it done in spite of all the
> >> enthusiasm.  There has not been time to do it before now but I am hoping
> >> that will now change.  It should be simple enough in principle, but it
> >> might not be exactly trivial as I have discovered (on closer inspection)
> >> that the gff specification is not as concrete as one would like it to
> >> be.  Also there have been several different versions.
> >>
> >> Some things that can help speed me along:
> >>
> >> 1) which version is most important?  gff3?  Or one of the other
> >> versions?  It is likely that with the older versions we may not be able
> >> to extract as much meaningful information.
> >>
> >>  2) where is the best place to find some typical gff3 files for
> >> examples?  This should not be difficult, but when I was looking before I
> >> was finding that people were surprisingly stingy about sharing these.
> >>
> >>
> >>   Marc
> >>
> >>
> >>
> >> On 04/03/2012 03:57 PM, Michael Lawrence wrote:
> >>> Marc was working on this during the course in Feb. Not sure what
> >> happened
> >>> to it. He said it was simple. Maybe just waiting for the release to pass.
> >>>
> >>> Michael
> >>>
> >>> On Tue, Apr 3, 2012 at 3:40 PM, Steve Lianoglou<
> >>> mailinglist.honeypot at gmail.com>  wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> On Tue, Apr 3, 2012 at 4:41 PM, Sang Chul Choi<schoi at cornell.edu>
> >> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I am wondering if I could create a TranscriptDb object (library
> >>>> GenomicFeatures) using a gff3 file.  I could read a gff3 file using
> >>>> import.gff3, but I could not find a way to create TranscriptDb object
> from
> >>>> the object from import.gff3.
> >>>>> Two arguments for makeTranscriptDb are required: transcripts,
> splicings.
> >>>> It does not seem to be easy to parse this information from the object
> >> from
> >>>> import.gff3.  I will appreciate any help.
> >>>>
> >>>> As far as I know, this functionality isn't there yet ...
> >>>>
> >>>> I once (early feb, 2012) suggested I might take a crack at making this
> >>>> happen but haven't actually found the time to do it ... I'm not sure
> >>>> anyone in bioc-core land (hi, Marc) has found the time to do it
> >>>> either, so I think you're out of luck.
> >>>>
> >>>> Sorry for that. But the good news is that I bet a patch that does this
> >>>> would be welcome ;-)
> >>>>
> >>>> -steve
> >>>>
> >>>> --
> >>>> Steve Lianoglou
> >>>> Graduate Student: Computational Systems Biology
> >>>>  | Memorial Sloan-Kettering Cancer Center
> >>>>  | Weill Medical College of Cornell University
> >>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
> >>>>
> >>>> _______________________________________________
> >>>> Bioconductor mailing list
> >>>> Bioconductor at r-project.org
> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>> Search the archives:
> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>>>


