[Bioc-sig-seq] GenomicFeatures package

Martin Morgan mtmorgan at fhcrc.org
Wed Oct 28 20:44:14 CET 2009


Michael Lawrence <mflawren at fhcrc.org> writes:

> On Tue, Oct 27, 2009 at 3:53 PM, pterry at huskers.unl.edu <
> pterry at huskers.unl.edu> wrote:
>
>> Dear bioc-sig-sequencing,
>>
>> Previously, I tried the following with a UCSC available genome.
>>
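>> library(IRanges)
>> genetable <- read.table("celegans_chrIII.txt", header=TRUE, sep="\t")
>> ## 1000 bp upstream of each transcript, in 1-based coordinates
>> ## (UCSC txStart is 0-based; txEnd already equals the 1-based last base)
>> promoter <- IRanges(start=ifelse(genetable$strand == "+",
>>                                  genetable$txStart - 999,
>>                                  genetable$txEnd + 1),
>>                     width=1000)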
>>
>> It was suggested I might "check out the GenomicFeatures package, which has
>> utilities for working with a data.frame representation of the UCSC genes
>> table. For example, the 'transcripts' function will give you a set of
>> regions, including the promoters you're trying to generate."
>>
>> My genome, Arabidopsis, is apparently not available from the UCSC
>> database but rather from TAIR.
>>
>> For this genome, might the GenomicFeatures package be similarly helpful? I
>> assume one might start with a file like TAIR9_GFF3_genes.gff from the TAIR
>> site. I note it has records for 'gene', 'mRNA', 'CDS', 'exon', and perhaps
>> others.
>>
>>
> It could be useful, but you'll need to get it into the shape expected by the
> GenomicFeatures functions. With specific regard to transcripts(), it will
> need a 'chrom' column with the chromosome names, a 'name' column for the
> gene name, and 'txStart' and 'txEnd' for the start and end of the
> transcripts, using UCSC coordinate conventions (0-based, half-open
> intervals).
>
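A minimal base-R sketch of that reshaping, assuming a GFF3 file with the
usual nine tab-separated columns and treating the 'mRNA' records as the
transcripts. The helper names 'attrValue' and 'gff3ToTxTable' and the toy
records are hypothetical, not part of GenomicFeatures or rtracklayer. GFF3
coordinates are 1-based and closed while UCSC tables are 0-based and
half-open, so only the start needs shifting by one.

```r
## Hypothetical helper: value of one key (e.g. "ID") in a GFF3 attribute column
attrValue <- function(attrs, key) {
    vapply(strsplit(attrs, ";", fixed = TRUE), function(kv) {
        hit <- kv[startsWith(kv, paste0(key, "="))]
        if (length(hit) > 0) sub("^[^=]*=", "", hit[1]) else NA_character_
    }, character(1))
}

## Hypothetical helper: GFF3 'mRNA' records -> UCSC-style transcript table
gff3ToTxTable <- function(gff) {
    mrna <- gff[gff$type == "mRNA", ]
    data.frame(chrom   = mrna$seqid,
               name    = attrValue(mrna$attributes, "ID"),
               strand  = mrna$strand,
               txStart = mrna$start - 1L,  # 1-based closed -> 0-based half-open
               txEnd   = mrna$end,         # the end coordinate needs no shift
               stringsAsFactors = FALSE)
}

gffCols <- c("seqid", "source", "type", "start", "end",
             "score", "strand", "phase", "attributes")

## Made-up toy records; for real data use file = "TAIR9_GFF3_genes.gff"
## instead of text = toy
toy <- paste(c(
    "##gff-version 3",
    "Chr1\ttoy\tgene\t101\t600\t.\t+\t.\tID=geneA",
    "Chr1\ttoy\tmRNA\t101\t600\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Chr1\ttoy\texon\t101\t200\t.\t+\t.\tParent=geneA.1",
    "Chr2\ttoy\tmRNA\t1001\t1500\t.\t-\t.\tID=geneB.1;Parent=geneB"),
    collapse = "\n")
gff <- read.delim(text = toy, header = FALSE, col.names = gffCols,
                  comment.char = "#", quote = "", stringsAsFactors = FALSE)
tx <- gff3ToTxTable(gff)
tx
```

Only 'mRNA' rows are kept here, so 'gene', 'exon' and 'CDS' records are
simply ignored; that is enough for transcript extents but drops exon
structure.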
> I'm now thinking that it would be more convenient for the user if there were
> a transcripts method on a RangesList object, which could provide the
> necessary information. This could be extracted from the RangedData, which
> rtracklayer can create from your GFF file, with one caveat: if the GFF file
> contains a mixture of features (genes, exons, etc.) and relies on GFF's
> parent/child hierarchy to link them, it will take more work to get things
> into the right shape.
>
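As a rough illustration of that extra work, here is a base-R sketch that
rebuilds one UCSC-style row per transcript from its 'exon' children, using
the GFF3 Parent attribute to group them. It assumes each exon names exactly
one parent; GFF3 allows comma-separated multiple parents, which this sketch
does not split. The helper names and toy records are hypothetical.

```r
## Hypothetical helper: value of one key (e.g. "Parent") in a GFF3 attribute column
attrValue <- function(attrs, key) {
    vapply(strsplit(attrs, ";", fixed = TRUE), function(kv) {
        hit <- kv[startsWith(kv, paste0(key, "="))]
        if (length(hit) > 0) sub("^[^=]*=", "", hit[1]) else NA_character_
    }, character(1))
}

## Hypothetical helper: group exons by Parent, one UCSC-style row per transcript
exonsToTxTable <- function(gff) {
    ex <- gff[gff$type == "exon", ]
    ex$parent <- attrValue(ex$attributes, "Parent")
    rows <- lapply(split(ex, ex$parent), function(e) {
        e <- e[order(e$start), ]  # UCSC lists exons in ascending genomic order
        data.frame(chrom      = e$seqid[1],
                   name       = e$parent[1],
                   strand     = e$strand[1],
                   txStart    = min(e$start) - 1L,  # 0-based, half-open
                   txEnd      = max(e$end),
                   exonCount  = nrow(e),
                   ## comma-separated lists with a trailing comma, UCSC-style
                   exonStarts = paste0(paste(e$start - 1L, collapse = ","), ","),
                   exonEnds   = paste0(paste(e$end, collapse = ","), ","),
                   stringsAsFactors = FALSE)
    })
    out <- do.call(rbind, rows)
    rownames(out) <- NULL
    out
}

gffCols <- c("seqid", "source", "type", "start", "end",
             "score", "strand", "phase", "attributes")

## Made-up toy records (exons deliberately out of order)
toy <- paste(c(
    "##gff-version 3",
    "Chr1\ttoy\tmRNA\t101\t600\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Chr1\ttoy\texon\t401\t600\t.\t+\t.\tParent=geneA.1",
    "Chr1\ttoy\texon\t101\t200\t.\t+\t.\tParent=geneA.1",
    "Chr2\ttoy\texon\t1001\t1500\t.\t-\t.\tParent=geneB.1"),
    collapse = "\n")
gff <- read.delim(text = toy, header = FALSE, col.names = gffCols,
                  comment.char = "#", quote = "", stringsAsFactors = FALSE)
tx2 <- exonsToTxTable(gff)
tx2
```

Grouping exons rather than reading 'mRNA' rows carries the exon structure
along; the same grouping by Parent would apply to 'CDS' records.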
> The question then is where would this new functionality belong? The future
> of the GenomicFeatures package was a bit uncertain, the last time I checked.

The intention is that GenomicFeatures will mature to contain these
sorts of data structures and functions, so that it's easy to create or
retrieve transcript or exon information.

Martin

> Michael
>
>> Thanks,
>> P. Terry
>> pterry at huskers.unl.edu
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793