[BioC] Reading GFF files into R and GenomeGraphs
Michael Dondrup
Michael.Dondrup at bccs.uib.no
Mon Aug 10 17:41:15 CEST 2009
Hi,
it took a while to figure out how to read genome annotation files with
and without rtracklayer, thanks Michael, and also how to plot
bacterial chromosomes using the GenomeGraphs package. I think, this
information can be useful for others, so I made a tiny howto. I think
some of this could also be added to the documentation examples of the
GenomeGraphs package. I would be glad if someone would tell me if that
fits or if there are some comments. Thank you again for this package.
1) Howto plot genomic annotation from a GFF file using rtracklayer:
require("GenomeGraphs")
require("rtracklayer")
# read in the gff, example from import.gff:
# import a GFF V2 file
gff <- import.gff(system.file("tests", "v2.gff", package =
"rtracklayer"), version = "2")
# the gff object contains an IRanges list with the intervals
# we need a function to convert to an AnnotationTrack from an IRanges
object:
makeAnnotationTrackFromIRanges = function (iranges,
dp=DisplayPars(ranges = "yellow", plotId=TRUE )) {
iranges.names = if (is.null(names(iranges))) {1:length(iranges)}
else {names(iranges)} # make some IDs
annotation = data.frame(start=iranges at start,
end=iranges at start+iranges@width-1,
feature=paste("ranges"), # there is
no more feature information here
group=c(1:length(iranges)), # put
every region in a different group
ID=iranges.names)
makeAnnotationTrack(regions=annotation, dp=dp)
}
aTrack = makeAnnotationTrackFromIRanges(gff at ranges[[1]]) # there is a
RangesList inside the gff
gdPlot(aTrack, minBase=1, maxBase=11000)
2) Howto plot arbitrary chromosome data contained in a data.frame:
I wasn't fully content with the output from the import.gff function, I
have additional information like gene name & reading frame which I
didn't get this way. Assume there is a data.frame "cds" containing
annotation with columns containing at least columns:
"GeneID", "Start", "Stop"
Then, the following function can make an AnnotationTrack from it:
makeSingleAnnotationTrack = function (cds, dp=DisplayPars(orf="blue",
plotId=T)) {
annotation = data.frame(start=cds$Start, end=cds$Stop,
feature=rep.int("orf", nrow(cds)), # this can also be taken from some
annotation field
group=c(1:nrow(cds)), ID=cds$GeneID )
makeAnnotationTrack(annotation, dp=dp)
}
3) Add coverage data to the annotation data:
# convert an iranges object into a coverage track
# I think this is a bit inefficient for large iranges objects, is
there a better way?
makeBaseTrackCoverageFromIRanges = function(iranges, ...) {
baseCoverage = as.numeric(coverage(iranges))
makeBaseTrack(base=1:length(baseCoverage), value=baseCoverage, ...)
}
# and then plot this
x = IRanges(start=c(2, 0, NA), end=c(NA, NA, 14), width=11:0)
plotlist = list(makeAnnotationTrackFromIRanges(x),
makeBaseTrackCoverageFromIRanges(x))
gdPlot(plotlist,min=1, max=20)
This is maybe not very nice, but hope this helps.....
best
Michael
Am 30.07.2009 um 13:24 schrieb Michael Lawrence:
> On Thu, Jul 30, 2009 at 12:52 AM, Michael Dondrup <
> Michael.Dondrup at bccs.uib.no> wrote:
>
>> Dear BioC list,
>>
>> I'm trying to use the package GenomeGraphs to visualise custom
>> genome data
>> (genome not in public databases). In the corresponding publication to
>> genomegraphs (Durinck et al. BMC Bioinformatics, 2009 ) I found the
>> partial
>> clue: "... genomic annotation encoded in GFF files can be easily
>> used to
>> create a custom AnnotationTrack object for visualization... region
>> start,
>> and end positions need to be given, as well as how these regions
>> are to be
>> grouped."
>> This leaves me with two questions:
>> - I have no idea how to parse a GFF file, is there a GFF parser in
>> Bioconductor?
>>
>
> Yes, rtracklayer has a parser. Please see the vignette.
>
>
>>
>> - If I have such GFF file for a genome, how can I create such an
>> AnnotationTrack with all CDS?
>>
>> Hope somebody can help me with one of this.
>>
>> Best
>> Michael
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list