[Bioc-devel] Patch to GFF3 reader in rtracklayer

Dan Tenenbaum dtenenba at fhcrc.org
Mon Aug 27 19:30:37 CEST 2012

On Mon, Aug 27, 2012 at 10:24 AM, Ryan C. Thompson <rct at thompsonclan.org> wrote:
> Ok, I put the patch in a GitHub gist, since the list seems not to like
> patches as attachments:
> https://gist.github.com/3490557

I'm trying to ensure that the list supports this type of attachment.
It should accept it this time.

> On 08/27/2012 09:32 AM, Ryan C. Thompson wrote:
>> It looks like the attachment was scrubbed from my initial message. Here is
>> another attempt to send it.
>> On Mon 27 Aug 2012 08:50:03 AM PDT, Ryan C. Thompson wrote:
>>> Hi all,
>>> I recently found that rtracklayer's GFF3 file reader was unable to read
>>> GFF3 files produced by Cufflinks. I tracked the problem down to the
>>> occurrence of equals signs in tag values. For example, the following
>>> line was problematic:
>>> C123300344 Cufflinks transcript 1 132 . - . ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1
>>> due to the "class_code==" part (the value of the class code is
>>> actually an equals sign). Obviously the bug occurs because "strsplit"
>>> doesn't stop after the first split, but keeps splitting at subsequent
>>> occurrences of the separator. I have modified the reader to be able to
>>> handle this case, which as far as I know is perfectly valid. Instead
>>> of strsplit, I use regexpr to find only the *first* occurrence of an
>>> equals sign, and then I use substr to extract the parts of the tag
>>> before and after the equals sign. The attached file is a patch against
>>> "R/gff.R" in the rtracklayer dist. I developed the patch against
>>> version 1.16.1.
>>> Regards,
>>> -Ryan Thompson
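
For readers without access to the gist, the approach described above can be sketched roughly as follows. This is only an illustration of splitting at the first equals sign with regexpr and substr, not the actual patch to "R/gff.R"; the helper name split_first and the variables attrs and pairs are hypothetical.

```r
# Attribute column taken from the example line quoted above.
attrs <- paste0("ID=TCONS_00000337;geneID=XLOC_000337;",
                "oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;",
                "class_code==;tss_id=TSS337;p_id=P1")

# Break the column into individual "tag=value" pairs.
pairs <- strsplit(attrs, ";", fixed = TRUE)[[1]]

# Splitting each pair on every "=" can never return "=" as a value,
# because strsplit removes each occurrence of the separator.

# Hypothetical helper: split each pair at the FIRST "=" only.
split_first <- function(x, sep = "=") {
  # Position of the first separator in each element (-1 if absent);
  # as.integer() drops the match attributes that regexpr attaches.
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))
  has_sep <- pos > 0
  # Everything before the first separator is the tag ...
  tag <- ifelse(has_sep, substr(x, 1, pos - 1), x)
  # ... and everything after it is the value, later "=" included.
  value <- ifelse(has_sep, substr(x, pos + 1, nchar(x)), NA_character_)
  names(value) <- tag
  value
}

parsed <- split_first(pairs)
print(parsed[["class_code"]])
```

Pairs without any equals sign get an NA value here; that is a choice made for this sketch and may not match what rtracklayer does.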
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
