[Bioc-devel] Patch to GFF3 reader in rtracklayer

Ryan C. Thompson rct at thompsonclan.org
Mon Aug 27 18:32:53 CEST 2012


It looks like the attachment was scrubbed from my initial message. Here 
is another attempt to send it.

On Mon 27 Aug 2012 08:50:03 AM PDT, Ryan C. Thompson wrote:
> Hi all,
>
> I recently found that rtracklayer's GFF3 file reader was unable to read
> GFF3 files produced by Cufflinks. I tracked the problem down to the
> occurrence of equals signs in tag values. For example, the following
> line was problematic:
>
> C123300344      Cufflinks       transcript      1       132     .       -       .       ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1
>
>
> due to the "class_code==" part (the value of the class code is
> actually an equals sign). Obviously the bug occurs because "strsplit"
> doesn't stop after the first split, but keeps splitting at subsequent
> occurrences of the separator. I have modified the reader to handle
> this case, which as far as I know is perfectly valid GFF3. Instead
> of strsplit, I use regexpr to find only the *first* occurrence of an
> equals sign, and then I use substr to extract the part of the tag
> before and after the equals sign. The attached file is a patch against
> "R/gff.R" in the rtracklayer dist. I developed the patch against
> version 1.16.1.
>
> Regards,
>
> -Ryan Thompson
>
>
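
For readers without the attachment, here is a minimal sketch in R of the
approach described above. It is not the patch itself: the helper name
split_first_equals is hypothetical, and the real change lives in "R/gff.R".
The attribute string is the one from the example line.

```r
# Hypothetical helper (an illustration, not the actual rtracklayer patch):
# split each "tag=value" string at the FIRST "=" only.
split_first_equals <- function(x) {
  # regexpr() reports only the first match per string, or -1 if none.
  pos <- as.integer(regexpr("=", x, fixed = TRUE))
  has_eq <- pos > 0
  # Everything before the first "=" is the tag; everything after is the value.
  tag <- ifelse(has_eq, substr(x, 1, pos - 1), x)
  value <- ifelse(has_eq, substr(x, pos + 1, nchar(x)), NA_character_)
  data.frame(tag = tag, value = value, stringsAsFactors = FALSE)
}

attrs <- paste0(
  "ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;",
  "nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1"
)
fields <- strsplit(attrs, ";", fixed = TRUE)[[1]]

# Splitting at every "=" does not recover "=" as the value of class_code.
naive <- strsplit(fields, "=", fixed = TRUE)

# Splitting at the first "=" only keeps the value intact.
parsed <- split_first_equals(fields)
print(parsed[parsed$tag == "class_code", ])
```

With this sketch, the class_code row keeps "=" as its value, while the
other tags are split exactly as before.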

