[Bioc-devel] Patch to GFF3 reader in rtracklayer

Ryan C. Thompson rct at thompsonclan.org
Mon Aug 27 17:50:03 CEST 2012


Hi all,

I recently found that rtracklayer's GFF3 file reader was unable to read 
GFF3 files produced by Cufflinks. I tracked the problem down to the 
occurrence of equals signs in tag values. For example, the following 
line was problematic:

C123300344      Cufflinks       transcript      1       132     .       -       .       ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1

due to the "class_code==" part (the value of the class code is actually 
an equals sign). The bug occurs because "strsplit" does not stop after 
the first split, but keeps splitting at every subsequent occurrence of 
the separator. I have modified the reader to handle this case, which as 
far as I know is perfectly valid GFF3. Instead of strsplit, I use 
regexpr to find only the *first* occurrence of an equals sign, and then 
I use substr to extract the parts of the tag before and after that 
equals sign. The attached file is a patch against "R/gff.R" in the 
rtracklayer distribution. I developed the patch against version 1.16.1.
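
To illustrate the approach described above, here is a minimal sketch in 
R. It is not the attached patch itself: the function name 
split_first_equals is hypothetical and exists only for this example. 
It contrasts splitting on every equals sign with splitting only at the 
first one, using the attribute column from the line quoted above.

```r
# Attribute column from the problematic Cufflinks line.
attrs <- paste0(
  "ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;",
  "nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1"
)

# Individual "tag=value" fields.
fields <- strsplit(attrs, ";", fixed = TRUE)[[1]]

# Splitting on every "=" cannot recover a value that is itself "=".
naive <- strsplit("class_code==", "=", fixed = TRUE)[[1]]

# Hypothetical helper (illustration only): split each field at the
# FIRST "=" using regexpr and substr.
split_first_equals <- function(x) {
  # Position of the first "=" in each element, or -1 if there is none.
  # as.integer() drops the match attributes that regexpr attaches.
  pos <- as.integer(regexpr("=", x, fixed = TRUE))
  has_eq <- pos > 0
  tag <- ifelse(has_eq, substr(x, 1, pos - 1), x)
  value <- ifelse(has_eq, substr(x, pos + 1, nchar(x)), NA_character_)
  data.frame(tag = tag, value = value, stringsAsFactors = FALSE)
}

parsed <- split_first_equals(fields)
print(parsed)
```

With this sketch, the class_code row keeps "=" as its value, because 
only the first equals sign in each field is treated as the separator.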

Regards,

-Ryan Thompson



