[BioC] makeTranscriptDbFromGFF for Unstranded Transcripts
Dario Strbenac
dstr7320 at uni.sydney.edu.au
Thu May 9 03:00:16 CEST 2013
> The same could probably be said of GTF and GFF files, and I wonder if
> storing a set of unstranded mRNAs, exons, CDSs in those files is
> considered valid.
From the specification, it is.
strand - Valid entries include '+', '-', or '.' (for don't know/don't care).
> Anyway, if we wanted makeTranscriptDbFromGFF() to support such GTF and
> GFF files, we would need to automatically replace all missing strands
> by a + or a -.
It is better if it keeps raising an error, so there is no ambiguity. Adding a sentence about this to the help file would be useful, since users will expect the function to read all valid GTF and GFF files.
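For anyone who wants to know in advance whether a file will hit this error, a rough sketch of a pre-check is below. It is plain Python, not part of GenomicFeatures or any Bioconductor API; the function name count_strands and the example records are made up for illustration. It assumes the usual nine tab-separated columns with the strand in column 7.

```python
from collections import Counter

# Strand values allowed by the GTF/GFF specification quoted above.
VALID_STRANDS = {"+", "-", "."}


def count_strands(gtf_lines):
    """Tally the strand column (column 7) over the feature lines of a GTF/GFF file."""
    counts = Counter()
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        strand = fields[6]
        if strand not in VALID_STRANDS:
            raise ValueError(f"invalid strand {strand!r}")
        counts[strand] += 1
    return counts


# Hypothetical records: one two-exon stranded transcript, one unstranded single-exon transcript.
example = [
    '# hypothetical records\n',
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "2";\n',
    'chr1\tCufflinks\texon\t900\t950\t.\t.\t.\tgene_id "G2"; transcript_id "T2"; exon_number "1";\n',
]

strand_counts = count_strands(example)
print(strand_counts)
```

A non-zero count for "." means the file is valid by the specification but contains unstranded features, which is the case discussed here.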
>
> makeTranscriptDbFromGFF("transcripts.gtf", format = "gtf", exonRankAttributeName = "exon_number")
> Ok, so you've managed to store the exon rank in your file. But that
> means that you must have *implicitly* chosen a strand for your exons
> right?
Cufflinks can infer the strand of a multi-exon transcript by looking for the canonical GT-AG splice sites in reads that map across an intron, but it cannot do this for single-exon genes. So it outputs a strand for some genes and not for others.
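To see which transcripts in such a file lack a strand, something like the following sketch could be used. Again this is plain Python with made-up names (unstranded_transcripts, the example records), not Cufflinks or Bioconductor code. It assumes GTF-style attributes with a quoted transcript_id and that every exon of a transcript carries the same strand.

```python
import re
from collections import defaultdict

# GTF-style attribute, e.g. transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def unstranded_transcripts(gtf_lines):
    """Map each transcript whose strand is '.' to its number of exon records."""
    exon_counts = defaultdict(int)
    strands = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records are counted; other feature types are ignored.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tx = match.group(1)
        exon_counts[tx] += 1
        strands[tx] = fields[6]
    return {tx: exon_counts[tx] for tx, strand in strands.items() if strand == "."}


# Hypothetical records: T1 has two exons and a strand, T2 is a single unstranded exon.
records = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "2";\n',
    'chr1\tCufflinks\texon\t900\t950\t.\t.\t.\tgene_id "G2"; transcript_id "T2"; exon_number "1";\n',
]

unstranded = unstranded_transcripts(records)
print(unstranded)  # {'T2': 1}
```

In a file produced as described above, the transcripts reported here would be expected to have an exon count of 1.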