[Bioc-devel] Mistake in extractTranscriptSeqs() {from: GenomicFeatures} when transcripts are on minus strand

Kristoffer Vitting-Seerup kristoffer.vittingseerup at bio.ku.dk
Tue Aug 26 16:40:12 CEST 2014


The problem is best illustrated with an example (with comments in bold):

# Load libraries:
library('GenomicFeatures')
library("BSgenome.Hsapiens.UCSC.hg19”)

# Create two test transcripts, one on each strand
testTranscriptPlus <- GRanges(seqnames='chr1', ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='+')
testTranscriptMinus <- GRanges(seqnames='chr1', ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='-‘)

# Use getSeq() to extract sequence

getSeq(Hsapiens, testTranscriptPlus)
  A DNAStringSet instance of length 2
    width seq
[1]     3 ACT
[2]     3 CAG

> testTranscriptMinus <- GRanges(seqnames='chr1', ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='-')
> getSeq(Hsapiens, testTranscriptMinus)
  A DNAStringSet instance of length 2
    width seq
[1]     3 AGT
[2]     3 CTG

# By comparing the plus and minus transcripts it can be seen that getSeq returns the sequences in the 5’ to 3’ orientation no matter the strand (very nice feature by the way)
# Furthermore we would expect the transcript sequence of the testTranscriptMinus to be: CTGAGT (since we are on the minus strand).

# Now lets extract the sequences of the minus strand transcript using the extractTranscriptSeqs() function
extractTranscriptSeqs( Hsapiens, split( testTranscriptMinus, f=c('test','test')))
  A DNAStringSet instance of length 1
    width seq                                                                                                                             names               
[1]     6 AGTCTG                                                                                                                          test

# From which it can be observed that extractTranscriptSeqs() have concatenated the exons without taking the order into account (which works fine on plus strand but results in non-exisiting transcript from the minus strand).

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] grDevices datasets  grid      parallel  stats     graphics  utils     methods   base     

other attached packages:
 [1] GenomicFeatures_1.17.14            AnnotationDbi_1.27.9               Biobase_2.25.0                     BSgenome.Hsapiens.UCSC.hg19_1.3.99
 [5] BSgenome_1.33.8                    Biostrings_2.33.13                 XVector_0.5.7                      spliceR_1.7.1                     
 [9] plyr_1.8.1                         RColorBrewer_1.0-5                 VennDiagram_1.6.7                  cummeRbund_2.7.2                  
[13] Gviz_1.9.11                        rtracklayer_1.25.13                GenomicRanges_1.17.28              GenomeInfoDb_1.1.18               
[17] IRanges_1.99.24                    S4Vectors_0.1.2                    fastcluster_1.1.13                 reshape2_1.4                      
[21] ggplot2_1.0.0                      RSQLite_0.11.4                     DBI_0.2-7                          BiocGenerics_0.11.4         


-- 
Kindest regards
Kristoffer Vitting-Seerup, cand.scient. (M.Sc.), 
Ph.D Fellow
Sandelin Group

Bioinformatics Centre | Biotech Research & Innovation Centre (BRIC), Dep. Of Biology
University of Copenhagen
Building 1, 3th floor, office 3 (1-3-03)
Ole Maaløes Vej 5
DK-2200 Copenhagen N
Denmark
http://binf.ku.dk | http://www.bric.ku.dk







	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list