[BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF

Katja Hebestreit katjah at stanford.edu
Mon Apr 14 04:18:43 CEST 2014


Hello,

I get an error when I try to import my gff file:

txdb <- makeTranscriptDbFromGFF(file="file.gtf", format="gtf")

Error in .parse_attrCol(attrCol, file, colnames) : 
  Some attributes do not conform to 'tag value' format

This is what my file looks like:

chr1	mm9_refFlat	stop_codon	3206103	3206105	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	CDS	3206106	3207049	0.000000	-	2	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	exon	3204563	3207049	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	CDS	3411783	3411982	0.000000	-	1	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	exon	3411783	3411982	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	CDS	3660633	3661429	0.000000	-	0	gene_id "Xkr4"; transcript_id "Xkr4"; 
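The error message refers to the attribute column (the ninth field), so one sanity check is to split that field by hand and see whether every piece looks like a tag followed by a value. The following is a minimal base-R sketch of such a check. It is not the parser that GenomicFeatures uses internally, and the helper name `check_attr` and the rule "tag, whitespace, value" are assumptions made for illustration only.

```r
# Attribute field of the first record above, including the space that
# follows the final ';' in the file.
attr_col <- 'gene_id "Xkr4"; transcript_id "Xkr4"; '

# Hypothetical helper (not part of GenomicFeatures): split one attribute
# field on ';' and report which pieces look like 'tag value'.
check_attr <- function(x) {
  pieces <- strsplit(x, ";", fixed = TRUE)[[1]]
  pieces <- gsub("^[[:space:]]+|[[:space:]]+$", "", pieces)  # trim whitespace
  pieces <- pieces[nzchar(pieces)]                           # drop empty pieces
  ok <- grepl("^[[:alnum:]_]+[[:space:]]+[^[:space:]].*$", pieces)
  data.frame(piece = pieces, conforms = ok, stringsAsFactors = FALSE)
}

res <- check_attr(attr_col)
print(res)
```

Running the same helper over the ninth column of the whole file would show whether any record deviates from the pattern of the lines quoted here.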

I have the feeling that this has something to do with the missing exon rank information in my file. Is that true? Is there a way to import this file? All I want to do is to determine the gene lengths.
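To make the goal concrete: "gene length" is taken here to mean the number of bases covered by a gene's exon records, counting overlapping positions once. That definition is an assumption, as is the helper name `covered_width`. The sketch below uses only base R and the records quoted above, typed in by hand, so it does not depend on the import succeeding.

```r
# The six records quoted above, reduced to the columns needed here.
gtf <- data.frame(
  type    = c("stop_codon", "CDS", "exon", "CDS", "exon", "CDS"),
  start   = c(3206103, 3206106, 3204563, 3411783, 3411783, 3660633),
  end     = c(3206105, 3207049, 3207049, 3411982, 3411982, 3661429),
  gene_id = "Xkr4",
  stringsAsFactors = FALSE
)

# Hypothetical helper: bases covered by a set of 1-based, closed intervals
# (the GTF convention), merging overlapping or adjacent intervals first.
covered_width <- function(start, end) {
  o <- order(start)
  start <- start[o]; end <- end[o]
  total <- 0
  cur_start <- start[1]; cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      cur_end <- max(cur_end, end[i])          # extend the current block
    } else {
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]; cur_end <- end[i] # start a new block
    }
  }
  total + (cur_end - cur_start + 1)
}

# Keep exon records only and sum the covered width per gene.
exons <- gtf[gtf$type == "exon", ]
gene_length <- sapply(split(exons, exons$gene_id),
                      function(g) covered_width(g$start, g$end))
print(gene_length)
```

Only the two exon records in the excerpt contribute, so the value printed describes the excerpt, not the full Xkr4 gene.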

Could anyone help? That would be awesome!
Cheers,
Katja


sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GenomicFeatures_1.16.0 AnnotationDbi_1.25.19  Biobase_2.23.6        
[4] GenomicRanges_1.16.0   GenomeInfoDb_0.99.32   IRanges_1.21.45       
[7] BiocGenerics_0.9.3     BiocInstaller_1.14.1  

loaded via a namespace (and not attached):
 [1] BatchJobs_1.2           BBmisc_1.5              BiocParallel_0.6.0     
 [4] biomaRt_2.20.0          Biostrings_2.32.0       bitops_1.0-6           
 [7] brew_1.0-6              BSgenome_1.32.0         codetools_0.2-8        
[10] DBI_0.2-7               digest_0.6.4            fail_1.2               
[13] foreach_1.4.2           GenomicAlignments_1.0.0 iterators_1.0.7        
[16] plyr_1.8.1              Rcpp_0.11.1             RCurl_1.95-4.1         
[19] Rsamtools_1.16.0        RSQLite_0.11.4          rtracklayer_1.24.0     
[22] sendmailR_1.1-2         stats4_3.1.0            stringr_0.6.2          
[25] tools_3.1.0             XML_3.98-1.1            XVector_0.4.0          
[28] zlibbioc_1.10.0


