[BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
Katja Hebestreit
katjah at stanford.edu
Mon Apr 14 20:24:35 CEST 2014
You can download the file here:
https://www.dropbox.com/s/04nck83jq6r91bc/mm9_test.gtf
Using this file I get the error:
txdb <- makeTranscriptDbFromGFF(file="Data/mm9_test.gtf", format="gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
Some attributes do not conform to 'tag value' format
Thank you so much for helping!!
Katja
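
The error message points at the ninth (attribute) column, so one way to narrow the problem down is to scan the file for lines that do not have nine tab-separated fields, or whose attribute column is not a series of 'tag value;' pairs. What follows is a minimal base-R sketch, not the check that GenomicFeatures performs internally. The helper check_gtf_lines() and its regular expression are hypothetical and deliberately strict.

```r
# Hypothetical helper (not part of GenomicFeatures or rtracklayer):
# report lines that do not look like well-formed GTF records.
check_gtf_lines <- function(lines) {
  # Ignore comment lines and blank lines.
  keep <- !grepl("^#", lines) & !grepl("^[[:space:]]*$", lines)
  idx <- which(keep)

  # GTF records are expected to have exactly nine tab-separated fields.
  fields <- strsplit(lines[idx], "\t", fixed = TRUE)
  n_fields <- vapply(fields, length, integer(1))

  # The ninth field holds the attributes.
  attr_col <- vapply(
    fields,
    function(f) if (length(f) >= 9) f[9] else NA_character_,
    character(1)
  )

  # Assumed shape of one attribute: a tag, at least one space, then a
  # double-quoted or bare value. Pairs are separated by semicolons.
  pair <- "[A-Za-z_][A-Za-z0-9_.]*[ ]+(\"[^\"]*\"|[^ ;\"]+)"
  pattern <- paste0("^[ ]*(", pair, "[ ]*;[ ]*)*", pair, "[ ]*;?[ ]*$")
  attr_ok <- !is.na(attr_col) & grepl(pattern, attr_col)

  report <- data.frame(line = idx, n_fields = n_fields, attr_ok = attr_ok,
                       stringsAsFactors = FALSE)
  report[report$n_fields != 9 | !report$attr_ok, ]
}

# Small in-memory demonstration with one good and two bad records.
tab <- function(...) paste(..., sep = "\t")
good <- tab("chr1", "mm9_refFlat", "exon", "3204563", "3207049",
            "0.000000", "-", ".",
            "gene_id \"Xkr4\"; transcript_id \"Xkr4\";")
spaces <- gsub("\t", " ", good, fixed = TRUE)   # tabs lost in transit
bad_attr <- tab("chr1", "mm9_refFlat", "exon", "3204563", "3207049",
                "0.000000", "-", ".", "gene_id=Xkr4")
demo_report <- check_gtf_lines(c(good, spaces, bad_attr))
print(demo_report)
```

For the real file, check_gtf_lines(readLines("Data/mm9_test.gtf")) would list the suspicious line numbers, which may help in picking a small fragment to post.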
----- Original Message -----
From: "Michael Lawrence" <lawrence.michael at gene.com>
To: "Katja Hebestreit" <katjah at stanford.edu>
Cc: "Michael Lawrence" <lawrence.michael at gene.com>, bioconductor at r-project.org, "Rsamtools Maintainer" <maintainer at bioconductor.org>
Sent: Monday, April 14, 2014 7:27:26 AM
Subject: Re: [BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
Well, I copied the text and replaced the spaces with tabs as appropriate
and everything seemed to work fine, so you might want to attach that fragment
of the file, just to be sure it isn't a formatting issue.
Does import("file.gtf") work for you? If so, that should be good enough for
your use case.
Michael
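
For the stated goal of determining gene lengths, a TranscriptDb is not strictly needed once the exon coordinates are available. The sketch below is a dependency-free illustration in base R. It assumes that "gene length" means the number of bases covered by the union of a gene's exons, which the thread does not define. covered_bases() is a hypothetical helper, and the coordinates are the three Xkr4 exon records from the quoted excerpt. With a GRanges object returned by import(), reduce() and width() from GenomicRanges play the same role.

```r
# Hypothetical helper: number of bases covered by a non-empty set of
# closed intervals [start, end] (1-based and inclusive, as in GTF).
covered_bases <- function(start, end) {
  o <- order(start, end)
  start <- start[o]
  end <- end[o]
  # An interval opens a new block when it starts beyond every end seen so far.
  reach <- cummax(end)
  new_block <- c(TRUE, start[-1] > head(reach, -1))
  block <- cumsum(new_block)
  block_start <- tapply(start, block, min)
  block_end <- tapply(end, block, max)
  sum(block_end - block_start + 1)
}

# Exon records for Xkr4 as they appear in the quoted excerpt.
exons <- data.frame(
  gene_id = c("Xkr4", "Xkr4", "Xkr4"),
  start   = c(3204563, 3411783, 3660633),
  end     = c(3207049, 3411982, 3661579),
  stringsAsFactors = FALSE
)

# One length per gene: merge that gene's exons, then count covered bases.
gene_lengths <- vapply(
  split(exons, exons$gene_id),
  function(g) covered_bases(g$start, g$end),
  numeric(1)
)
print(gene_lengths)
```

Merging before summing matters once a gene has several transcripts with overlapping exons. Summing the raw exon widths would count shared bases more than once.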
On Sun, Apr 13, 2014 at 10:14 PM, Katja Hebestreit <katjah at stanford.edu> wrote:
> Actually, the error was not reproducible with the lines I attached. But it
> is reproducible with these lines (four additional lines):
>
> chr1 mm9_refFlat stop_codon 3206103 3206105 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat CDS 3206106 3207049 0.000000 - 2 gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat exon 3204563 3207049 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat CDS 3411783 3411982 0.000000 - 1 gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat exon 3411783 3411982 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat CDS 3660633 3661429 0.000000 - 0 gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat start_codon 3661427 3661429 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat exon 3660633 3661579 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat stop_codon 4283062 4283064 0.000000 - . gene_id "Rp1"; transcript_id "Rp1";
> chr1 mm9_refFlat CDS 4283065 4283093 0.000000 - 2 gene_id "Rp1"; transcript_id "Rp1";
>
> Let me know if you would like to get the entire file.
>
> Thank you!!
> Katja
>
> ----- Original Message -----
> From: "Michael Lawrence" <lawrence.michael at gene.com>
> To: "Katja Hebestreit" <katjah at stanford.edu>
> Cc: bioconductor at r-project.org, "Rsamtools Maintainer" <maintainer at bioconductor.org>
> Sent: Sunday, April 13, 2014 10:02:13 PM
> Subject: Re: [BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
>
> On Sun, Apr 13, 2014 at 7:18 PM, Katja Hebestreit <katjah at stanford.edu> wrote:
>
> > Hello,
> >
> > I get an error when I try to import my gff file:
> >
> > txdb <- makeTranscriptDbFromGFF(file="file.gtf", format="gtf")
> >
> > Error in .parse_attrCol(attrCol, file, colnames) :
> > Some attributes do not conform to 'tag value' format
> >
> > This is what my file looks like:
> >
> > chr1 mm9_refFlat stop_codon 3206103 3206105 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1 mm9_refFlat CDS 3206106 3207049 0.000000 - 2 gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1 mm9_refFlat exon 3204563 3207049 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1 mm9_refFlat CDS 3411783 3411982 0.000000 - 1 gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1 mm9_refFlat exon 3411783 3411982 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1 mm9_refFlat CDS 3660633 3661429 0.000000 - 0 gene_id "Xkr4"; transcript_id "Xkr4";
> >
> > I have the feeling that this has something to do with the missing exon
> > rank information in my file. Is that true? Is there a way to import this
> > file? All I want to do is to determine the gene lengths.
> >
>
> It is most likely as the error says: some of your attributes are malformed.
> Is that the entire file listed above, or is there more? If you could get me
> the file somehow I could diagnose the issue.
>
>
> >
> > Could anyone help? That would be awesome!
> > Cheers,
> > Katja
> >
> >
> > sessionInfo()
> > R version 3.1.0 (2014-04-10)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
> > [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
> > [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel stats graphics grDevices utils datasets methods
> > [8] base
> >
> > other attached packages:
> > [1] GenomicFeatures_1.16.0 AnnotationDbi_1.25.19 Biobase_2.23.6
> > [4] GenomicRanges_1.16.0 GenomeInfoDb_0.99.32 IRanges_1.21.45
> > [7] BiocGenerics_0.9.3 BiocInstaller_1.14.1
> >
> > loaded via a namespace (and not attached):
> > [1] BatchJobs_1.2 BBmisc_1.5 BiocParallel_0.6.0
> > [4] biomaRt_2.20.0 Biostrings_2.32.0 bitops_1.0-6
> > [7] brew_1.0-6 BSgenome_1.32.0 codetools_0.2-8
> > [10] DBI_0.2-7 digest_0.6.4 fail_1.2
> > [13] foreach_1.4.2 GenomicAlignments_1.0.0 iterators_1.0.7
> > [16] plyr_1.8.1 Rcpp_0.11.1 RCurl_1.95-4.1
> > [19] Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0
> > [22] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2
> > [25] tools_3.1.0 XML_3.98-1.1 XVector_0.4.0
> > [28] zlibbioc_1.10.0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>