Are you sure this is valid GFF3?

For example, consider this attribute field:

ID=supercont1.1;molecule_type=dsDNA;GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1;translation_table=1;topology=linear;localization=chromosomal;

The entry "GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1" is
delimited by semicolons like the others, but it contains no "=", so it does
not conform to the tag=value format that GFF3 requires for attributes.

I will add a check for this to devel, so that the error is more obvious.
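
To illustrate the kind of check I mean, here is a rough sketch. It is written
in Python only to keep it self-contained; it is not the rtracklayer code, and
the function name malformed_attributes is made up for this example. It splits
the attribute field on semicolons and reports any entry that is not tag=value.

```python
def malformed_attributes(attr_field):
    """Return the entries of a GFF3 attribute field that are not tag=value pairs."""
    bad = []
    for entry in attr_field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries, e.g. after a trailing semicolon
        tag, sep, _value = entry.partition("=")
        if not sep or not tag:
            bad.append(entry)  # no "=" at all, or nothing before it
    return bad


# The attribute field quoted above, split across lines for readability.
attrs = (
    "ID=supercont1.1;molecule_type=dsDNA;"
    "GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1;"
    "translation_table=1;topology=linear;localization=chromosomal;"
)
print(malformed_attributes(attrs))
# → ['GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1']
```

Run on the field above, only the GenBank entry is flagged; the other entries
pass.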

You can still read this file if you restrict what is parsed with the
"colnames" argument. With colnames=character(), you will get just the
seqnames, start and end. If you want the strand in addition to that, specify
colnames="strand", and so on. By default, all columns (including attributes)
are parsed.

Michael

On Fri, Jul 1, 2011 at 11:48 AM, Vince S. Buffalo <vsbuffalo@gmail.com> wrote:

> Hi All,
>
> I have tried to use import.gff3 from the rtracklayer package to import
> annotation information for the mosquito genome (gff3 file here:
> http://aaegypti.vectorbase.org/GetData/Downloads/) which is only 20 MB but
> the memory usage has exceeded 100GB on one of our high memory servers. This
> seems like far too much to just read in the file (which takes only 3 seconds
> with read.delim) and convert to RangedData objects. Has anyone experienced
> similar problems?
>
> Here is my sessionInfo:
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] biomaRt_2.8.0      rtracklayer_1.12.4 RCurl_1.5-0        bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] Biostrings_2.20.0   BSgenome_1.20.0     GenomicRanges_1.4.1
> [4] IRanges_1.10.0      tools_2.13.0        XML_3.2-0
>
> --
> Vince Buffalo
> Statistical Programmer
> Bioinformatics Core
> UC Davis Genome Center
> University of California, Davis
>
> "There's real poetry in the real world. Science is the poetry of reality."
> -Richard Dawkins
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
