[BioC] Curious file size issues
Adaikalavan Ramasamy
a.ramasamy at imperial.ac.uk
Thu Mar 12 11:39:59 CET 2009
I am not an expert in R data representations. However, my experience
suggests that if an object is stored as a matrix when it should be a
data.frame, its size may be bloated, because a matrix coerces every
column to a single type. If it is already a data.frame, check that each
column is stored with the right type, e.g. via sapply(obj, class) or
str(obj); numeric columns stored as factors or characters are a common
culprit. Also use the compress=TRUE option in save().
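A rough sketch of these checks on made-up data follows. The data frame
`ann` and its column names are invented for illustration and merely stand
in for one element of a list such as genomeBlocks; the actual objects may
be laid out quite differently, so treat this as a pattern rather than a
diagnosis.

```r
# Hypothetical toy data: 'ann' is NOT the real annotation, just a stand-in.
n <- 100000
ann <- data.frame(
  start  = as.character(seq_len(n)),   # numbers wrongly held as text
  score  = as.character(runif(n)),     # numbers wrongly held as text
  strand = sample(c("+", "-"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Inspect how each column is actually stored.
col_classes <- sapply(ann, class)
print(col_classes)

# Convert each column to its natural type.
fixed <- ann
fixed$start  <- as.integer(fixed$start)
fixed$score  <- as.numeric(fixed$score)
fixed$strand <- factor(fixed$strand)

# Compare the in-memory footprint before and after the conversion.
size_before <- object.size(ann)
size_after  <- object.size(fixed)
print(size_before)
print(size_after)

# Compare the on-disk size with and without compression.
f_raw <- tempfile(fileext = ".RData")
f_gz  <- tempfile(fileext = ".RData")
save(fixed, file = f_raw, compress = FALSE)
save(fixed, file = f_gz,  compress = TRUE)
file_sizes <- file.info(c(f_raw, f_gz))$size
print(file_sizes)
```

For a list of data frames, the same class check could be run over every
element with something like lapply(genomeBlocks, function(d) sapply(d, class)).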
Regards, Adai
Daniel Brewer wrote:
> Hello,
>
> The GTF file from Ensembl for the human genome,
> Homo_sapiens.NCBI36.52.gtf, is 194M and is a tab-delimited text file. I
> import it into R and process it so that there are two objects:
> genomeRanges & genomeBlocks. genomeRanges is a list of IRanges objects,
> each of which is a particular chromosome and strand. genomeBlocks is a
> list of data frames with the associated annotation for each of the
> transcripts.
>
> When I save this to file
> (save(genomeBlocks,genomeRanges,file="Hsgenome.Rdata")) it comes out as
> 859M. How is this possible? Especially as the Rdata file is a binary
> format.
>
>> object.size(genomeBlocks)
> [1] 2939935864
>
>> object.size(genomeRanges)
> [1] 8769208
>
> Anyone got any ideas what is going on?
>
> Thanks
>
> Dan
>
>