[BioC] Curious file size issues

Adaikalavan Ramasamy a.ramasamy at imperial.ac.uk
Thu Mar 12 11:39:59 CET 2009


I am not an expert in R data representations. However, my experience
suggests that if an object is stored as a matrix when it should be a
data.frame, its size may be bloated (a matrix holds a single type, so
mixing numbers and text coerces everything to character). Also, if it
is a data.frame, check that each column is stored with the right type,
e.g. with str(obj) or sapply(obj, class). Numeric columns stored as
factors or characters take up much more space than they need.
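A minimal R sketch of both checks, using made-up coordinate data (the object names and the size n are illustrative assumptions, not taken from Dan's data). It shows the matrix coercion, the per-column type check, and the memory cost of keeping numbers as character:

```r
n <- 1e5  # hypothetical number of rows

# Hypothetical data: the same integer coordinates stored two ways
coords_int <- data.frame(start = seq_len(n), end = seq_len(n) + 100L)
coords_chr <- data.frame(start = as.character(seq_len(n)),
                         end   = as.character(seq_len(n) + 100L),
                         stringsAsFactors = FALSE)

# Check how each column is actually stored
sapply(coords_int, class)   # both columns "integer"
sapply(coords_chr, class)   # both columns "character"

# Compare the in-memory footprint (character storage is far larger)
object.size(coords_int)
object.size(coords_chr)

# Convert character columns that hold numbers back to integer
coords_fixed <- coords_chr
coords_fixed[] <- lapply(coords_fixed, as.integer)
sapply(coords_fixed, class)  # both columns "integer" again

# A matrix holds a single type: one text column turns every column into character
mixed <- data.frame(start = seq_len(n), gene = rep("geneA", n),
                    stringsAsFactors = FALSE)
typeof(as.matrix(mixed))     # "character"
object.size(mixed)
object.size(as.matrix(mixed))
```

On real data the same sapply(obj, class) call, applied to each element of a list such as genomeBlocks, is a quick way to spot columns that were read in with the wrong type.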

Also, use the compress=TRUE option in save().
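A small sketch of the effect of compression, assuming a hypothetical list `blocks` standing in for genomeBlocks and writing to temporary files. It saves the same object with and without compression and compares the on-disk sizes:

```r
# Hypothetical stand-in for a list of annotation data frames
blocks <- list(chr1 = data.frame(start = seq_len(1e5),
                                 gene  = rep(c("geneA", "geneB"), length.out = 1e5),
                                 stringsAsFactors = FALSE))

f_raw <- tempfile(fileext = ".RData")
f_gz  <- tempfile(fileext = ".RData")

save(blocks, file = f_raw, compress = FALSE)  # uncompressed binary
save(blocks, file = f_gz,  compress = TRUE)   # gzip-compressed binary

file.info(c(f_raw, f_gz))$size  # on-disk sizes in bytes
```

Current R versions also accept compress = "bzip2" or compress = "xz", which are slower but often give smaller files for repetitive annotation tables.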

Regards, Adai



Daniel Brewer wrote:
> Hello,
> 
> The GTF file from Ensembl for the human genome,
> Homo_sapiens.NCBI36.52.gtf, is 194M and is a tab-delimited text file. I
> import it into R and process it so that there are two objects:
> genomeRanges and genomeBlocks. genomeRanges is a list of IRanges objects,
> one for each chromosome and strand. genomeBlocks is a list of data
> frames with the associated annotation for each of the transcripts.
> 
> When I save this to file
> (save(genomeBlocks,genomeRanges,file="Hsgenome.Rdata")) it comes out as
> 859M. How is this possible, especially as the Rdata file is in a binary
> format?
> 
>> object.size(genomeBlocks)
> [1] 2939935864
> 
>> object.size(genomeRanges)
> [1] 8769208
> 
> Has anyone got any ideas what is going on?
> 
> Thanks
> 
> Dan



More information about the Bioconductor mailing list