[BioC] Curious file size issues
Daniel Brewer
daniel.brewer at icr.ac.uk
Thu Mar 12 16:56:47 CET 2009
That's great, thank you so much. The problem was one particular variable
containing long strings that was being stored as a factor. The file is
now down to 13M without compression. That's more like it.
Thanks
Dan
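One plausible mechanism behind Dan's finding, sketched in R under stated assumptions: when a data frame with factor columns is split into one data frame per transcript, each piece keeps the complete level set of every factor column. A column of long, mostly unique strings is therefore stored once per piece, and save() writes it out once per piece. The toy data frame `ann` and its columns are hypothetical stand-ins for the GTF annotation, not Dan's actual code. `stringsAsFactors = TRUE` is set explicitly because R 4.0.0 changed the default to FALSE.

```r
# Hypothetical toy annotation: 200 transcripts x 10 rows, with one column of
# long, mostly unique strings (loosely modelled on a GTF attribute field).
n <- 2000
ann <- data.frame(
  transcript = rep(sprintf("T%04d", 1:200), each = 10),
  attribute  = sprintf("gene_id \"G%05d\"; transcript_name \"%s\"",
                       seq_len(n), strrep("x", 80)),
  stringsAsFactors = TRUE  # reproduces the pre-R 4.0.0 default
)

# split() subsets rows with `[`, which keeps every factor level, so each of
# the 200 pieces carries all 2000 levels of `attribute`.
blocks_factor <- split(ann, ann$transcript)
length(levels(blocks_factor[[1]]$attribute))

# Convert factor columns to character before splitting: each piece then
# holds only the strings for its own 10 rows.
ann_chr <- ann
is_fac <- vapply(ann_chr, is.factor, logical(1))
ann_chr[is_fac] <- lapply(ann_chr[is_fac], as.character)
blocks_chr <- split(ann_chr, ann_chr$transcript)

# object.size() counts the level set once per piece.
format(object.size(blocks_factor), units = "Mb")
format(object.size(blocks_chr), units = "Mb")
```

Calling droplevels() on each piece after splitting would be an alternative that keeps the factor type while discarding unused levels.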
Adaikalavan Ramasamy wrote:
> I am not an expert in R data representations. However, my experience
> suggests that if an object is stored as a matrix instead of a
> data.frame, its size may be bloated. If it is a data.frame, also check
> that each column is stored with the correct type, e.g. via
> matrix(obj): look for numeric columns stored as factors or characters.
>
> Also use the compress=TRUE option in save().
>
> Regards, Adai
>
> Daniel Brewer wrote:
>> Hello,
>>
>> The GTF file from Ensembl for the human genome,
>> Homo_sapiens.NCBI36.52.gtf, is 194M and is a tab-delimited text file. I
>> import it into R and process it into two objects:
>> genomeRanges & genomeBlocks. genomeRanges is a list of IRanges objects,
>> one for each chromosome and strand. genomeBlocks is a
>> list of data frames holding the annotation for each of the
>> transcripts.
>>
>> When I save this to file
>> (save(genomeBlocks,genomeRanges,file="Hsgenome.Rdata")), it comes out as
>> 859M. How is this possible, especially as the Rdata file is a binary
>> format?
>>
>>> object.size(genomeBlocks)
>> [1] 2939935864
>>
>>> object.size(genomeRanges)
>> [1] 8769208
>>
>> Has anyone got any ideas what is going on?
>>
>> Thanks
>>
>> Dan
>>
>>
>
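Adai's two suggestions, checking each column's storage class and saving with compress = TRUE, can be sketched in R as follows. The toy data frame `block_df` and its columns are hypothetical stand-ins for one element of genomeBlocks; sapply(..., class) is used here as an alternative way to list column types, and file.size() assumes R 3.2.0 or later.

```r
# Hypothetical stand-in for one annotation data frame, where a numeric
# column has mistakenly been read in as text and turned into a factor.
n <- 3000
block_df <- data.frame(
  start = seq_len(n) * 100L,
  end   = seq_len(n) * 100L + 50L,
  score = rep(c("0.5", "0.7", "0.9"), length.out = n),  # numbers as text
  stringsAsFactors = TRUE
)

# Check the storage class of every column: start and end are "integer",
# score is "factor".
col_classes <- sapply(block_df, class)
col_classes

# Repair the column by going through character first: as.numeric() on a
# factor returns the internal level codes, not the original values.
block_df$score <- as.numeric(as.character(block_df$score))

# Compare file sizes with and without compression, using temporary files.
f_plain <- tempfile(fileext = ".RData")
f_gz    <- tempfile(fileext = ".RData")
save(block_df, file = f_plain, compress = FALSE)
save(block_df, file = f_gz, compress = TRUE)
file.size(f_plain)
file.size(f_gz)
```

Compression shrinks the file on disk, but it does not reduce the in-memory size reported by object.size(), so fixing column types is still worth doing first.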
--
**************************************************************
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************