[BioC] Fastest way to read CSV files
Stijn van Dongen
stijn at ebi.ac.uk
Fri Aug 20 15:36:36 CEST 2010
sorry, this:
> <----integer 1 ---> <--- integer 2 --->
> 0000000 0014 0000 4268 0000 0000 0000 c000 4070
should have been:
        <-int 1-> <-int 2->
0000000 0014 0000 4268 0000 0000 0000 c000 4070
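
As a rough check of the corrected labels, the 16 bytes of that dump line can be decoded directly in R. This is only a sketch: it assumes the dump prints 16-bit words on a little-endian machine (so each printed word shows its two bytes swapped), and the names bytes, dims and first are made up for the illustration.

```r
# Bytes of the first dump line in file order (assumption: the dump shows
# 16-bit words, so e.g. the printed word "4268" is the byte pair 68 42).
bytes <- as.raw(c(0x14, 0x00, 0x00, 0x00,    # integer 1 (4 bytes)
                  0x68, 0x42, 0x00, 0x00,    # integer 2 (4 bytes)
                  0x00, 0x00, 0x00, 0x00,    # first double (8 bytes) ...
                  0x00, 0xc0, 0x70, 0x40))   # ... continued

# readBin() also accepts a raw vector in place of a connection.
dims  <- readBin(bytes[1:8],  integer(), n = 2, size = 4, endian = "little")
first <- readBin(bytes[9:16], numeric(), n = 1, size = 8, endian = "little")

dims   # 20 17000
first  # 268
```

So each integer takes two of the printed 16-bit words, and the remaining four words are the first double of the matrix.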
> Thanks Misha, that's very instructive.
> I'd like to add that this can be made quite parametrizable, in that it is
> possible to write and read the dimensions of the object as well. In fact, by
> writing some kind of 'cookie' number it would be possible to have code that can
> recognize what *type* of data it needs to read. In the example below however,
> just the dimensions are first written to and then read from file. When reading,
> the dimensions are no longer hardcoded, but read from the same connection.
>
> x <- matrix(floor(runif(1.7e4 * 20) * 1000), nrow=20)
> cn <- file("test.bin","wb")
> writeBin(dim(x), cn)
> writeBin(as.vector(x), cn)
> close(cn)
>
> cn <- file("test.bin", "rb")
> dims <- readBin(cn, integer(), 2)
> x2 <- matrix(readBin(cn,numeric(), dims[1] * dims[2]), nrow=dims[1], ncol=dims[2])
> close(cn)
>
> sum(x != x2)
>
> a hex dump of the file test.bin gives this for the first line:
>
> <----integer 1 ---> <--- integer 2 --->
> 0000000 0014 0000 4268 0000 0000 0000 c000 4070
>
> Indeed, hexadecimal 0x14 == 20 and hexadecimal 0x4268 == 17000;
> this is on a little-endian machine.
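
The 'cookie' idea mentioned in the quoted text could look roughly like the sketch below. It is only an illustration under stated assumptions: the tag values 1 and 2 and the helper names write_matrix and read_matrix are hypothetical, and the file is written in the machine's native byte order, so it is not portable between machines of different endianness.

```r
# Hypothetical type tags; the values are arbitrary choices for this sketch.
COOKIE_INT    <- 1L
COOKIE_DOUBLE <- 2L

# Write a type tag, then the dimensions, then the data (column-major).
write_matrix <- function(x, path) {
  stopifnot(is.matrix(x), is.numeric(x))
  cookie <- if (is.integer(x)) COOKIE_INT else COOKIE_DOUBLE
  cn <- file(path, "wb")
  on.exit(close(cn))
  writeBin(cookie, cn)
  writeBin(dim(x), cn)
  writeBin(as.vector(x), cn)
  invisible(path)
}

# Read the tag first and use it to decide which type to read back.
read_matrix <- function(path) {
  cn <- file(path, "rb")
  on.exit(close(cn))
  cookie <- readBin(cn, integer(), 1)
  dims   <- readBin(cn, integer(), 2)
  what <- switch(as.character(cookie),
                 "1" = integer(),
                 "2" = numeric(),
                 stop("unknown cookie: ", cookie))
  matrix(readBin(cn, what, dims[1] * dims[2]),
         nrow = dims[1], ncol = dims[2])
}

# Usage: round-trip the same kind of matrix as in the quoted example.
x  <- matrix(floor(runif(1.7e4 * 20) * 1000), nrow = 20)
f  <- tempfile(fileext = ".bin")
write_matrix(x, f)
x2 <- read_matrix(f)
sum(x != x2)
```

With this layout the reader needs no prior knowledge of either the dimensions or the storage type; an unrecognized tag stops with an error instead of silently misreading the bytes.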
--
Stijn van Dongen >8< -o) O< forename pronunciation: [Stan]
EMBL-EBI /\\ Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK _\_/ http://micans.org/stijn