[Rd] how to manipulate dput output format

Simon Urbanek simon.urbanek at r-project.org
Mon Jun 25 19:08:58 CEST 2012


On Jun 25, 2012, at 11:57 AM, andre zege wrote:

> 
> 
> On Mon, Jun 25, 2012 at 11:17 AM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
> 
> On Jun 25, 2012, at 10:20 AM, andre zege wrote:
> 
> > dput() is intended to be parsed by R so the above is not possible without massaging the output. But why in the world would you use dput() for something that you want to read in Java? Why don't you use a format that Java can read easily - such as JSON?
> >
> > Cheers,
> > Simon
> >
> >
> >
> >
> >
> > Yeap, except I was just working with someone else's choice. Bigmatrix code uses dput() to dump the desc file of file-backed matrices.
> 
> Ah, ok, that is indeed rather annoying as it's pretty much the most non-portable storage (across programs) one could come up with. (I presume you're talking about big.matrix from bigmemory?)
> 
> 
> > I got some time to do a little hack of reading big matrices nicely into Java and was looking for some ways of smoothing the edges of parsing the .desc file a little. I guess I am OK now with parsing .desc with some regex. One thing I am still wondering about is whether I really need to convert back and forth between little endian and big endian. Namely, the Java platform has little endian native byte order, and the big matrix code writes stuff in big endian. It'd be nice if I could manipulate that by some #define somewhere in the makefile or something and make C++ write little endian without byte swapping every time I need to communicate with big matrix from Java.
> 
> I think you're wrong (if we are talking about bigmemory) - the endianness is governed by the platform as far as I can see. On little-endian machines the big matrix storage is little endian and on big-endian machines it is big-endian.
> 
> It's very peculiar that the descriptor doesn't even store the endianness - I think you could talk to the authors and suggest that they include most basic information such as endianness and, possibly, change the format to something that is well-defined without having to evaluate it in R (which is highly dangerous and a serious security risk).
> 
> Cheers,
> Simon
> 
> 
> 
> I would assume that hardware should dictate endianness, just like you said. However, the fact is that bigmemory writes in a different endianness than Java reads in. I simply compare matrices that I write using bigmemory and that I read into Java. Unless I transform endianness, I get garbage, and if I swap byte order, I get the same matrix as the one I wrote. So, I don't think I am wrong about that, but I am curious about why it happens and whether it is possible to let bigmemory code write in natural endianness. Then I would not need to transform each double array element back and forth.
>  

I think it has to do with the way you read it in Java since Java supports either endianness directly. What methods do you use exactly to read it? The on-disk storage is definitely native-endian so C/C++/... can simply mmap it with no swapping.
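
For illustration, a minimal sketch of the byte-order point, assuming the backing file is nothing more than a flat array of 8-byte doubles written on a little-endian machine. That layout, the temporary file, and the names EndianRead, readDoubles and writeLittleEndian are hypothetical and not taken from bigmemory. A java.nio.ByteBuffer is big-endian unless its order is set explicitly, which would be enough to produce the garbage described above:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class EndianRead {
    // Interpret the whole file as consecutive 8-byte doubles in the given byte order.
    public static double[] readDoubles(Path file, ByteOrder order) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        double[] out = new double[raw.length / Double.BYTES];
        // ByteBuffer.wrap() yields a BIG_ENDIAN buffer; order() overrides that.
        ByteBuffer.wrap(raw).order(order).asDoubleBuffer().get(out);
        return out;
    }

    // Hypothetical stand-in for a backing file produced on a little-endian machine.
    public static Path writeLittleEndian(double[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        Path file = Files.createTempFile("bigmatrix-sketch", ".bin");
        Files.write(file, buf.array());
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeLittleEndian(new double[] {1.0, 2.5, -3.0});
        // Explicit little-endian order: the values written above come back unchanged.
        System.out.println(Arrays.toString(readDoubles(file, ByteOrder.LITTLE_ENDIAN)));
        // Java's default big-endian order: the same bytes decode to different numbers.
        System.out.println(Arrays.toString(readDoubles(file, ByteOrder.BIG_ENDIAN)));
        Files.delete(file);
    }
}
```

The order() call applies to the whole buffer, so no per-element swapping is needed. For files too large to read into a byte array, a buffer obtained from FileChannel.map() accepts the same order() call. By contrast, java.io.DataInputStream.readDouble() always reads big-endian and has no such switch.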

Cheers,
Simon


