[R] trouble with read.table and colClasses='raw'

jim holtman jholtman at gmail.com
Fri Feb 12 01:33:58 CET 2010


What you might consider is using save/load to store the data in a
format that is easily accessible in R, and then using write.table to
create character-based output for other external programs.  For
files of the size you are working with, this is the easiest and fastest
way of doing it.
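
A minimal sketch of that two-file approach, in case it helps. The small
matrix `m` and the temporary file names are made-up stand-ins for the real
data, and converting to integer before write.table is my assumption about
what the other programs would want to read:

```r
# Hypothetical stand-in for the large raw matrix discussed in this thread.
m <- matrix(as.raw(sample(0:2, 15, replace = TRUE)), nrow = 5,
            dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical file names; replace with real paths.
rdata_file <- tempfile(fileext = ".RData")
text_file  <- tempfile(fileext = ".txt")

# Binary copy for R: save() keeps the storage mode, so load() gives back
# a raw matrix without any conversion step.
save(m, file = rdata_file)

# Text copy for other programs: convert to integer first so the file
# holds plain numbers.
m_int <- m
storage.mode(m_int) <- "integer"
write.table(m_int, file = text_file, quote = FALSE, row.names = FALSE)

# Back in R, read the binary copy rather than re-parsing the text file.
restored <- local({
  load(rdata_file)
  m
})
stopifnot(is.raw(restored), identical(restored, m))
```

The text file is only ever written for the external programs; R itself
always goes through the .RData file, so the raw storage mode never has to
survive a trip through read.table.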

On Thu, Feb 11, 2010 at 4:08 PM, Johan Jackson
<johan.h.jackson at gmail.com> wrote:
> Apologies for my sarcastic/defensive reply email Peter.
>
> The issue is that I need this matrix to be read into other programs - not
> just R, so save() won't work. I like 'raw' mode because it saves so much
> space, but it's difficult to work with. This read/write issue is but one
> example; another is that R will try to convert the raw matrix to, e.g.,
> double, if you forget and assign any element of it to be double (personally,
> I'd prefer there to be an option, set in options(), for R to downcast the
> variable to raw and give you a warning).
>
> Anyway, I've been working with R a bit, but I've come to the conclusion that
> it is just not user-friendly when it comes to large datasets. I've tried
> some of the large data packages but at least all that I've tried have their
> own sets of issues. As much as it pains me to say it, I may go back to SAS
> when working on such projects...
>
> Best,
>
> JJ
>
> On Thu, Feb 11, 2010 at 1:19 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
>
>> Johan,
>>
>> My apologies if you took my comments to be sarcastic; they were
>> certainly not meant to be. I have no desire to put you or anyone
>> down.
>>
>> I see now that you want to somehow store data more 'efficiently',
>> presumably in order to be able to handle larger objects in RAM.
>>
>> I doubt that storage.mode raw will help. Your post implied that
>> you had saved an object and couldn't read it back into the same
>> format in which you think it was saved. So, did you have a 16Gb
>> object to save? And why wouldn't you use save()? It's just a
>> guess, but I think you may have a file of _character_ data that
>> you want to read into R where its storage mode should be 'raw'.
>> I don't know how to do that.
>>
>> If the main purpose is to circumvent R's memory requirements,
>> then there have been plenty of posts on that issue.
>>
>>  -Peter Ehlers
>>
>>
>> Johan Jackson wrote:
>>
>>> "I suspect that you really don't know what 'raw' type means and haven't
>>> bothered to check ?raw. It's also pretty clear that you haven't read the
>>> colClasses description in ?read.table very carefully."
>>>
>>> Gee, thanks Peter (this is what I love about the R help boards: people whose
>>> sole goal is to put others down as wittily as possible for asking *stupid
>>> stupid* questions). Gives me warm fuzzies :)
>>>
>>> Although I admit to not being the brightest of folks around, or knowing R
>>> backwards and forwards, I did read ?read.table and ?raw. But your suggestion
>>> is not at all helpful Peter:
>>>
>>> dat <- read.table(file="data", header=TRUE, colClasses="character") #wow! it works on a 5x3 matrix! amazing!! (sarcasm)
>>>
>>> dat2 <- as.matrix(dat)
>>> storage.mode(dat2) <- 'raw'
>>>
>>> if I had wanted 'character' data, I would have put that into my question.
>>> Any newbie can do what you did; the issue is that object.size(dat) is about
>>> 8 times larger than object.size(dat2) with any large dataset. That's why I
>>> want to store it as 'raw' - because the raw one takes about 2 Gb RAM and the
>>> other about 16Gb! Perhaps you need to understand the raw mode a bit better,
>>> Peter, because I thought the reason for wanting the data in 'raw' was quite
>>> obvious, but I guess not.
>>>
>>> Peter, here's what I want you to do. Use R to make a vector with 2^31 - 5
>>> elements in it. Hey, make it of mode 'character' while you're at it! Write
>>> it out. Read it back in. Having problems? Then come talk to me...
>>>
>>> JJ
>>>
>>>  [....]
>>
>> --
>> Peter Ehlers
>> University of Calgary
>>
>>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?