[R] readBin into a data frame
Duncan Murdoch
murdoch.duncan at gmail.com
Thu Aug 1 13:41:26 CEST 2013
On 13-08-01 4:36 AM, Zhang Weiwu wrote:
> Hello. readBin is designed to read a batch of data with the same spec, e.g.
> read 10000 floats into a vector. In practice I read into a data frame, not a
> vector. For each row of the data frame, I need to read an integer and a float.
>
> for (i in 1:1000) {
> dataframe$int[i] <- readBin(con, integer(), size=2)
> dataframe$float[i] <- readBin(con, numeric(), size=4)
> }
>
> And I need to read 100 such data files, ending up with a for loop inside a for
> loop. Something feels wrong here, since it is said that if you use a double for
> loop you are not speaking R.
>
> What is the R way of doing this? I can think of moving the body of the
> loop into a function and vectorizing it. But the result would be a list of
> lists, not exactly a data frame, and the list would grow incrementally, which is
> inefficient, since I know the size of my data frame at the outset. I am a
> new learner who doesn't yet speak half of the R vocabulary, so kindly provide
> some hints, please :)
I don't think there are any functions to do this directly. I'd probably
use the loop (since the time to read 1000 entries would be small). If
the files were much larger, I might read the whole file as raw bytes, then
read the integer and float vectors from subsets of those bytes.
For example, the following untested code:
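# readBin reads a single item unless n is given; 6 bytes x 1000 records here
rawvec <- readBin(con, "raw", n = 6 * 1000)
n <- length(rawvec) %/% 6
i <- seq_len(n) - 1
# Using sort here is inefficient, but I'm lazy...
indices <- sort( c(6*i + 1, 6*i + 2) )
# Use a separate name so the file connection in con is not overwritten
rcon <- rawConnection(rawvec[indices])
int <- readBin(rcon, "integer", n = n, size = 2)
close(rcon)
indices <- sort( c(6*i + 3, 6*i + 4, 6*i + 5, 6*i + 6) )
rcon <- rawConnection(rawvec[indices])
# n and size must be named: the second positional argument of readBin is n
float <- readBin(rcon, "numeric", n = n, size = 4)
close(rcon)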
dataframe <- data.frame(int=int, float=float)
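A self-contained sketch of the same raw-bytes idea may help here. It avoids the sort by reshaping the bytes into a 6-row matrix, one column per record, and passes raw vectors straight to readBin. The helper name read_records, the temporary file, and the sample values are assumptions made for illustration only; the sketch also assumes the file uses the machine's native byte order and holds nothing but 6-byte records.

```r
# Hypothetical helper: read a file of (2-byte integer, 4-byte float) records.
read_records <- function(path, record_size = 6L) {
  # Read the whole file as raw bytes in one call.
  rawvec <- readBin(path, "raw", n = file.size(path))
  n <- length(rawvec) %/% record_size
  # One column per record, one row per byte position within the record.
  m <- matrix(rawvec[seq_len(n * record_size)], nrow = record_size)
  data.frame(
    # Rows 1-2 hold the integer bytes, rows 3-6 the float bytes.
    int   = readBin(as.vector(m[1:2, ]), "integer", n = n, size = 2),
    float = readBin(as.vector(m[3:6, ]), "numeric", n = n, size = 4)
  )
}

# Write a small sample file so the sketch can be run as-is.
path <- tempfile(fileext = ".bin")
ints <- c(1L, -2L, 300L)
floats <- c(0.5, 1.25, -2.75)  # exactly representable in 4 bytes
out <- file(path, "wb")
for (k in seq_along(ints)) {
  writeBin(ints[k], out, size = 2)
  writeBin(floats[k], out, size = 4)
}
close(out)

df <- read_records(path)

# Several files can be combined without an explicit outer loop.
files <- c(path, path)
all_df <- do.call(rbind, lapply(files, read_records))
```

Because each file is read in a single call and the data frame is built once, nothing grows incrementally, and the 100-file case reduces to one lapply over the file names.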
The other way to do this is to read the data in a C function, using
.Call or .C to get it into R.
Duncan Murdoch