[R] Accelerating binRead

jim holtman jholtman at gmail.com
Sat Sep 17 20:24:47 CEST 2016


I would also suggest that you take a look at the 'pack' package which can
convert the binary input to the value you want.  Part of your performance
problems might be all the short reads that you are doing.
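
To illustrate the point about short reads, here is a minimal base-R sketch
(not using 'pack') that reads the whole file in one call and then decodes
each field for all records at once. It assumes the 26-byte little-endian
record layout implied by the quoted function below, and a local file (a
remote file would have to be downloaded first, e.g. with download.file() in
mode "wb"). The name read_plt_bulk is hypothetical. Since readBin only honors
signed=FALSE for sizes 1 and 2, the two unsigned 32-bit words are rebuilt
from 16-bit halves.

```r
# Sketch only: assumes 26-byte little-endian records laid out as
#   int32 periodIndex, int32 eventId, uint32 dword1, uint32 dword2,
#   int16 repNum, float32 exp, float32 loss
read_plt_bulk <- function(path) {            # hypothetical helper name
  rec_size <- 26L
  n_rec <- file.size(path) %/% rec_size      # assumes a local file
  con <- file(path, "rb")
  on.exit(close(con))

  # One large read instead of seven small reads per record.
  bytes <- readBin(con, what = "raw", n = n_rec * rec_size)
  m <- matrix(bytes, nrow = rec_size)        # one column per record

  # Decode one field (a run of byte rows) across all records in one call.
  field <- function(rows, what, size) {
    readBin(as.vector(m[rows, , drop = FALSE]), what = what, size = size,
            n = n_rec, endian = "little")
  }

  # Unsigned 32-bit value rebuilt from two unsigned 16-bit halves
  # (low half first, because the data are little-endian).
  u32 <- function(rows) {
    h <- readBin(as.vector(m[rows, , drop = FALSE]), what = "integer",
                 size = 2L, n = 2L * n_rec, signed = FALSE,
                 endian = "little")
    h[c(TRUE, FALSE)] + h[c(FALSE, TRUE)] * 65536
  }

  cbind(periodIndex = field(1:4, "integer", 4L),
        eventId     = field(5:8, "integer", 4L),
        eventDate   = (u32(13:16) * 2^32 + u32(9:12)) / 1000,
        repNum      = field(17:18, "integer", 2L),
        exp         = field(19:22, "numeric", 4L),
        loss        = field(23:26, "numeric", 4L))
}
```

Because the matrix is built in one step, this also avoids growing the result
with rbind() on every record, which is likely a second source of the
slowdown.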


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenismail at gmail.com>
wrote:

> I noticed the same issue but didn't care much :)
>
> On Sat, Sep 17, 2016, 18:01 jim holtman <jholtman at gmail.com> wrote:
>
>> Your example was not reproducible.  Also how do you "break" out of the
>> "while" loop?
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phiroc at free.fr>
>> wrote:
>>
>> > Hello,
>> > the following function, which stores numeric values extracted from a
>> > binary file, into an R matrix, is very slow, especially when the said
>> > file is several MB in size.
>> > Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
>> > latter, how do you do the equivalent of readBin in Rcpp (I'm a total
>> > Rcpp newbie)?
>> > Many thanks.
>> > Best regards,
>> > phiroc
>> >
>> >
>> > -------------
>> >
>> > # inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
>> >
>> > PLTreader <- function(inputPath){
>> >         URL <- file(inputPath, "rb")
>> >         PLT <- matrix(nrow=0, ncol=6)
>> >         compteurDePrints = 0
>> >         compteurDeLignes <- 0
>> >         maxiPrints = 5
>> >         displayData <- FALSE
>> >         while (TRUE) {
>> >                 periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>> >                 eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>> >                 dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>> >                 dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>> >                 if (dword1 < 0) {
>> >                         dword1 = dword1 + 2^32-1;
>> >                 }
>> >                 eventDate = (dword2*2^32 + dword1)/1000
>> >                 repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
>> >                 exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
>> >                 loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
>> >                 PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
>> >         } # end while
>> >         return(PLT)
>> >         close(URL)
>> > }
>> >
>> > ----------------
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
