[R] Accelerating binRead

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Sep 17 16:51:31 CEST 2016


Appending to lists is only very slightly more efficient than incremental rbinding. Ideally you can figure out an upper bound for the number of records, preallocate a data frame of that size, modify each element in place as you go, and shrink the data frame once at the end as needed. If you cannot do that, you can append fixed-size data frames to a list and follow the same strategy in chunks, with a single do.call(rbind, ...) at the end.
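
A rough sketch of both strategies, using a matrix as in the posted function. Everything here is made up for illustration: make_reader() is a stand-in for whatever produces one record at a time (it returns NULL when the input is exhausted), and read_all(), read_chunked(), max_records and chunk_size are hypothetical names. Untested against real data, so treat it as a starting point only.

```r
# Hypothetical record source: each call returns a numeric vector of
# length 6, or NULL once n_records records have been handed out.
make_reader <- function(n_records) {
  i <- 0
  function() {
    if (i >= n_records) {
      NULL
    } else {
      i <<- i + 1
      c(i, i * 2, i / 10, 1, 0.5, 0.25)
    }
  }
}

# Strategy 1: preallocate to an upper bound, fill in place, trim once.
read_all <- function(next_record, max_records) {
  out <- matrix(NA_real_, nrow = max_records, ncol = 6)
  n <- 0
  while (n < max_records) {
    rec <- next_record()
    if (is.null(rec)) break
    n <- n + 1
    out[n, ] <- rec
  }
  out[seq_len(n), , drop = FALSE]
}

# Strategy 2: no upper bound known, so fill fixed-size chunks, keep them
# in a list, and call rbind once at the end.
read_chunked <- function(next_record, chunk_size = 1000) {
  chunks <- list()
  repeat {
    chunk <- matrix(NA_real_, nrow = chunk_size, ncol = 6)
    n <- 0
    while (n < chunk_size) {
      rec <- next_record()
      if (is.null(rec)) break
      n <- n + 1
      chunk[n, ] <- rec
    }
    # Keep only the rows that were actually filled.
    chunks[[length(chunks) + 1]] <- chunk[seq_len(n), , drop = FALSE]
    if (n < chunk_size) break  # a short chunk means the input ran out
  }
  do.call(rbind, chunks)
}

res_all <- read_all(make_reader(3), max_records = 10)
res_chunked <- read_chunked(make_reader(5), chunk_size = 2)
```

In both versions the expensive copy happens once (or once per chunk) rather than once per record, which is the whole point.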

Note that reproducible examples including example data often yield working code, while incomplete examples tend to yield handwaving descriptions like the above. 

I will note that any code placed after a call to return() never runs (the close(URL) in your function, for example). I highly recommend avoiding the return function like the plague... use the expression-at-the-end-of-the-function method of returning.
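
One way to get the cleanup to run regardless is on.exit(), with the result as the last expression. A minimal sketch; read_first_int() and the temporary file are invented for illustration and are not part of the original function.

```r
# Hypothetical helper (not from the original post): reads one 4-byte
# little-endian integer from the start of a binary file.
read_first_int <- function(path) {
  con <- file(path, "rb")
  # Registered right after opening, so the connection is closed when the
  # function exits, whether normally or through an error.
  on.exit(close(con))
  # The value of the last expression is the function's return value.
  readBin(con, integer(), size = 4, n = 1, endian = "little")
}

# Usage with a throwaway file
tmp <- tempfile(fileext = ".bin")
writeBin(42L, tmp, size = 4, endian = "little")
value <- read_first_int(tmp)
unlink(tmp)
```
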
-- 
Sent from my phone. Please excuse my brevity.

On September 17, 2016 7:10:05 AM PDT, Ismail SEZEN <sezenismail at gmail.com> wrote:
>I suspect that rbind is responsible. Use a list and append to it instead of
>calling rbind. At the end, combine the elements of the list with do.call("rbind", list).
>
>> On 17 Sep 2016, at 15:05, Philippe de Rochambeau <phiroc at free.fr> wrote:
>> 
>> Hello,
>> the following function, which stores numeric values extracted from a binary file in an R matrix, is very slow, especially when the file is several MB in size.
>> Should I rewrite the function in inline C or in C/C++ using Rcpp? If the latter, how do you call readBin in Rcpp (I'm a total Rcpp newbie)?
>> Many thanks.
>> Best regards,
>> phiroc
>> 
>> 
>> -------------
>> 
>> # inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
>> 
>> PLTreader <- function(inputPath){
>> 	URL <- file(inputPath, "rb")
>> 	PLT <- matrix(nrow=0, ncol=6)
>> 	compteurDePrints = 0
>> 	compteurDeLignes <- 0
>> 	maxiPrints = 5
>> 	displayData <- FALSE
>> 	while (TRUE) {
>> 		periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>> 		eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>> 		dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>> 		dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>> 		if (dword1 < 0) {
>> 			dword1 = dword1 + 2^32-1;
>> 		}
>> 		eventDate = (dword2*2^32 + dword1)/1000
>> 		repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
>> 		exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
>> 		loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
>> 		PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
>> 	} # end while
>> 	return(PLT)
>> 	close(URL)
>> }
>> 
>> ----------------
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


