[Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
David Terk
david.terk at gmail.com
Mon Jul 23 15:41:16 CEST 2012
I'm attaching a runnable script and corresponding data files. This will
freeze at 83%.
I'm not sure how much simpler to get than this.
-----Original Message-----
From: Joshua Ulrich [mailto:josh.m.ulrich at gmail.com]
Sent: Monday, July 23, 2012 9:17 AM
To: David Terk
Cc: Duncan Murdoch; r-devel at r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu
Well, you still haven't convinced anyone but yourself that it's definitely
an xts problem, since you have not provided any reproducible example...
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
On Mon, Jul 23, 2012 at 8:14 AM, David Terk <david.terk at gmail.com> wrote:
> Where should this be discussed, since it is definitely xts-related? I
> will gladly upload the simplified script + data files to whoever is
> maintaining this part of the code. Fortunately there is a workaround here.
>
> -----Original Message-----
> From: Joshua Ulrich [mailto:josh.m.ulrich at gmail.com]
> Sent: Monday, July 23, 2012 8:15 AM
> To: David Terk
> Cc: Duncan Murdoch; r-devel at r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash -
> Possible Bug in R 2.15.1 64-bit Ubuntu
>
> David,
>
> You still haven't provided a reproducible example. As Duncan already
> said, "if you don't post code that allows us to reproduce the crash,
> it's really unlikely that we'll be able to fix it."
>
> And R-devel is not the appropriate venue to discuss this if it's truly
> an issue with xts/zoo.
>
> Best,
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com
>
>
> On Mon, Jul 23, 2012 at 12:41 AM, David Terk <david.terk at gmail.com> wrote:
>> Looks like the call to:
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>
>> is what is causing the issue. If the variable name is not set, or is set
>> to any value other than NULL, then no hang occurs.
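If that diagnosis holds, the workaround amounts to passing any non-NULL name. A minimal sketch (synthetic data; the period/k values and the column name "x" are chosen for illustration only):

```r
library(xts)

# Hypothetical one-day, one-second price series (synthetic data)
idx <- as.POSIXct("2012-07-20 09:30:00", tz = "UTC") + 0:599
dat <- xts(100 + cumsum(rnorm(600)), order.by = idx)

# Reported to hang/crash intermittently on large inputs:
# bars <- to.period(dat, period = "seconds", k = 10, name = NULL)

# Workaround: pass any non-NULL name; columns come back as x.Open .. x.Close
bars <- to.period(dat, period = "seconds", k = 10, name = "x")
```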
>>
>> -----Original Message-----
>> From: David Terk [mailto:david.terk at gmail.com]
>> Sent: Monday, July 23, 2012 1:25 AM
>> To: 'Duncan Murdoch'
>> Cc: 'r-devel at r-project.org'
>> Subject: RE: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> I've isolated the bug. When the seg fault was produced, there was an
>> error that memory had not been mapped. Here is the odd part of the bug:
>> if you comment out certain code, get a full run, then comment the
>> problematic code back in, it will actually run. So I think it is safe
>> to assume something is going wrong with memory allocation. For example,
>> while testing I have been able to get to a point where the code will
>> run, but if I reboot the machine and try again, the code will not run.
>>
>> The bug itself is happening somewhere in xts or zoo. I will gladly
>> upload the data files. It is happening on the 10th data file, which
>> is only 225k lines in size.
>>
>> Below is the simplified code. Either of the calls
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>>
>> is what causes R to hang or crash. I have been able to replicate this
>> on Windows 7 64-bit and Ubuntu 64-bit. It seems easiest to consistently
>> replicate from RStudio.
>>
>> The code below will consistently replicate when the appropriate files
>> are used.
>>
>> parseTickDataFromDir = function(tickerDir, per, subper) {
>>   tickerAbsFilenames = list.files(tickerDir, full.names = T)
>>   tickerNames = list.files(tickerDir, full.names = F)
>>   tickerNames = gsub("_[a-zA-Z0-9].csv", "", tickerNames)
>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)
>>
>>   for (i in 1:length(tickerAbsFilenames)) {
>>     dat.i = parseTickData(tickerAbsFilenames[i])
>>     dates <- unique(substr(as.character(index(dat.i)), 1, 10))
>>     times <- rep("09:30:00", length(dates))
>>     openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>>     templateTimes <- NULL
>>
>>     for (j in 1:length(openDateTimes)) {
>>       if (is.null(templateTimes)) {
>>         templateTimes <- openDateTimes[j] + 0:23400
>>       } else {
>>         templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>>       }
>>     }
>>
>>     templateTimes <- as.xts(templateTimes)
>>     dat.i <- merge(dat.i, templateTimes, all = T)
>>     if (is.na(dat.i[1])) {
>>       dat.i[1] <- -1
>>     }
>>     dat.i <- na.locf(dat.i)
>>     dat.i <- to.period(dat.i, period = per, k = subper, name = NULL)
>>     index(dat.i) <- index(to.period(templateTimes, period = per, k = subper))
>>     setTxtProgressBar(pb, i)
>>   }
>>   close(pb)
>> }
>>
>> parseTickData <- function(inputFile) {
>>   DAT.list <- scan(file = inputFile, sep = ",", skip = 1,
>>                    what = list(Date = "", Time = "", Close = 0, Volume = 0),
>>                    quiet = T)
>>   index <- as.POSIXct(paste(DAT.list$Date, DAT.list$Time),
>>                       format = "%m/%d/%Y %H:%M:%S")
>>   DAT.xts <- xts(DAT.list$Close, index)
>>   DAT.xts <- make.index.unique(DAT.xts)
>>   return(DAT.xts)
>> }
>>
>> DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)
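For the record, a fully self-contained reproduction of the kind Duncan asked for would generate its own input rather than depend on the private data files. A sketch (the CSV layout is inferred from the scan() call above; all filenames and values are synthetic):

```r
library(xts)

# Write one synthetic tick file in the assumed layout:
# a header row, then Date,Time,Close,Volume with MM/DD/YYYY dates
f <- tempfile(fileext = ".csv")
secs <- sort(sample(0:23399, 1000))
open <- as.POSIXct("2012-07-20 09:30:00", tz = "UTC")
df <- data.frame(Date   = "07/20/2012",
                 Time   = format(open + secs, "%H:%M:%S", tz = "UTC"),
                 Close  = round(100 + cumsum(rnorm(1000)), 2),
                 Volume = sample(1:500, 1000, replace = TRUE))
write.csv(df, f, row.names = FALSE, quote = FALSE)

# Same read path as parseTickData()
DAT.list <- scan(file = f, sep = ",", skip = 1,
                 what = list(Date = "", Time = "", Close = 0, Volume = 0),
                 quiet = TRUE)
idx <- as.POSIXct(paste(DAT.list$Date, DAT.list$Time),
                  format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
DAT.xts <- make.index.unique(xts(DAT.list$Close, idx))
bars <- to.period(DAT.xts, period = "seconds", k = 10, name = "px")
```

Looping this over several hundred generated files would exercise the same code path without shipping any real data.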
>>
>> -----Original Message-----
>> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
>> Sent: Sunday, July 22, 2012 4:48 PM
>> To: David Terk
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> On 12-07-22 3:54 PM, David Terk wrote:
>>> I am reading several hundred files, anywhere from 50k-400k in size.
>>> It appears that when I read these files with R 2.15.1, the process
>>> will hang or seg fault on the scan() call. This does not happen on
>>> R 2.14.1.
>>
>> The code below doesn't do anything other than define a couple of functions.
>> Please simplify it to code that creates a file (or multiple files),
>> reads it or them, and shows a bug.
>>
>> If you can't do that, then gradually add the rest of the stuff from
>> these functions into the mix until you figure out what is really
>> causing the bug.
>>
>> If you don't post code that allows us to reproduce the crash, it's
>> really unlikely that we'll be able to fix it.
>>
>> Duncan Murdoch
>>
>>>
>>> This is happening on the Precise build of Ubuntu.
>>>
>>> I have included everything, but the issue appears to be when
>>> performing the scan in the method parseTickData.
>>>
>>> Below is the code. Hopefully this is the right place to post.
>>>
>>> parseTickDataFromDir = function(tickerDir, per, subper, fun) {
>>>   tickerAbsFilenames = list.files(tickerDir, full.names = T)
>>>   tickerNames = list.files(tickerDir, full.names = F)
>>>   tickerNames = gsub("_[a-zA-Z0-9].csv", "", tickerNames)
>>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)
>>>
>>>   for (i in 1:length(tickerAbsFilenames)) {
>>>     # Grab Raw Tick Data
>>>     dat.i = parseTickData(tickerAbsFilenames[i])
>>>     #Sys.sleep(1)
>>>
>>>     # Create Template
>>>     dates <- unique(substr(as.character(index(dat.i)), 1, 10))
>>>     times <- rep("09:30:00", length(dates))
>>>     openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>>>     templateTimes <- NULL
>>>
>>>     for (j in 1:length(openDateTimes)) {
>>>       if (is.null(templateTimes)) {
>>>         templateTimes <- openDateTimes[j] + 0:23400
>>>       } else {
>>>         templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>>>       }
>>>     }
>>>
>>>     # Convert templateTimes to XTS, merge with data and convert NA's
>>>     templateTimes <- as.xts(templateTimes)
>>>     dat.i <- merge(dat.i, templateTimes, all = T)
>>>
>>>     # If there is no data in the first print, we will have leading NA's,
>>>     # so set them to -1, since we do not want these values removed by to.period
>>>     if (is.na(dat.i[1])) {
>>>       dat.i[1] <- -1
>>>     }
>>>
>>>     # Fix remaining NA's
>>>     dat.i <- na.locf(dat.i)
>>>
>>>     # Convert to desired bucket size
>>>     dat.i <- to.period(dat.i, period = per, k = subper, name = NULL)
>>>
>>>     # Always use templated index, otherwise merge fails with other symbols
>>>     index(dat.i) <- index(to.period(templateTimes, period = per, k = subper))
>>>
>>>     # If there was missing data at open, set close to NA
>>>     valsToChange <- which(dat.i[, "Open"] == -1)
>>>     if (length(valsToChange) != 0) {
>>>       dat.i[valsToChange, "Close"] <- NA
>>>     }
>>>
>>>     if (i == 1) {
>>>       DAT = fun(dat.i)
>>>     } else {
>>>       DAT = merge(DAT, fun(dat.i))
>>>     }
>>>     setTxtProgressBar(pb, i)
>>>   }
>>>   close(pb)
>>>   colnames(DAT) = tickerNames
>>>   return(DAT)
>>> }
>>>
>>> parseTickData <- function(inputFile) {
>>>   DAT.list <- scan(file = inputFile, sep = ",", skip = 1,
>>>                    what = list(Date = "", Time = "", Close = 0, Volume = 0),
>>>                    quiet = T)
>>>   index <- as.POSIXct(paste(DAT.list$Date, DAT.list$Time),
>>>                       format = "%m/%d/%Y %H:%M:%S")
>>>   DAT.xts <- xts(DAT.list$Close, index)
>>>   DAT.xts <- make.index.unique(DAT.xts)
>>>   return(DAT.xts)
>>> }
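As an aside on the read path above: make.index.unique() keeps duplicate tick timestamps from colliding by nudging later observations forward by a tiny eps (sub-second). A quick illustration with arbitrary values:

```r
library(xts)

t0 <- as.POSIXct("2012-07-20 09:30:00", tz = "UTC")
x <- xts(1:3, order.by = c(t0, t0, t0 + 1))  # first two ticks share a timestamp
u <- make.index.unique(x)                    # second tick shifted forward ~1e-6 s
```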
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>