[R] Memory issues on a 64-bit debian system (quantreg)

Jonathan Greenberg greenberg at ucdavis.edu
Wed Jul 1 22:03:34 CEST 2009


Just wanted to leave a note on this: after I got my new iMac (and 
installed R64 from the AT&T site), quantreg did run, after topping out 
at a whopping 12 GB of swap space (Mac OS X, at least, should theoretically 
have as much swap space as there is free space on the HD -- it will 
dynamically grow the swap as memory usage goes up).  I did get a "caught 
segfault" error, but not until I ran ?rqss and clicked on a PDF 
vignette in the help browser (I was able to run summary(tahoe_rq) with no 
problem).  I don't know whether the Mac help browser has some issue on 
64-bit systems; it may be worth looking into.

I figure it's best to work out the parameters (tau) on a random subset 
first, at least for efficiency's sake, and then deploy the algorithm on 
the entire dataset.
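
For example, something along these lines (an untested sketch; the 10%
sample fraction is just illustrative):

## tune tau cheaply on a random ~10% subset of the data
sub <- boundary_data[sample(nrow(boundary_data),
                            round(0.1 * nrow(boundary_data))), ]
fit_sub <- rqss(ltbmu_4_stemsha_30m_exp.img ~ qss(ltbmu_eto_annual_mm.img),
                tau = 0.99, data = sub)
## once the settings look sensible, rerun the same call on boundary_data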

--j

roger koenker wrote:
> my earlier comment is probably irrelevant, since you are fitting only 
> one qss component and have no other covariates.
> A word of warning, though, when you go back to this on your new machine 
> -- you are almost surely going to want to specify a large lambda for 
> the qss component in the rqss call.  The default of 1 is likely to 
> produce something very, very rough with such a large dataset.
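>
> For instance (a sketch -- the lambda value here is purely illustrative,
> so experiment):
>
> tahoe_rq <- rqss(ltbmu_4_stemsha_30m_exp.img ~
>                    qss(ltbmu_eto_annual_mm.img, lambda = 100),
>                  tau = 0.99, data = boundary_data)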
>
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    rkoenker at uiuc.edu            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Urbana, IL 61801
>
>
>
> On Jun 24, 2009, at 5:04 PM, Jonathan Greenberg wrote:
>
>> Yep, it's looking like a memory issue -- we have 6 GB RAM and 1 GB of 
>> swap -- I did notice that the analysis takes far less memory (and 
>> runs) if I do:
>>
>> tahoe_rq <- rqss(ltbmu_4_stemsha_30m_exp.img ~ ltbmu_eto_annual_mm.img,
>>                  tau = 0.99, data = boundary_data)
>>
>>   (which I assume fits a line to the quantile)
>> vs.
>> tahoe_rq <- rqss(ltbmu_4_stemsha_30m_exp.img ~ qss(ltbmu_eto_annual_mm.img),
>>                  tau = 0.99, data = boundary_data)
>>
>>   (which is fitting a spline)
>>
>> Unless anyone else has any hints as to whether I'm making a mistake 
>> in my call (beyond randomly subsetting the data -- I'd like to run 
>> the analysis on the full dataset to begin with), I'll just wait until 
>> my new computer, which has more RAM, arrives next week.  I'd like to 
>> fit a spline to the upper 1% of the data.  Thanks!
>>
>> --j
>>
>>
>> roger koenker wrote:
>>> Jonathan,
>>>
>>> Take a look at the output of sessionInfo(); it should say x86_64 if 
>>> you have a 64-bit installation, or at least I think this is the case.
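>>>
>>> Another quick check that I believe works (pointer size is 8 bytes on
>>> a 64-bit build):
>>>
>>> .Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit one
>>> R.version$arch            # e.g. "x86_64"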
>>>
>>> Regarding rqss(), my experience is that memory problems are (usually) 
>>> due to the fact that early in the processing there is a call to 
>>> model.matrix(), which is supposed to create a design (a.k.a. X) 
>>> matrix for the problem.  This matrix is then coerced to matrix.csr 
>>> sparse format, but the dense form is often too big for the machine 
>>> to cope with.  Ideally, someone would write an R version of 
>>> model.matrix that would permit building the matrix in sparse form 
>>> from the get-go, but this is a non-trivial task.  (Or at least so it 
>>> appeared to me when I looked into it a few years ago.)  An option is 
>>> to roll your own X matrix: take a smaller version of the data, apply 
>>> the formula, look at the structure of X, and then try to make a 
>>> sparse version of the full X matrix.  This is usually not that 
>>> difficult, but "usually" is based on a rather small sample that may 
>>> not be representative of your problems.
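>>>
>>> A rough sketch of the idea (untested; the chunk size and the use of
>>> SparseM's as.matrix.csr/rbind.matrix.csr are assumptions -- check
>>> the SparseM docs):
>>>
>>> library(SparseM)
>>> ## inspect the dense design on a small subset first
>>> X_small <- model.matrix(~ ltbmu_eto_annual_mm.img,
>>>                         data = head(boundary_data, 100))
>>> str(X_small)
>>> ## then build the full X in chunks, coercing each piece to sparse
>>> ## csr form so the full dense matrix never exists all at once
>>> idx <- split(seq_len(nrow(boundary_data)),
>>>              ceiling(seq_len(nrow(boundary_data)) / 50000))
>>> X <- do.call(rbind.matrix.csr,
>>>              unname(lapply(idx, function(i)
>>>                as.matrix.csr(model.matrix(~ ltbmu_eto_annual_mm.img,
>>>                  data = boundary_data[i, , drop = FALSE])))))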
>>>
>>> Hope that this helps,
>>>
>>> Roger
>>>
>>> url:    www.econ.uiuc.edu/~roger            Roger Koenker
>>> email    rkoenker at uiuc.edu            Department of Economics
>>> vox:     217-333-4558                University of Illinois
>>> fax:       217-244-6678                Urbana, IL 61801
>>>
>>>
>>>
>>> On Jun 24, 2009, at 4:07 PM, Jonathan Greenberg wrote:
>>>
>>>> R-ers:
>>>>
>>>>  I installed R 2.9.0 from the Debian package manager on our amd64 
>>>> system, which currently has 6 GB of RAM -- my first question is 
>>>> whether this installation is a true 64-bit installation (i.e., 
>>>> should R have access to > 4 GB of RAM?).  I suspect so, because I 
>>>> was running rqss() (package quantreg, installed via 
>>>> install.packages() -- I noticed it required compilation from 
>>>> source) and watched the memory usage spike to 4.9 GB (my input data 
>>>> contains > 500,000 samples).
>>>>
>>>>  With that said, after 30 minutes or so of processing, I got the 
>>>> following error:
>>>>
>>>> tahoe_rq <- rqss(ltbmu_4_stemsha_30m_exp.img ~ qss(ltbmu_eto_annual_mm.img),
>>>>                  tau = 0.99, data = boundary_data)
>>>>
>>>> Error: cannot allocate vector of size 1.5 Gb
>>>>
>>>>  The dataset is a bit big (300 MB or so), so I won't provide it 
>>>> unless it's necessary to solve this memory problem.
>>>>
>>>>  Thoughts?  Do I need to compile either base R or the quantreg 
>>>> package "by hand"?
>>>>
>>>> --j
>>>>
>>>
>>
>

-- 

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307



