[R-SIG-Mac] single-threaded R, 100% CPU with BLAS, vecLib and ATLAS

Melanie Courtot mcourtot at gmail.com
Fri Oct 26 22:22:40 CEST 2012


Hi Ray,

Thanks a lot for your help. I tried renaming the file, etc., as you suggested below. You are right that the file has a header; its structure is as follows:
@relation some string

@attribute ID numeric
@attribute Class {negative,positive}
@attribute Att1 numeric
...
(I have 1140 of those)

followed by an @data section and then the data rows.

It turns out that my colleague had copied the file to his machine and removed the header manually. That was not done in our shared SVN version, which is the one I was trying to read. Removing the header solved the issue.
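
The manual header removal can also be done at read time. A minimal sketch, assuming the header ends at the first line starting with @data and that attribute names contain no spaces; the temporary file and the names arff_path, data_start and attr_lines are illustrative, not taken from the real data set:

```r
# Write a tiny stand-in ARFF file (hypothetical contents mirroring the structure above).
arff_path <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation demo",
  "",
  "@attribute ID numeric",
  "@attribute Class {negative,positive}",
  "@attribute Att1 numeric",
  "",
  "@data",
  "856243,negative,0",
  "856244,positive,1"
), arff_path)

# Locate the @data marker; everything up to and including it is header.
arff_lines <- readLines(arff_path)
data_start <- grep("^@data", arff_lines, ignore.case = TRUE)[1]

# Skip the header lines and parse the remaining rows as plain CSV.
frame1 <- read.csv(arff_path, header = FALSE, skip = data_start)

# Recover column names from the @attribute lines (second whitespace-separated token).
attr_lines <- grep("^@attribute", arff_lines, ignore.case = TRUE, value = TRUE)
names(frame1) <- vapply(
  strsplit(attr_lines, "[[:space:]]+"),
  function(tokens) tokens[2],
  character(1)
)

str(frame1)
```

This leaves the file in SVN untouched, so both machines can read the same version.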

RWeka looks very interesting, thanks for the pointer.

Cheers,
Melanie




On 2012-10-26, at 11:53 AM, Ray DiGiacomo, Jr. wrote:

> Hello Melanie,
> 
> Also make sure that you and your colleague have similar outputs for these two commands:
> 
> > search()
> > ls() # L as in Larry, S as in Sam
> 
> "search()" shows the search path, i.e. the packages and environments currently attached in R (which take up RAM).  "ls()" shows a list of the objects in your workspace (which take up RAM).
> 
> - Ray
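
To compare two sessions in more detail than the two commands above allow, the size of each object can be listed as well. A minimal sketch; demo_env and the two vectors in it are hypothetical stand-ins for a real workspace, and object.size() only gives an approximate figure:

```r
# Attached packages and environments, in lookup order.
search()

# Names of the objects in the current workspace.
ls()

# Approximate memory used by each object in an environment, largest first.
# demo_env and its contents are hypothetical stand-ins for a real workspace.
demo_env <- new.env()
assign("small_vec", numeric(10), envir = demo_env)
assign("big_vec", numeric(1e6), envir = demo_env)

object_bytes <- vapply(
  ls(demo_env),
  function(name) as.numeric(utils::object.size(get(name, envir = demo_env))),
  numeric(1)
)
sort(object_bytes, decreasing = TRUE)
```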
> 
> 
> 
> 
> 
> On Fri, Oct 26, 2012 at 11:17 AM, Ray DiGiacomo, Jr. <rayd at liondatasystems.com> wrote:
> Hello Melanie,
> 
> I'm not too familiar with ARFF but I believe it has some headers (and possibly footers) that may need to be removed before one can call the read.csv function.  I am assuming you and your colleague both manually removed the ARFF headers/footers before calling the read.csv function. 
> 
> You may also want to try changing the read.csv function call to:
> 
> frame1 <- read.csv("test.csv", header = FALSE)
> 
> You will have to manually change your filename to test.csv first.  Also notice that the "sep" argument is not needed, as it defaults to a comma.  I would also use the name "frame" instead of "mat", as the data will not be a matrix after you call the read.csv function; it will be a data frame.  You can turn your data frame into a matrix using other R commands if you like.  See this page:
> 
> http://stackoverflow.com/questions/5158790/data-frame-or-matrix
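
The data frame versus matrix distinction can be seen on a small example. A minimal sketch, assuming two made-up rows in the same layout as the file in this thread (ID, class label, then 0/1 attributes); csv_text and att_matrix are illustrative names:

```r
# Two hypothetical rows: ID, class, then 0/1 attribute values.
csv_text <- "856243,negative,0,1,0\n856244,positive,1,0,0"
frame1 <- read.csv(text = csv_text, header = FALSE)

is.data.frame(frame1)  # TRUE
is.matrix(frame1)      # FALSE

# Converting all columns at once would yield a character matrix because of
# the class label, so convert only the numeric attribute columns.
att_matrix <- as.matrix(frame1[, -(1:2)])
dim(att_matrix)
```

Dropping the ID and class columns before the conversion keeps the matrix numeric.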
> 
> Also, there are R packages called "foreign" and "RWeka" which both have read.arff functions inside of them.  You may want to give these a try.  
> 
> You can learn about them here:
> 
> See Paper Page 3 (Digital Page 2)
> http://cran.r-project.org/web/packages/foreign/foreign.pdf
> 
> See Paper Page 6 (Digital Page 3)
> http://cran.r-project.org/web/packages/RWeka/RWeka.pdf
> 
> - Ray
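
A minimal sketch of the read.arff route from the "foreign" package, assuming a small made-up ARFF file with the same structure as the one in this thread; arff_path and the file contents are illustrative only:

```r
library(foreign)  # recommended package, shipped with standard R installations

# Write a tiny stand-in ARFF file (hypothetical contents).
arff_path <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation demo",
  "",
  "@attribute ID numeric",
  "@attribute Class {negative,positive}",
  "@attribute Att1 numeric",
  "",
  "@data",
  "856243,negative,0",
  "856244,positive,1"
), arff_path)

# read.arff parses the header itself and uses the @attribute names as column names.
frame1 <- read.arff(arff_path)
str(frame1)
```

With this approach the header does not have to be removed by hand.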
> 
> 
> 
> 
> 
> 
> On Fri, Oct 26, 2012 at 10:18 AM, Melanie Courtot <mcourtot at gmail.com> wrote:
> Hi Ray and Simon, all,
> 
> Thanks for the help. My laptop has 8GB of RAM (my colleague has 12 on his desktop). I ssh'ed into his machine and the whole file loads in not even 2 seconds.
> The file is read with mat <- read.csv('test.arff', header=FALSE, sep=','). The ARFF file is what I use with Weka, and it is basically a comma-delimited file. It contains around 7M datapoints (6200 rows, 1140 columns).
> 
> It seems that with 8GB I should be quite ok?
> 
> Based on your suggestions I tried with a part of the file only, which does work fine, so it seems that it is indeed a memory problem. Any idea as to why?
> 
> Thanks,
> Melanie
> 
> 
> 
> Example record (I have 6200 of those)
> 856243,negative,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
> 
> 
> 
> On 2012-10-25, at 6:39 PM, Ray DiGiacomo, Jr. wrote:
> 
> > Hi Simon,
> >
> > I took the spec from this Revo SlideShare.  The spec is based on a regression.
> >
> > http://www.revolutionanalytics.com/news-events/free-webinars/2011/intro-to-r-for-sas-spss/
> >
> > Click the right arrow until you get to slide 3 of 14.  Then, look at the slide in the lower-right hand corner (slide 12).
> >
> > - Ray
> >
> >
> >
> >
> >
> > On Thu, Oct 25, 2012 at 6:26 PM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
> >
> > On Oct 25, 2012, at 7:42 PM, Ray DiGiacomo, Jr. wrote:
> >
> > > Hello Melanie,
> > >
> > > How much RAM is installed on your MacBook Pro compared to your colleague's
> > > Linux machine?
> > >
> > > How big is your dataset in terms of rows and columns?
> > >
> > > I believe R can handle about 10M datapoints per GB of RAM.
> > >
> >
> > What exactly is that an estimate of? In R, 1 GB of RAM will store ~134 million datapoints when using numeric matrices/vectors (8 bytes each) and twice as many as integers or logicals (4 bytes each). In practice, you will still need some room for computation on the data, though.
> >
> > Cheers,
> > Simon
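
The figures above follow from the storage size of each type and can be checked directly. A minimal sketch, taking 1 GB as 2^30 bytes and using the 6200 x 1140 dimensions reported elsewhere in this thread; the variable names are illustrative, and the estimate ignores the extra working memory read.csv needs while parsing:

```r
bytes_per_gib <- 2^30

# Doubles take 8 bytes each; integers and logicals take 4 bytes each.
doubles_per_gib <- bytes_per_gib / 8    # 134217728
integers_per_gib <- bytes_per_gib / 4   # 268435456

# The file discussed in this thread: 6200 rows by 1140 columns.
n_values <- 6200 * 1140                 # 7068000
mib_as_double <- n_values * 8 / 2^20    # roughly 54 MiB if every value were a double

# Empirical check on a smaller vector: close to 8 bytes per element plus a small header.
as.numeric(utils::object.size(numeric(1e6))) / 1e6
```

On these numbers the parsed data alone is far smaller than 8 GB.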
> >
> >
> > > Note that datapoints = rows x columns
> > >
> > > Best Regards,
> > >
> > > Ray DiGiacomo, Jr.
> > > Master R Trainer
> > > President, Lion Data Systems LLC
> > > President, The Orange County R User Group
> > > Board Member, TDWI
> > > rayd at liondatasystems.com
> > > (Mobile) 408-425-7851
> > > San Juan Capistrano, California
> > >
> > > Check out my one-on-one web-based R courses at liondatasystems.com/courses
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Oct 25, 2012 at 4:16 PM, Melanie Courtot <mcourtot at gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I am trying to run R on my MacBook Pro 2.4 GHz Intel Core i5. I am trying
> > >> to read a csv file, which works fine on my work colleague's machine (under
> > >> linux) but causes my CPU to go up to 100% and makes the GUI unresponsive
> > >> and hangs on the command line. Activity monitor indicates there is only one
> > >> R thread running.
> > >>
> > >> I did see that by default R was using the BLAS library, which is
> > >> single-threaded, and that there was an option to use vecLib instead. I did
> > >> this, and
> > >> ls -l /Library/Frameworks/R.framework/Resources/lib/libRblas.dylib
> > >> does return
> > >> /Library/Frameworks/R.framework/Resources/lib/libRblas.dylib ->
> > >> libRblas.vecLib.dylib
> > >>
> > >> I however still see the same behavior: 100% CPU, single thread.
> > >>
> > >> I saw that some MacBook Pros (Xeon Nehalem based) had a vecLib bug, so I
> > >> built the ATLAS library and symlinked R to libtatlas.dylib (unfortunately
> > >> the precompiled binaries pointed to in a previous email on the list [1]
> > >> were not available anymore. Building ATLAS was... fun ;)) I was able to get
> > >> the shared libraries (using --shared in my config) but still see the same
> > >> behavior when trying to run my code. I was unsure if I should link to
> > >> libsatlas.dylib or libtatlas.dylib, so tried both (I guess the latter was
> > >> the right one though)
> > >>
> > >> I tried building R from the source (specifying -arch x86_64 and
> > >> --enable-BLAS-shlib to be able to switch libraries), but same behavior and
> > >> it seems it is an identical version to the prepackaged one (I tried with
> > >> BLAS, vecLib and ATLAS)
> > >>
> > >> R info: R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows", Platform:
> > >> x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> > >>
> > >> Any help would be greatly appreciated.
> > >>
> > >> Thanks,
> > >> Melanie
> > >>
> > >>
> > >> [1] https://stat.ethz.ch/pipermail/r-sig-mac/2010-October/007817.html
> > >>
> > >> ---
> > >> Mélanie Courtot
> > >> MSFHR/PCIRN Ph.D. Candidate,
> > >> BCCRC - Terry Fox Laboratory - 12th floor
> > >> 675 West 10th Avenue
> > >> Vancouver, BC
> > >> V5Z 1L3, Canada
> > >>
> > >> _______________________________________________
> > >> R-SIG-Mac mailing list
> > >> R-SIG-Mac at r-project.org
> > >> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
> > >>
> > >
> >
> >
> 
> 
> 


