[R] caretNWS and training data set sizes

Tait, Peter ptait at skura.com
Mon Mar 10 18:18:51 CET 2008


Hi Max,
Thank you for the fast response.

Here are the versions of the R packages I am using:

caret 3.13
caretNWS 0.16
nws 1.62

Here are the Python versions:

Active Python 2.5.1.1
nws server 1.5.2 for py2.5
twisted 2.5.9 py2.5

The computer I am using has one dual-core Xeon CPU at 1.86 GHz and 4 GB of RAM. R is currently set up to use 2 GB of it (it starts with "C:\Program Files\R\R-2.6.2\bin\Rgui.exe" --max-mem-size=2047M). The OS is Windows Server 2003 R2 with SP2.

I am running one R job/process (Rgui.exe) and almost nothing else on the computer while R is running (no databases, web servers, office apps, etc.).

I really appreciate your help.
Cheers
Peter


>-----Original Message-----
>From: Max Kuhn [mailto:mxkuhn at gmail.com]
>Sent: Monday, March 10, 2008 12:41 PM
>To: Tait, Peter
>Cc: r-help at R-project.org
>Subject: Re: [R] caretNWS and training data set sizes
>
>What version of caret and caretNWS are you using? Also, what version
>of the nws server and twisted are you using? What kind of machine (#
>processors, how much physical memory etc)?
>
>I haven't seen any real limitations with one exception: if you are
>running P jobs on the same machine, you are replicating the memory
>needs P times.
>
>I've been running jobs with 4K to 90K samples and 1200 predictors
>without issues, so I'll need a lot more information to help you.
>
>Max
>
>
>On Mon, Mar 10, 2008 at 12:04 PM, Tait, Peter <ptait at skura.com> wrote:
>> Hi,
>>
>>  I am using the caretNWS package to train some supervised regression
>models (gbm, lasso, random forest and mars). The problem I have encountered
>started when my training data set increased in the number of predictors and
>the number of observations.
>>
>>  The training data set has 347 numeric columns. The problem I have is
>when there are more then 2500 observations the 5 sleigh objects start but
>do not use any CPU resources and do not process any data.
>>
>>  N=100                    cpu(%)    memory(K)
>>  Rgui.exe                   0        91737
>>  5x sleighs (RTerm.exe)   15-25     ~27000
>>
>>  N=2500
>>  Rgui.exe                   0       160000
>>  5x sleighs (RTerm.exe)   15-25     ~74000
>>
>>  N=5000
>>  Rgui.exe                  50       193000
>>  5x sleighs (RTerm.exe)     0       ~19000
>>
>>
>>  A 10% sample of my overall data is ~22000 observations.
>>
>>  Can someone give me an idea of the limitations of the nws and caretNWS
>> packages in terms of the number of columns and rows of the training
>> matrices, and whether there are other tuning/training functions that work
>> faster on large datasets?
>>
>>  Thanks for your help.
>>  Peter
>>
>>
>>  > version
>>                _
>>  platform       i386-pc-mingw32
>>  arch           i386
>>  os             mingw32
>>  system         i386, mingw32
>>  status
>>  major          2
>>  minor          6.2
>>  year           2008
>>  month          02
>>  day            08
>>  svn rev        44383
>>  language       R
>>  version.string R version 2.6.2 (2008-02-08)
>>
>>  > memory.limit()
>>  [1] 2047
>>
>>  ______________________________________________
>>  R-help at r-project.org mailing list
>>  https://stat.ethz.ch/mailman/listinfo/r-help
>>  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>  and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
>--
>
>Max
