[R] Applying user function over a large matrix

Sudipta Sarkar ssarkar at lanworth.com
Tue Apr 29 22:58:28 CEST 2008


Jim,
Thanks again.
Yes, it's the 64-bit version of R.
So you suggest writing out the 9 million columns and then loading
them individually and applying the function to each of these
columns!
But since we would be using a for loop here too, won't it end up
taking almost the same time? Plus, aren't we increasing the I/O
overhead too?
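
For reference, here is a minimal sketch of the save/load round trip under discussion, run on a small toy matrix. The temporary directory, the smooth_column helper (a hypothetical stand-in for foo) and the default loess settings are assumptions for illustration only. It shows the mechanics, not whether the extra I/O pays off on the full data.

```r
# Toy stand-in for the real 21 x 9000000 matrix: 21 time steps, 10 series.
set.seed(1)
x <- matrix(runif(21 * 10), nrow = 21)

# Hypothetical stand-in for the user function 'foo':
# fit loess to one series against its time index and return the fitted values.
smooth_column <- function(y) {
  tt <- seq_along(y)
  as.numeric(loess(y ~ tt)$fitted)
}

# Assumed scratch location; any writable directory would do.
out_dir <- tempfile("columns_")
dir.create(out_dir)

# Write each column to its own binary file with save().
for (i in seq_len(ncol(x))) {
  column <- x[, i]
  save(column, file = file.path(out_dir, sprintf("column_%02d_.Rdata", i)))
}

# Read the columns back one at a time with load() and apply the function.
result <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(x))
for (i in seq_len(ncol(x))) {
  load(file.path(out_dir, sprintf("column_%02d_.Rdata", i)))  # restores 'column'
  result[, i] <- smooth_column(column)
}

# The same computation done entirely in memory, for comparison.
direct <- apply(x, 2, smooth_column)
```

On the toy input both routes should give the same values; whether the file-based route is faster or slower on the full matrix would have to be measured, for example with system.time().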



---- Original message ----
>Date: Tue, 29 Apr 2008 16:40:03 -0400
>From: "jim holtman" <jholtman at gmail.com>  
>Subject: Re: [R] Applying user function over a large matrix  
>To: "Sudipta Sarkar" <ssarkar at lanworth.com>
>
>Are you running a 64-bit version of R on the Mac?
>
>Here is an example script for writing out the columns with 'save'
>
>x <- matrix(runif(100000), ncol=10)
># write out each column to a file
># also use 'save' so the data is already in binary
>for (i in seq_len(ncol(x))){
>    column <- x[,i]
>    save(column, file=sprintf("/column_%02d_.Rdata", i))
>}
>
># you can then read them back in with 'load' and the data will be
># in the variable 'column'
>
>On Tue, Apr 29, 2008 at 4:27 PM, Sudipta Sarkar <ssarkar at lanworth.com> wrote:
>> Hi Jim,
>> Thanks for your prompt response,
>>
>> I am using a fairly powerful Mac with Leopard OS, 17GB RAM
>> and 2x3 GHz Intel Xeon processors, so I do not think the system
>> is paging. I am also using the Rmpi and snow utilities to
>> parallelize it, but even then it takes 3.5-4 hours just to
>> complete one chunk of matrices.
>> You mentioned storing the data and operating on 1 column
>> at a time. Any hint on how I should go about doing that? I
>> came across the filehash package but am not sure how to use
>> apply over an environment variable. So any help in this
>> direction will be most welcome.
>> thanks
>>
>>
>>
>> ---- Original message ----
>> >Date: Tue, 29 Apr 2008 16:05:41 -0400
>> >From: "jim holtman" <jholtman at gmail.com>
>> >Subject: Re: [R] Applying user function over a large matrix
>> >To: "Sudipta Sarkar" <ssarkar at lanworth.com>
>> >
>> >What size machine do you have?  A single copy of your object will
>> >require 1.5GB of memory.  How slow is slow?  Is the operating system
>> >paging because it does not have enough physical memory?  Can you store
>> >the data and only operate on 1 column at a time -- this reduces the
>> >size of the object to 72MB.
>> >
>> >On Tue, Apr 29, 2008 at 3:16 PM, Sudipta Sarkar <ssarkar at lanworth.com> wrote:
>> >> Respected R experts,
>> >> I am trying to apply a user function that basically calls and
>> >> applies the R loess function from the stats package over each time
>> >> series. I have a large matrix of size 21 X 9000000 and I need
>> >> to apply loess to each column, so I have
>> >> implemented a separate user function that applies loess
>> >> over each column, and I am calling this function foo as follows:
>> >> xc<-apply(t,2,foo) where t is my 21 X 9000000 matrix and
>> >> foo applies loess. This is turning out to be a very slow process and I
>> >> need to repeat this step for 25-30 such large matrix chunks.
>> >> Is there any trick I can use to make this work faster?
>> >> Any help will be deeply appreciated.
>> >> Regards
>> >>
>> >>
>> >> Sudipta Sarkar PhD
>> >> Senior Analyst/Scientist
>> >> Lanworth Inc. (Formerly Forest One Inc.)
>> >> 300 Park Blvd., Ste 425
>> >> Itasca, IL
>> >> Ph: 630-250-0468
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >
>> >
>> >--
>> >Jim Holtman
>> >Cincinnati, OH
>> >+1 513 646 9390
>> >
>> >What is the problem you are trying to solve?
>>
>>
>>
>
>
>


Sudipta Sarkar PhD
Senior Analyst/Scientist
Lanworth Inc. (Formerly Forest One Inc.)
300 Park Blvd., Ste 425
Itasca, IL
Ph: 630-250-0468


