[R] alternative to matching/merge?

Lana Schaffer schaffer at scripps.edu
Fri Jun 13 18:52:50 CEST 2008


Jim,
d.frame[[i]] is a list of data.frames and seqFile is a
data.frame.  I have converted them to vectors/matrices and
the timing is the same as with data.frames.  'index' is unique
in both structures.  The list is subset into data.frame/matrix
structures.
Lana
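
[Editorial sketch, not part of the original message.] Since 'index' is
reported to be unique in both structures, one possible alternative is to
call match() once over the indexes of all list elements, rather than once
per element, and to pull the matched columns out as plain vectors instead
of subsetting rows of a data.frame. Everything below is an assumption made
for illustration: the names `frames`, `chrom`, `start` and `signal`, and
the simulated data, are hypothetical stand-ins for d.frame[[1]] and the
columns of seqFile, which the thread does not show. Whether this is faster
on the real data would need to be measured.

```r
set.seed(42)

# Hypothetical lookup table with the dimensions given in the thread (40919 x 3).
seqFile <- data.frame(index = 1:40919,
                      chrom = sample(1:22, 40919, replace = TRUE),
                      start = runif(40919))

# Hypothetical list of data.frames standing in for d.frame[[1]].
frames <- lapply(1:20, function(i) {
  data.frame(index = sample(seqFile$index, 1000), signal = rnorm(1000))
})

# One match() call for every index in every list element.
all_index <- unlist(lapply(frames, `[[`, "index"), use.names = FALSE)
hits <- match(all_index, seqFile$index)

# Split the matched row positions back into one vector per list element.
sizes <- vapply(frames, nrow, integer(1))
hits_by_frame <- split(hits, rep(seq_along(frames), sizes))

# Attach the looked-up columns as vectors; no data.frame row subsetting.
merged <- Map(function(x, h) {
  x$chrom <- seqFile$chrom[h]
  x$start <- seqFile$start[h]
  x
}, frames, hits_by_frame)
```

The result has the same rows as the original per-element cbind/match
approach, but without the duplicated 'index' column.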

-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com] 
Sent: Friday, June 13, 2008 9:45 AM
To: Lana Schaffer
Cc: r-help at r-project.org
Subject: Re: [R] alternative to matching/merge?

What is the structure of 'd.frame' and 'seqFile'?  Run Rprof so that we
can see which of the functions it is spending its time in.  What happens
if x$index is not in seqFile$index?  Are the values in the 'index'
unique in both structures?  Subsetting a data frame can be expensive
when compared to using a matrix.  Could you use a matrix instead of a
data frame; are all the columns the same mode?  Again either a subset of
data would be helpful or an 'str' on the data objects being used so that
we can understand what they are.
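
[Editorial sketch, not part of the original message.] The checks asked for
above could look roughly like this. The data are simulated and the column
names `a`, `b` and `value` are hypothetical, since the real objects are not
shown in the thread; only the 40919-row size and the mergefunc() logic come
from the messages. It assumes an R build with profiling support.

```r
set.seed(1)

# Hypothetical stand-ins for seqFile and one element of the list.
seqFile <- data.frame(index = 1:40919, a = rnorm(40919), b = rnorm(40919))
x <- data.frame(index = sample(seqFile$index, 5000), value = rnorm(5000))

# The poster's function, without the redundant as.vector() calls.
mergefunc <- function(x, seqFile) {
  cbind(x, seqFile[match(x$index, seqFile$index), ])
}

# Compact description of the objects, as requested.
str(x)
str(seqFile)

# Profile repeated calls to see where the time is spent.
prof_file <- tempfile()
Rprof(prof_file)                        # start the sampling profiler
for (i in 1:200) res <- mergefunc(x, seqFile)
Rprof(NULL)                             # stop profiling
head(summaryRprof(prof_file)$by.self)   # functions ranked by self time

# An index that is absent from seqFile makes match() return NA,
# so the corresponding seqFile columns are filled with NA.
missing_case <- mergefunc(data.frame(index = -1L, value = 0), seqFile)
```

If the profile is dominated by `[.data.frame` rather than by match(), that
would support the point that the row subsetting, not the matching, is the
expensive step.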

On Fri, Jun 13, 2008 at 12:03 PM, Lana Schaffer <schaffer at scripps.edu>
wrote:
> Jim,
> My code is this:
> mergefunc <- function(x, seqFile) {
>   # merge(seqFile, x)
>   cbind(x, seqFile[match(as.vector(x$index), as.vector(seqFile$index)), ])
> }
> LIX <- lapply(d.frame[[1]], mergefunc, seqFile=seqFile)
> Each matrix/data.frame takes 0.2 seconds and then to do this 1240 times
> takes ~4 minutes.
> Thanks,
> Lana
>
> -----Original Message-----
> From: jim holtman [mailto:jholtman at gmail.com]
> Sent: Thursday, June 12, 2008 6:40 PM
> To: Lana Schaffer
> Cc: r-help at r-project.org
> Subject: Re: [R] alternative to matching/merge?
>
> It would be nice if you at least included the code that you are using 
> and a subset of the data.  Have you run Rprof to determine which of 
> the functions is consuming the time?
>
> On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer <schaffer at scripps.edu>
> wrote:
>>
>> Greetings,
>> I am doing matching/merge for a table (40919x3) to data which is in
>> the form of a list of 1268 data.frames.  Using lapply this is taking
>> ~5 minutes.  I know that the match/merge functions are time
>> consuming, so is there an alternative way to accomplish this goal?
>> Is lapply not efficient?
>>
>> Lana Schaffer
>> Biostatistics/Informatics
>> The Scripps Research Institute
>> DNA Array Core Facility
>> La Jolla, CA 92037
>> (858) 784-2263
>> (858) 784-2994
>> schaffer at scripps.edu
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


