[R] Reducing execution time

sri vathsan srivibish at gmail.com
Wed Jul 27 19:29:38 CEST 2016


Hi,

Thanks for the solution, but I am afraid this code still takes too long: it
has been running for an hour and has not finished yet. I understand the
delay, since each triplet has to be compared against almost 9000 elements.
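
For reference, one way to avoid scanning all of the list elements separately
for every triplet is to build a logical membership matrix (one row per code,
one column per list element) once, and then answer each triplet with a single
vectorised AND over that matrix. This is only a sketch that reuses the small
combs and dat objects from Sarah's example below; it has not been run on the
full data:

# all codes seen anywhere (triplets included, so match() never returns NA)
codes  <- sort(unique(c(unlist(dat), as.matrix(combs))))

# membership matrix: member[k, e] is TRUE if code k occurs in list element e
member <- vapply(dat, function(v) codes %in% v, logical(length(codes)))

# row positions of each triplet's three codes in the membership matrix
idx <- matrix(match(as.matrix(combs), codes), ncol = 3)

# for each triplet, count the list elements that contain all three codes
freq <- vapply(seq_len(nrow(idx)), function(i) {
  sum(member[idx[i, 1], ] & member[idx[i, 2], ] & member[idx[i, 3], ])
}, integer(1))

cbind(combs, Freq = freq)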

Regards,
Sri

On Wed, Jul 27, 2016 at 9:02 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:

> Hi,
>
> It's really a good idea to use dput() or some other reproducible way
> to provide data. I had to guess as to what your data looked like.
>
> It appears that order doesn't matter?
>
> Given that, here's one approach:
>
> combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
> 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
> "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
>
> dat <- list(
> c(77,65,34,23,55),
> c(65,23,77,65,55,34),
> c(77,34,65),
> c(55,78,56),
> c(98,23,77,65,34))
>
>
> # for each triplet, count the list elements that contain all three values
> sapply(seq_len(nrow(combs)), function(i)
>   sum(sapply(dat, function(j) all(combs[i, ] %in% j))))
>
> On a dataset of comparable size to yours, it takes me under a minute and a
> half.
>
> > combs <- combs[rep(1:nrow(combs), length=100), ]
> > dat <- dat[rep(1:length(dat), length=10000)]
> >
> > dim(combs)
> [1] 100   3
> > length(dat)
> [1] 10000
> >
> > system.time(test <- sapply(seq_len(nrow(combs)),
> function(i)sum(sapply(dat, function(j)all(combs[i,] %in% j)))))
>    user  system elapsed
>  86.380   0.006  86.391
>
>
>
>
> On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan <srivibish at gmail.com> wrote:
> > Hi,
> >
> > Apologies for the lack of information.
> >
> > Basically, myCombos is a matrix with 3 columns; each row is a triplet
> > drawn from 79 codes. There are around 3 lakh (300,000) such combinations,
> > and it looks like below.
> >
> > V1 V2 V3
> > 65 23 77
> > 77 34 65
> > 55 34 23
> > 23 77 34
> > 34 65 55
> >
> > Each triplet is compared against a list (mylist) of 8177 elements, which
> > looks like below.
> >
> > 77,65,34,23,55
> > 65,23,77,65,55,34
> > 77,34,65
> > 55,78,56
> > 98,23,77,65,34
> >
> > Now I want to count the number of occurrences of each triplet in the above
> > list. For example, the triplet 65 23 77 is seen 3 times in the list, so my
> > output looks like below:
> >
> > V1 V2 V3 Freq
> > 65 23 77  3
> > 77 34 65  4
> > 55 34 23  2
> >
> > I hope I made it clear this time.
> >
> >
> > On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
> >
> >> Not entirely sure I understand, but match() is already vectorized, so you
> >> should be able to lose the sapply(). This would speed things up a lot.
> >> Please re-read ?match *carefully* .
> >>
> >> Bert
> >>
> >> On Jul 27, 2016 6:15 AM, "sri vathsan" <srivibish at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I created a list of 3-number combinations (mycombos, around 3 lakh
> >> combinations) and I am counting the occurrences of those combinations in
> >> another list. This comparison list (mylist) has around 8000 records. I am
> >> using the following code.
> >>
> >> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
> >>   sum(sapply(myList, function(j) {
> >>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
> >>
> >> The above code takes a very long time to execute. Is there any more
> >> efficient method that would reduce the time?
> >> --
> >>
> >> Regards,
> >> Srivathsan.K
> >>
>



-- 

Regards,
Srivathsan.K
Phone : 9600165206
