[R] Faster Solution for a simple code?
Petr PIKAL
petr.pikal at precheza.cz
Tue Apr 14 13:33:40 CEST 2009
Hi
this one could be slightly quicker but i am not completely sure because it
gives me different results from yours
set.seed(111)
x<-x1<-x2<- data.frame(a=sample(1:50, 10000, replace=T), b=sample(100:500,
10000, replace=T))
y<- data.frame(a=sample(1:50, 10000, replace=T), b=sample(100:500, 10000,
replace=T))
system.time({
one<-paste(x[,1], x[,2], sep=".")
two<-paste(y[,1], y[,2], sep=".")
tab<-table(two[two %in% one])
x1[match(names(tab), one),3]<-tab
})
system.time({
z <- merge(x, y, by=c("a", "b"), all.x=TRUE)
x2<-t(sapply(split(z, z[,1:2], drop=TRUE), function(.grp){
if (any(is.na(.grp))) return(c(.grp[1,1], .grp[1,2], 0))
c(.grp[1,1], .grp[1,2], nrow(.grp))
}))
})
head(x2)
head(x1[order(x1[,1], x1[,2]),])
Regards
Petr
r-help-bounces at r-project.org napsal dne 13.04.2009 21:51:18:
> try this:
>
> > x
> V1 V2 V3
> 1 5000000 3200000 0
> 2 5100000 3100000 0
> 3 5200000 3100000 0
> 4 5200000 3200000 0
> > y
> V1 V2 V3
> 1 5000000 3200000 1
> 2 5000000 3200000 1
> 3 5200000 3100000 1
> 4 5200000 3000000 1
> > z <- merge(x, y, by=c("V1", "V2"), all.x=TRUE)
> > t(sapply(split(z, z[,1:2], drop=TRUE), function(.grp){
> + if (any(is.na(.grp))) return(c(.grp[1,1], .grp[1,2], 0))
> + c(.grp[1,1], .grp[1,2], nrow(.grp))
> + }))
> [,1] [,2] [,3]
> 5100000.3100000 5100000 3100000 0
> 5200000.3100000 5200000 3100000 1
> 5000000.3200000 5000000 3200000 2
> 5200000.3200000 5200000 3200000 0
> >
>
>
> On Mon, Apr 13, 2009 at 1:06 PM, Chris82 <rubenbauar at gmx.de> wrote:
> >
> > Hi R-users,
> >
> > I create a simple code to check out how often the same numbers in y
occur in
> > x. For example 5000000 320000 occurs two times.
> > But the code with the loop is extremly slow. x have 6100 lines and y
> > sometimes more than 50000 lines.
> >
> > Is there any alternative code to create with R?
> >
> > thanks.
> >
> >
> > x
> >
> > 5000000 3200000 0
> > 5100000 3100000 0
> > 5200000 3100000 0
> > 5200000 3200000 0
> >
> >
> > lengthx <- length(x[,1])
> >
> > y
> >
> > 5000000 3200000 1
> > 5000000 3200000 1
> > 5200000 3100000 1
> > 5200000 3000000 1
> >
> >
> > langthy <- length(y[,1])
> >
> > for (i in 1:lengthx){
> > for (j in 1:lengthy){
> > if (x[i,1] == y[j,1]){
> > if (x[i,2] == y[j,2]){
> > x[i,3] <- x[i,3] + 1
> > }
> > }
> > }
> > }
> > x
> >
> > 1 5000000 3200000 2
> > 2 5100000 3100000 0
> > 3 5200000 3100000 1
> > 4 5200000 3200000 0
> > --
> > View this message in context:
http://www.nabble.com/Faster-Solution-for-a-
> simple-code--tp23024985p23024985.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list