[R] convert for loop into apply()
Hans W. Borchers
hwborchers at googlemail.com
Sun Aug 3 10:30:52 CEST 2008
Also, your request can easily be formulated as an SQL statement,
for example utilizing the 'sqldf' package:
----
library(sqldf)
a1 <- data.frame(id = 1:6,
cat = paste('cat', rep(1:3, c(2,3,1))),
st = c(1, 7, 30, 40, 59, 91),
en = c(5, 25, 39, 55, 70, 120))
a2 <- data.frame(id = paste('probe', 1:8),
cat = paste('cat', rep(1:3, c(2,3,3))),
st = c(1, 9, 20, 38, 53, 70, 80, 95),
en = c(6, 15, 36, 43, 58, 75, 85, 98))
sqldf("select a1.id as id, count(*) from a1, a2 where a1.cat = a2.cat
and a2.st <= a1.en
and a2.en >= a1.st
group by a1.id")
# id count(*)
# 1 1
# 2 1
# 3 2
# 4 2
# 6 1
----
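Note that the inner join above returns no row for fragment 5, which no probe covers, whereas your loop reports 0 there. A minimal sketch, assuming you want the zero counts kept: use a left join and count only the matched probes. The column alias 'coverage' and the variable 'res' are my own choices for illustration.

```r
library(sqldf)

# Same toy data as in the example above.
a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2,3,1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))
a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2,3,3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

# A left join keeps every fragment; count(a2.id) ignores the NULLs
# produced for fragments without any matching probe.
res <- sqldf("select a1.id as id, count(a2.id) as coverage
                from a1 left join a2
                  on a1.cat = a2.cat
                 and a2.st <= a1.en
                 and a2.en >= a1.st
               group by a1.id
               order by a1.id")
res$coverage
# [1] 1 1 2 2 0 1
```

This should reproduce the a1$coverage vector from your loop, including the 0 for fragment 5.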
Of course, sqldf incurs some overhead in copying the data frames into
SQLite tables. I would therefore very much like to hear whether this is
significantly faster than your loop on the full data set, or slower.
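For comparison, here is a base R sketch that avoids SQLite altogether: split the probes by category once, then let vapply() count the overlaps for each fragment. The names 'probes' and 'count_cover' are made up for this illustration, and I have not timed it on data of your size.

```r
# Same toy data as in the example above.
a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2,3,1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))
a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2,3,3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

# Split the probes by category once, so that each fragment is only
# compared with the probes of its own category.
probes <- split(a2[, c("st", "en")], as.character(a2$cat))

# count_cover() is a hypothetical helper, not from any package.
# It counts the probes that overlap fragment i.
count_cover <- function(i) {
  p <- probes[[as.character(a1$cat[i])]]
  if (is.null(p)) return(0L)          # no probes in this category
  sum(p$st <= a1$en[i] & p$en >= a1$st[i])
}

a1$coverage <- vapply(seq_len(nrow(a1)), count_cover, integer(1))
a1$coverage
# [1] 1 1 2 2 0 1
```

The gain over the original loop, if any, would come from comparing each fragment against one category's probes rather than against all of a2.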
// Hans Werner Borchers
Anh Tran-2 wrote:
>
> Hi all, I know this topic has come up multiple times, but I've never fully
> understood the apply() function.
>
> Anyway, I'm here asking for your help again to convert this loop to
> apply().
>
> I have 2 data frames with the following information: a1 holds the fragments
> that need to be covered, a2 holds the probes that cover a specific fragment.
>
> I need to count the number of probes covering each fragment (a probe needs
> to have the same cat ID as the fragment to be on it).
>
> a1 <- data.frame(id = c(1:6),
>   cat = c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3'),
>   st = c(1,7,30,40,59,91), en = c(5,25,39,55,70,120));
> a2 <- data.frame(id = paste('probe', c(1:8)),
>   cat = c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3','cat 3','cat 3'),
>   st = c(1,9,20,38,53,70,80,95), en = c(6,15,36,43,58,75,85,98));
> a1$coverage<-NULL;
>
> I came up with this for loop (basically, if a probe starts before the
> fragment ends, and ends after the fragment starts, it covers that fragment)
>
> for (i in 1:length(a1$id))
> {
> a1$coverage[i]<-length(a2[a2$st<=a1$en[i]&a2$en>=a1$st[i]&a2$cat==a1$cat[i],]$id);
> }
>
>> a1$coverage
> [1] 1 1 2 2 0 1
>
>
> This loop runs awfully slowly when I have 200,000 probes and 30,000
> fragments. Is there any way I can speed this up with apply()?
>
> This is the time for my for loop to scan through the first 20 records of my
> dataset:
> user system elapsed
> 2.264 0.501 2.770
>
> I think there is room for improvement here. Any idea?
>
> Thanks
> --
> Regards,
> Anh Tran
>