# [R] how to work with long vectors

Phil Spector spector at stat.berkeley.edu
Thu Nov 4 18:16:54 CET 2010

```Changbin -
Does

give what you want?

By the way, if you want to use a loop (there's nothing wrong with that),
then try to avoid the most common mistake that people make with loops in R:
having your result grow inside the loop.  Here's a better way to use a loop

cover_per_1 <- function(data){
l = length(data)
output = numeric(l)
for(i in 1:l)output[i] = 100 * sum(ifelse(data >= data[i], 1, 0))/length(data)
output
}

Using some random data, and comparing to your original cover_per function:

> dat = rnorm(1000)
> system.time(one <- cover_per(dat))
user  system elapsed
0.816   0.000   0.824
> system.time(two <- cover_per_1(dat))
user  system elapsed
0.792   0.000   0.805

Not that big a speedup, but it does increase quite a bit as the problem gets
larger.

There are two obvious ways to speed up your function:
1)  Eliminate the ifelse function, since automatic coersion from
logical to numeric does the same thing.
2)  Multiply by 100 and divide by the length outside the loop:

cover_per_2 <- function(data){
l = length(data)
output = numeric(l)
for(i in 1:l)output[i] = sum(data >= data[i])
100 * output / l
}

> system.time(three <- cover_per_2(dat))
user  system elapsed
0.024   0.000   0.027

That makes the loop just about equivalent to the sapply solution:

> system.time(four <- 100*sapply(dat,function(x)sum(dat >= x))/length(dat))
user  system elapsed
0.024   0.000   0.026

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu

On Thu, 4 Nov 2010, Changbin Du wrote:

> HI, Dear R community,
>
> I have one data set like this,  What I want to do is to calculate the
> cumulative coverage. The following codes works for small data set (#rows =
> 100), but when feed the whole data set,  it still running after 24 hours.
> Can someone give some suggestions for long vector?
>
> Contig79:1    4
> Contig79:2    8
> Contig79:3    13
> Contig79:4    14
> Contig79:5    17
> Contig79:6    20
> Contig79:7    25
> Contig79:8    27
> Contig79:9    32
> Contig79:10    33
> Contig79:11    34
>
> dim(matt)
> [1] 3384766       2
>
> matt_plot<-function(matt, outputfile) {
>
>
>
> #calculate the cumulative coverage.
> + cover_per<-function (data) {
> + output<-numeric(0)
> + for (i in data) {
> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
> +           output<-c(output, x)
> +                 }
> + return(output)
> + }
>
>
> result<-cover_per(cover)
>
>
> Thanks so much!
>
>
> --
> Sincerely,
> Changbin
> --
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help