# [R] how to work with long vectors

Martin Morgan mtmorgan at fhcrc.org
Fri Nov 5 17:42:47 CET 2010

```On 11/05/2010 09:13 AM, Changbin Du wrote:
> HI, Phil,
>
> I used the following codes and run it overnight for 15 hours, this morning,
> I stopped it. It seems it is still not efficient.
>
>
>>
>
>> dim(matt)
> [1] 3384766       2

[snip]

>>> On Thu, 4 Nov 2010, Changbin Du wrote:
>>>
>>>  HI, Dear R community,
>>>>
>>>> I have one data set like this,  What I want to do is to calculate the
>>>> cumulative coverage. The following codes works for small data set (#rows
>>>> =
>>>> 100), but when feed the whole data set,  it still running after 24 hours.
>>>> Can someone give some suggestions for long vector?
>>>>
>>>> Contig79:1    4
>>>> Contig79:2    8
>>>> Contig79:3    13
>>>> Contig79:4    14
>>>> Contig79:5    17
>>>> Contig79:6    20
>>>> Contig79:7    25
>>>> Contig79:8    27
>>>> Contig79:9    32
>>>> Contig79:10    33
>>>> Contig79:11    34
>>>>
>>>>
>>>> dim(matt)
>>>> [1] 3384766       2
>>>>
>>>> matt_plot<-function(matt, outputfile) {
>>>>
>>>>
>>>>
>>>> #calculate the cumulative coverage.
>>>> + cover_per<-function (data) {
>>>> + output<-numeric(0)
>>>> + for (i in data) {
>>>> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
>>>> +           output<-c(output, x)
>>>> +                 }
>>>> + return(output)
>>>> + }
>>>>
>>>>
>>>> result<-cover_per(cover)

Hi Changbin

If I understand correctly, your contigs 'start' at position 1, and have
'width' equal to matt\$reads. You'd like to know the coverage at the last
covered location of each contig in matt\$reads.

## first time only
source("http://bioconductor.org")
biocLite("IRanges")

##
library(IRanges)
cvg = coverage(contigs) ## an RLE summarizing coverage, from position 1
as.vector(cvg[matt\$reads]) / nrow(matt)  ## at the end of each contig

for a larger data set:

> system.time(cvg <- coverage(contigs))
user  system elapsed
5.145   0.050   5.202

Martin

>>>>
>>>>
>>>> Thanks so much!
>>>>
>>>>
>>>> --
>>>> Sincerely,
>>>> Changbin
>>>> --
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>
>>
>> --
>> Sincerely,
>> Changbin
>> --
>>
>> Changbin Du
>> DOE Joint Genome Institute
>> Bldg 400 Rm 457
>> 2800 Mitchell Dr
>> Walnut Creet, CA 94598
>> Phone: 925-927-2856
>>
>>
>>
>
>

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

```