# [R] how to work with long vectors

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 4 18:06:56 CET 2010

```On 11/04/2010 09:45 AM, Changbin Du wrote:
> Thanks, Jim!
>
> This is not what I want,  What I want is calculate the percentage of reads
> bigger or equal to that reads in each position.MY output is like the
> following:

Hi Changbin -- I might be repeating myself, but the Bioconductor
packages IRanges and GenomicRanges are designed to work with this sort
of data, and include 'coverage' functions that do what you're interested
in. Look into ?GRanges if interested.

http://bioconductor.org/help/bioc-views/release/BiocViews.html#___HighThroughputSequencing

Martin

> for row 1, all the reads is >= 4, so the cover_per is 100,
> for row 2, 99 % reads >=4, so the cover_per is 99.
> 1       100     4
> 2        99     8
> 3        98    13
> 4        97    14
> 5        96    17
> 6        95    20
>
> I attached the input file with this email. This file is only 100 rows, very
> small. MY original data set is 3384766 rows.
>
>  >
>> dim(matt)
> [1] 3384766       2
>
> Thanks so much for your time!
>
>> dim(matt)
> [1] 100   2
>>
>>
>> #calculate the cumulative coverage.
>> cover_per<-function (data) {
> + output<-numeric(0)
> + for (i in data) {
> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
> +           output<-c(output, x)
> +                 }
> + return(output)
> + }
>>
>>
>> result<-cover_per(cover)
> [1] 100  99  98  97  96  95
>>
>> final<-data.frame(result, cover)
>>
> 1       100     4
> 2        99     8
> 3        98    13
> 4        97    14
> 5        96    17
> 6        95    20
>
>
>
>
>
> On Thu, Nov 4, 2010 at 9:18 AM, jim holtman <jholtman at gmail.com> wrote:
>
>> Is this what you want:
>>
>>> x
>> 1   Contig79:1     4
>> 2   Contig79:2     8
>> 3   Contig79:3    13
>> 4   Contig79:4    14
>> 5   Contig79:5    17
>> 6   Contig79:6    20
>> 7   Contig79:7    25
>> 8   Contig79:8    27
>> 9   Contig79:9    32
>> 10 Contig79:10    33
>> 11 Contig79:11    34
>>> x
>> 1   Contig79:1     4  11.76471
>> 2   Contig79:2     8  23.52941
>> 3   Contig79:3    13  38.23529
>> 4   Contig79:4    14  41.17647
>> 5   Contig79:5    17  50.00000
>> 6   Contig79:6    20  58.82353
>> 7   Contig79:7    25  73.52941
>> 8   Contig79:8    27  79.41176
>> 9   Contig79:9    32  94.11765
>> 10 Contig79:10    33  97.05882
>> 11 Contig79:11    34 100.00000
>>
>>
>> On Thu, Nov 4, 2010 at 11:46 AM, Changbin Du <changbind at gmail.com> wrote:
>>> HI, Dear R community,
>>>
>>> I have one data set like this,  What I want to do is to calculate the
>>> cumulative coverage. The following codes works for small data set (#rows
>> =
>>> 100), but when feed the whole data set,  it still running after 24 hours.
>>> Can someone give some suggestions for long vector?
>>>
>>> Contig79:1    4
>>> Contig79:2    8
>>> Contig79:3    13
>>> Contig79:4    14
>>> Contig79:5    17
>>> Contig79:6    20
>>> Contig79:7    25
>>> Contig79:8    27
>>> Contig79:9    32
>>> Contig79:10    33
>>> Contig79:11    34
>>>
>>>
>>> dim(matt)
>>> [1] 3384766       2
>>>
>>> matt_plot<-function(matt, outputfile) {
>>>
>>>
>>>
>>> #calculate the cumulative coverage.
>>> + cover_per<-function (data) {
>>> + output<-numeric(0)
>>> + for (i in data) {
>>> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
>>> +           output<-c(output, x)
>>> +                 }
>>> + return(output)
>>> + }
>>>
>>>
>>>  result<-cover_per(cover)
>>>
>>>
>>> Thanks so much!
>>>
>>>
>>> --
>>> Sincerely,
>>> Changbin
>>> --
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

```