[R] how to work with long vectors
Changbin Du
changbind at gmail.com
Thu Nov 4 18:04:45 CET 2010
Hi Henrique,
Thanks for the great help!
I compared the output from your code:
> te<-rev(100 * cumsum(matt$reads > 1) / length(matt$reads) )
> te
[1] 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83
[19] 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65
[37] 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47
[55] 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29
[73] 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11
[91] 10 9 8 7 6 5 4 3 2 1
with the output from my code:
> result
[1] 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83
[19] 82 81 80 79 79 77 77 77 74 73 72 71 70 70 68 67 67 65
[37] 64 64 62 62 60 59 58 57 56 56 54 53 52 51 51 49 48 47
[55] 46 45 45 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29
[73] 28 27 27 27 24 24 22 21 20 19 19 19 19 15 14 14 12 11
[91] 10 9 8 7 7 5 4 3 2 1
There are no ties in your output, but there are ties in the data set (for
example, rows 22 to 26 below). Your code is fast, but I think the results are
not accurate.
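One way to get tie-aware percentages without the explicit loop is to derive, from each element's rank, how many values are greater than or equal to it. The sketch below is only an illustration and was not posted in this thread: the function name cover_per_fast is hypothetical, and the sample vector is rows 19 to 26 of the attached test file.

```r
# Hypothetical helper (name not from the thread): for each element, the
# percentage of values in the vector that are >= that element.
cover_per_fast <- function(reads) {
  n <- length(reads)
  # rank(..., ties.method = "min") is 1 + the number of strictly smaller
  # values, so n - rank + 1 is the number of values >= the current one.
  100 * (n - rank(reads, ties.method = "min") + 1) / n
}

# Rows 19 to 26 of the attached test file, with ties at 64 and 68.
reads <- c(59, 60, 62, 64, 64, 68, 68, 68)
cover_per_fast(reads)
# [1] 100.0  87.5  75.0  62.5  62.5  37.5  37.5  37.5
```

Tied values receive the same percentage, as in the loop-based result. Because rank() sorts the data once instead of comparing every element against the whole vector, this should scale much better, although it has not been timed here on the full 3,384,766-row file.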
Thanks so much for the great help!
> matt[c(1:35), ]
id reads
1 Contig79:1 4
2 Contig79:2 8
;
;
22 Contig79:22 64
23 Contig79:23 64
24 Contig79:24 68
25 Contig79:25 68
26 Contig79:26 68
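A likely reason the one-liner shows no ties: every read count here is greater than 1, so the comparison is TRUE everywhere and cumsum() only counts positions. A small check, using the read counts of rows 22 to 26 above as sample values (the variable names are illustrative, not from the thread):

```r
# Read counts from rows 22 to 26, which contain ties at 64 and 68.
reads <- c(64, 64, 68, 68, 68)
n <- length(reads)

# All values exceed 1, so the logical vector is all TRUE and
# cumsum() simply yields 1, 2, ..., n.
from_cumsum <- rev(100 * cumsum(reads > 1) / n)

# The same numbers follow from the positions alone, ignoring the data values.
from_index <- rev(100 * seq_len(n) / n)

from_cumsum
# [1] 100  80  60  40  20
all.equal(from_cumsum, from_index)
# [1] TRUE
```

Since the result depends only on the length of the vector, it cannot reflect ties in the read counts.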
I have also attached the test file to this email. Thanks!
On Thu, Nov 4, 2010 at 9:12 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> Try this:
>
> rev(100 * cumsum(matt$reads > 1) / length(matt$reads) )
>
> On Thu, Nov 4, 2010 at 1:46 PM, Changbin Du <changbind at gmail.com> wrote:
>
>> Hi, dear R community,
>>
>> I have a data set like the one below, and I want to calculate the
>> cumulative coverage. The following code works for a small data set (#rows =
>> 100), but when fed the whole data set it is still running after 24 hours.
>> Can someone give some suggestions for long vectors?
>>
>> id reads
>> Contig79:1 4
>> Contig79:2 8
>> Contig79:3 13
>> Contig79:4 14
>> Contig79:5 17
>> Contig79:6 20
>> Contig79:7 25
>> Contig79:8 27
>> Contig79:9 32
>> Contig79:10 33
>> Contig79:11 34
>>
>>
>> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
>> sep="\t", skip=0, header=F,fill=T) #
>> dim(matt)
>> [1] 3384766 2
>>
>> matt_plot<-function(matt, outputfile) {
>> names(matt)<-c("id","reads")
>>
>> cover<-matt$reads
>>
>>
>> # Calculate the cumulative coverage: for each value, the percentage of
>> # values in the data that are greater than or equal to it.
>> cover_per <- function(data) {
>>   output <- numeric(0)
>>   for (i in data) {
>>     x <- 100 * sum(data >= i) / length(data)
>>     output <- c(output, x)  # grows the vector on every iteration
>>   }
>>   return(output)
>> }
>>
>>
>> result<-cover_per(cover)
>>
>>
>> Thanks so much!
>>
>>
>> --
>> Sincerely,
>> Changbin
>> --
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
--
Sincerely,
Changbin
--
Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856
-------------- next part --------------
Contig79:1 4
Contig79:2 8
Contig79:3 13
Contig79:4 14
Contig79:5 17
Contig79:6 20
Contig79:7 25
Contig79:8 27
Contig79:9 32
Contig79:10 33
Contig79:11 34
Contig79:12 36
Contig79:13 39
Contig79:14 40
Contig79:15 44
Contig79:16 49
Contig79:17 55
Contig79:18 56
Contig79:19 59
Contig79:20 60
Contig79:21 62
Contig79:22 64
Contig79:23 64
Contig79:24 68
Contig79:25 68
Contig79:26 68
Contig79:27 70
Contig79:28 73
Contig79:29 76
Contig79:30 77
Contig79:31 78
Contig79:32 78
Contig79:33 79
Contig79:34 80
Contig79:35 80
Contig79:36 84
Contig79:37 87
Contig79:38 87
Contig79:39 88
Contig79:40 88
Contig79:41 89
Contig79:42 93
Contig79:43 94
Contig79:44 98
Contig79:45 99
Contig79:46 99
Contig79:47 102
Contig79:48 103
Contig79:49 108
Contig79:50 112
Contig79:51 112
Contig79:52 113
Contig79:53 116
Contig79:54 118
Contig79:55 120
Contig79:56 124
Contig79:57 124
Contig79:58 126
Contig79:59 128
Contig79:60 130
Contig79:61 133
Contig79:62 134
Contig79:63 136
Contig79:64 139
Contig79:65 144
Contig79:66 145
Contig79:67 146
Contig79:68 148
Contig79:69 149
Contig79:70 151
Contig79:71 156
Contig79:72 157
Contig79:73 158
Contig79:74 159
Contig79:75 159
Contig79:76 159
Contig79:77 160
Contig79:78 160
Contig79:79 161
Contig79:80 163
Contig79:81 164
Contig79:82 165
Contig79:83 165
Contig79:84 165
Contig79:85 165
Contig79:86 166
Contig79:87 170
Contig79:88 170
Contig79:89 172
Contig79:90 174
Contig79:91 178
Contig79:92 180
Contig79:93 181
Contig79:94 184
Contig79:95 184
Contig79:96 187
Contig79:97 190
Contig79:98 192
Contig79:99 194
Contig79:100 199