[R] how to work with long vectors
Changbin Du
changbind at gmail.com
Thu Nov 4 17:45:41 CET 2010
Thanks, Jim!
This is not what I want, What I want is calculate the percentage of reads
bigger or equal to that reads in each position.MY output is like the
following:
for row 1, all the reads is >= 4, so the cover_per is 100,
for row 2, 99 % reads >=4, so the cover_per is 99.
> head(final)
cover_per reads
1 100 4
2 99 8
3 98 13
4 97 14
5 96 17
6 95 20
I attached the input file with this email. This file is only 100 rows, very
small. MY original data set is 3384766 rows.
>
matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
sep="\t", skip=0, header=F,fill=T) #
> dim(matt)
[1] 3384766 2
Thanks so much for your time!
> matt<-read.table("/home/cdu/operon/dimer5_0623/matt_test.txt", sep="\t",
skip=0, header=F,fill=T) #
> names(matt)<-c("id","reads")
> dim(matt)
[1] 100 2
> cover<-matt$reads
>
>
> #calculate the cumulative coverage.
> cover_per<-function (data) {
+ output<-numeric(0)
+ for (i in data) {
+ x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
+ output<-c(output, x)
+ }
+ return(output)
+ }
>
>
> result<-cover_per(cover)
> head(result)
[1] 100 99 98 97 96 95
>
> final<-data.frame(result, cover)
>
> names(final)<-c("cover_per", "reads")
> head(final)
cover_per reads
1 100 4
2 99 8
3 98 13
4 97 14
5 96 17
6 95 20
On Thu, Nov 4, 2010 at 9:18 AM, jim holtman <jholtman at gmail.com> wrote:
> Is this what you want:
>
> > x
> id reads
> 1 Contig79:1 4
> 2 Contig79:2 8
> 3 Contig79:3 13
> 4 Contig79:4 14
> 5 Contig79:5 17
> 6 Contig79:6 20
> 7 Contig79:7 25
> 8 Contig79:8 27
> 9 Contig79:9 32
> 10 Contig79:10 33
> 11 Contig79:11 34
> > x$percent <- x$reads / max(x$reads) * 100
> > x
> id reads percent
> 1 Contig79:1 4 11.76471
> 2 Contig79:2 8 23.52941
> 3 Contig79:3 13 38.23529
> 4 Contig79:4 14 41.17647
> 5 Contig79:5 17 50.00000
> 6 Contig79:6 20 58.82353
> 7 Contig79:7 25 73.52941
> 8 Contig79:8 27 79.41176
> 9 Contig79:9 32 94.11765
> 10 Contig79:10 33 97.05882
> 11 Contig79:11 34 100.00000
>
>
> On Thu, Nov 4, 2010 at 11:46 AM, Changbin Du <changbind at gmail.com> wrote:
> > HI, Dear R community,
> >
> > I have one data set like this, What I want to do is to calculate the
> > cumulative coverage. The following codes works for small data set (#rows
> =
> > 100), but when feed the whole data set, it still running after 24 hours.
> > Can someone give some suggestions for long vector?
> >
> > id reads
> > Contig79:1 4
> > Contig79:2 8
> > Contig79:3 13
> > Contig79:4 14
> > Contig79:5 17
> > Contig79:6 20
> > Contig79:7 25
> > Contig79:8 27
> > Contig79:9 32
> > Contig79:10 33
> > Contig79:11 34
> >
> >
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> > sep="\t", skip=0, header=F,fill=T) #
> > dim(matt)
> > [1] 3384766 2
> >
> > matt_plot<-function(matt, outputfile) {
> > names(matt)<-c("id","reads")
> >
> > cover<-matt$reads
> >
> >
> > #calculate the cumulative coverage.
> > + cover_per<-function (data) {
> > + output<-numeric(0)
> > + for (i in data) {
> > + x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
> > + output<-c(output, x)
> > + }
> > + return(output)
> > + }
> >
> >
> > result<-cover_per(cover)
> >
> >
> > Thanks so much!
> >
> >
> > --
> > Sincerely,
> > Changbin
> > --
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
--
Sincerely,
Changbin
--
Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creet, CA 94598
Phone: 925-927-2856
-------------- next part --------------
Contig79:1 4
Contig79:2 8
Contig79:3 13
Contig79:4 14
Contig79:5 17
Contig79:6 20
Contig79:7 25
Contig79:8 27
Contig79:9 32
Contig79:10 33
Contig79:11 34
Contig79:12 36
Contig79:13 39
Contig79:14 40
Contig79:15 44
Contig79:16 49
Contig79:17 55
Contig79:18 56
Contig79:19 59
Contig79:20 60
Contig79:21 62
Contig79:22 64
Contig79:23 64
Contig79:24 68
Contig79:25 68
Contig79:26 68
Contig79:27 70
Contig79:28 73
Contig79:29 76
Contig79:30 77
Contig79:31 78
Contig79:32 78
Contig79:33 79
Contig79:34 80
Contig79:35 80
Contig79:36 84
Contig79:37 87
Contig79:38 87
Contig79:39 88
Contig79:40 88
Contig79:41 89
Contig79:42 93
Contig79:43 94
Contig79:44 98
Contig79:45 99
Contig79:46 99
Contig79:47 102
Contig79:48 103
Contig79:49 108
Contig79:50 112
Contig79:51 112
Contig79:52 113
Contig79:53 116
Contig79:54 118
Contig79:55 120
Contig79:56 124
Contig79:57 124
Contig79:58 126
Contig79:59 128
Contig79:60 130
Contig79:61 133
Contig79:62 134
Contig79:63 136
Contig79:64 139
Contig79:65 144
Contig79:66 145
Contig79:67 146
Contig79:68 148
Contig79:69 149
Contig79:70 151
Contig79:71 156
Contig79:72 157
Contig79:73 158
Contig79:74 159
Contig79:75 159
Contig79:76 159
Contig79:77 160
Contig79:78 160
Contig79:79 161
Contig79:80 163
Contig79:81 164
Contig79:82 165
Contig79:83 165
Contig79:84 165
Contig79:85 165
Contig79:86 166
Contig79:87 170
Contig79:88 170
Contig79:89 172
Contig79:90 174
Contig79:91 178
Contig79:92 180
Contig79:93 181
Contig79:94 184
Contig79:95 184
Contig79:96 187
Contig79:97 190
Contig79:98 192
Contig79:99 194
Contig79:100 199
More information about the R-help
mailing list