[R] how to work with long vectors

Changbin Du changbind at gmail.com
Thu Nov 4 17:45:41 CET 2010


Thanks, Jim!

This is not what I want. What I want is to calculate, for each position, the
percentage of reads that are greater than or equal to the read count at that
position. My output looks like the following:
for row 1, all of the reads are >= 4, so cover_per is 100;
for row 2, 99% of the reads are >= 8, so cover_per is 99.
> head(final)
  cover_per reads
1       100     4
2        99     8
3        98    13
4        97    14
5        96    17
6        95    20

I attached the input file to this email. This file is only 100 rows, very
small. My original data set has 3,384,766 rows.

 >
matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
sep="\t", skip=0, header=F,fill=T) #
> dim(matt)
[1] 3384766       2
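
For a file with 3,384,766 rows, read.table is usually noticeably faster if the
column classes are declared up front. A rough sketch only; 'depth_file' stands
in for the full path used above, and the two column types are assumed from the
sample data:

# Sketch: declaring colClasses avoids type guessing, and disabling quote and
# comment scanning skips per-character checks on a large plain tab file.
matt <- read.table(depth_file, sep = "\t", header = FALSE,
                   colClasses = c("character", "integer"),
                   quote = "", comment.char = "")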

Thanks so much for your time!

> matt<-read.table("/home/cdu/operon/dimer5_0623/matt_test.txt", sep="\t",
skip=0, header=F,fill=T) #
> names(matt)<-c("id","reads")
> dim(matt)
[1] 100   2
> cover<-matt$reads
>
>
> #calculate the cumulative coverage.
> cover_per<-function (data) {
+ output<-numeric(0)
+ for (i in data) {
+           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
+           output<-c(output, x)
+                 }
+ return(output)
+ }
>
>
> result<-cover_per(cover)
> head(result)
[1] 100  99  98  97  96  95
>
> final<-data.frame(result, cover)
>
> names(final)<-c("cover_per", "reads")
> head(final)
  cover_per reads
1       100     4
2        99     8
3        98    13
4        97    14
5        96    17
6        95    20
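
For the full 3,384,766-row vector, the loop above rescans the whole vector once
per element and grows 'output' with c(), so the work is roughly quadratic in
the number of rows. Below is a vectorized sketch that should give the same
percentages (an assumption on my part, checked only against the 100-row
sample): rank() with ties.method = "min" counts, for each element, how many
values are strictly smaller than it.

# Sketch of a vectorized equivalent of cover_per():
# the share of reads >= data[i] is n minus the number of reads strictly
# smaller than data[i], divided by n.
cover_per_fast <- function(data) {
  n <- length(data)
  100 * (n - rank(data, ties.method = "min") + 1) / n
}

result <- cover_per_fast(cover)
head(result)   # on the 100-row sample: 100 99 98 97 96 95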





On Thu, Nov 4, 2010 at 9:18 AM, jim holtman <jholtman at gmail.com> wrote:

> Is this what you want:
>
> > x
>            id reads
> 1   Contig79:1     4
> 2   Contig79:2     8
> 3   Contig79:3    13
> 4   Contig79:4    14
> 5   Contig79:5    17
> 6   Contig79:6    20
> 7   Contig79:7    25
> 8   Contig79:8    27
> 9   Contig79:9    32
> 10 Contig79:10    33
> 11 Contig79:11    34
> > x$percent <- x$reads / max(x$reads) * 100
> > x
>            id reads   percent
> 1   Contig79:1     4  11.76471
> 2   Contig79:2     8  23.52941
> 3   Contig79:3    13  38.23529
> 4   Contig79:4    14  41.17647
> 5   Contig79:5    17  50.00000
> 6   Contig79:6    20  58.82353
> 7   Contig79:7    25  73.52941
> 8   Contig79:8    27  79.41176
> 9   Contig79:9    32  94.11765
> 10 Contig79:10    33  97.05882
> 11 Contig79:11    34 100.00000
>
>
> On Thu, Nov 4, 2010 at 11:46 AM, Changbin Du <changbind at gmail.com> wrote:
> > HI, Dear R community,
> >
> > I have a data set like this. What I want to do is to calculate the
> > cumulative coverage. The following code works for a small data set (#rows =
> > 100), but when fed the whole data set, it is still running after 24 hours.
> > Can someone give some suggestions for long vectors?
> >
> > id                reads
> > Contig79:1    4
> > Contig79:2    8
> > Contig79:3    13
> > Contig79:4    14
> > Contig79:5    17
> > Contig79:6    20
> > Contig79:7    25
> > Contig79:8    27
> > Contig79:9    32
> > Contig79:10    33
> > Contig79:11    34
> >
> >
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> > sep="\t", skip=0, header=F,fill=T) #
> > dim(matt)
> > [1] 3384766       2
> >
> > matt_plot<-function(matt, outputfile) {
> > names(matt)<-c("id","reads")
> >
> >  cover<-matt$reads
> >
> >
> > #calculate the cumulative coverage.
> > + cover_per<-function (data) {
> > + output<-numeric(0)
> > + for (i in data) {
> > +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
> > +           output<-c(output, x)
> > +                 }
> > + return(output)
> > + }
> >
> >
> >  result<-cover_per(cover)
> >
> >
> > Thanks so much!
> >
> >
> > --
> > Sincerely,
> > Changbin
> > --
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>



-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856
-------------- next part --------------
Contig79:1	4
Contig79:2	8
Contig79:3	13
Contig79:4	14
Contig79:5	17
Contig79:6	20
Contig79:7	25
Contig79:8	27
Contig79:9	32
Contig79:10	33
Contig79:11	34
Contig79:12	36
Contig79:13	39
Contig79:14	40
Contig79:15	44
Contig79:16	49
Contig79:17	55
Contig79:18	56
Contig79:19	59
Contig79:20	60
Contig79:21	62
Contig79:22	64
Contig79:23	64
Contig79:24	68
Contig79:25	68
Contig79:26	68
Contig79:27	70
Contig79:28	73
Contig79:29	76
Contig79:30	77
Contig79:31	78
Contig79:32	78
Contig79:33	79
Contig79:34	80
Contig79:35	80
Contig79:36	84
Contig79:37	87
Contig79:38	87
Contig79:39	88
Contig79:40	88
Contig79:41	89
Contig79:42	93
Contig79:43	94
Contig79:44	98
Contig79:45	99
Contig79:46	99
Contig79:47	102
Contig79:48	103
Contig79:49	108
Contig79:50	112
Contig79:51	112
Contig79:52	113
Contig79:53	116
Contig79:54	118
Contig79:55	120
Contig79:56	124
Contig79:57	124
Contig79:58	126
Contig79:59	128
Contig79:60	130
Contig79:61	133
Contig79:62	134
Contig79:63	136
Contig79:64	139
Contig79:65	144
Contig79:66	145
Contig79:67	146
Contig79:68	148
Contig79:69	149
Contig79:70	151
Contig79:71	156
Contig79:72	157
Contig79:73	158
Contig79:74	159
Contig79:75	159
Contig79:76	159
Contig79:77	160
Contig79:78	160
Contig79:79	161
Contig79:80	163
Contig79:81	164
Contig79:82	165
Contig79:83	165
Contig79:84	165
Contig79:85	165
Contig79:86	166
Contig79:87	170
Contig79:88	170
Contig79:89	172
Contig79:90	174
Contig79:91	178
Contig79:92	180
Contig79:93	181
Contig79:94	184
Contig79:95	184
Contig79:96	187
Contig79:97	190
Contig79:98	192
Contig79:99	194
Contig79:100	199

