[Bioc-sig-seq] about N statistics
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Sep 6 22:42:14 CEST 2011
Hi,
Are you looking for the number of reads that have 0, 1, ..., X 'N's in them?
If so, you can stop here:
On Tue, Sep 6, 2011 at 4:22 PM, wang peter <wng.peter at gmail.com> wrote:
> I used a clumsy way to compute how the reads are distributed by their
> number of N's:
>
> library(ShortRead)
> reads <- readFastq(fastqfile)
> ids <- id(reads)
> seqs <- sread(reads)
> # Do you know how to get this information with a Bioconductor function?
> nCount <- alphabetFrequency(seqs)[, "N"]
And do:
R> n.distro <- table(nCount)
or some such, I think.
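
A minimal sketch of that tabulation, using only base R. The toy nCount vector below is a made-up stand-in for alphabetFrequency(seqs)[, "N"], and n.full is a hypothetical name; tabulate() is shown as one way to also keep the N-counts that no read has, which table() drops:

```r
# Toy stand-in (assumption) for alphabetFrequency(seqs)[, "N"]:
# the number of N's in each of seven reads.
nCount <- c(0, 0, 0, 1, 2, 2, 5)

# table() reports only the N-counts that actually occur.
n.distro <- table(nCount)

# tabulate() counts positive integers, so shift by one to include 0.
# This also keeps the empty classes (3 and 4 N's in this toy data).
n.full <- tabulate(nCount + 1L, nbins = max(nCount) + 1L)
names(n.full) <- 0:max(nCount)

print(n.distro)
print(n.full)
```

With real data, names(n.distro) gives the number of N's and the values give the number of reads with that many.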
But it seems like you should also have the same answer in nCountHist,
as you've done it below, no?
> nCountHist<-hist(nCount,breaks=max(nCount))
> nCountHist["breaks"]
> nCountHist["counts"]
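
One caveat when reading the histogram as a per-N tally: with integer breaks 0, 1, 2, ... and hist()'s defaults (right = TRUE, include.lowest = TRUE), the first bin is [0, 1], so it lumps reads with 0 N's and reads with 1 N together, and each later bin (k-1, k] holds the reads with exactly k N's. The sketch below uses a made-up toy vector and passes the breaks explicitly (a single number given as breaks is only a suggestion to hist()); centring the breaks on the integers is one assumed way to get one bin per N-count:

```r
# Toy stand-in (assumption) for the per-read N counts.
nCount <- c(0, 0, 0, 1, 2, 2, 5)

# Integer breaks: the first bin is [0, 1], so 0 and 1 are merged.
h.int <- hist(nCount, breaks = 0:max(nCount), plot = FALSE)

# Breaks centred on the integers: one bin per N-count.
h.mid <- hist(nCount,
              breaks = seq(-0.5, max(nCount) + 0.5, by = 1),
              plot = FALSE)

print(h.int$counts)  # one fewer bin than there are distinct possible counts
print(h.mid$counts)  # same tallies as table(), with empty classes as 0
```

The number of bins is a quick check: breaks 0 to 55 give 55 counts, while there are 56 possible values from 0 to 55.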
If that's not what you need, then maybe you can be a bit more specific
about what you are after?
-steve
> $breaks
>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> [26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
> [51] 50 51 52 53 54 55
>> nCountHist["counts"]
> $counts
> [1] 16988332 3975 4365 3099 2760 2473 2918 3045
> [9] 3320 3028 3290 3560 4695 4546 3939 4255
> [17] 3899 4025 6764 3554 4056 2716 1812 1456
> [25] 1618 2133 2253 1809 1638 924 951 889
> [33] 931 1089 1868 3344 348 36 20 25
> [41] 12 16 10 24 9 4 4 3
> [49] 0 0 3 1 1 0 1
>
> What I need is just the number of reads for each count of N's, as in the
> output above.
>
> thx
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact