[BioC] kmer and zscore calculation
Hervé Pagès
hpages at fhcrc.org
Thu Dec 19 19:33:04 CET 2013
[Oops, forgot to Cc the list when I answered this. Sending it again...]
Hi Fabrice,
The oligonucleotideFrequency() function in the Biostrings
package counts the number of occurrences of all possible 5-mers
(use 'width=5') or 6-mers (use 'width=6'). You need to store
your sequence(s) in a DNAString or DNAStringSet object first.
On a DNAStringSet, the counts are returned in a matrix with 1
row per sequence and 1 column per k-mer (4^5 = 1024 columns
for 5-mers):
library(Biostrings)
library(hgu95av2probe)
probes <- DNAStringSet(hgu95av2probe)
count5 <- oligonucleotideFrequency(probes, width=5)
Then:
> dim(count5)
[1] 201800   1024
> count5[1:6, 1:10]
     AAAAA AAAAC AAAAG AAAAT AAACA AAACC AAACG AAACT AAAGA AAAGC
[1,]     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0
[5,]     0     0     0     0     0     0     0     0     0     0
[6,]     0     0     0     0     0     0     0     0     0     0
Maybe this function should have been called kmerFrequency()...
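For the z-score part of the question, here is a minimal sketch of one
possible approach. It is not a Biostrings feature, and the z-score
definition is an assumption, since the question does not say which
background the counts should be compared against: each 5-mer's total
count over all regions is standardized against the mean and standard
deviation of the totals over all 1024 5-mers. The toy 10-bp sequences
are hypothetical stand-ins for the sequences of the BED regions.

```r
library(Biostrings)

# Toy 10-bp regions standing in for the sequences of the BED intervals
# (hypothetical data, for illustration only).
regions <- DNAStringSet(c("ACGTACGTAC", "TTTTTACGTA", "ACGTAGGGGG"))

# One row per region, one column per 5-mer (4^5 = 1024 columns).
count5 <- oligonucleotideFrequency(regions, width=5)

# Total number of occurrences of each 5-mer across all regions.
total5 <- colSums(count5)

# Assumed z-score definition: standardize each 5-mer's total against
# the mean and standard deviation of the totals over all 5-mers.
z5 <- (total5 - mean(total5)) / sd(total5)

# 5-mers with the highest z-scores under that definition.
head(sort(z5, decreasing=TRUE))
```

Use 'width=6' in the same way for hexamers. A z-score against a
different background (e.g. counts from shuffled or flanking sequences)
would replace the mean and standard deviation used above with those
estimated from that background.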
Cheers,
H.
On 12/17/2013 10:28 AM, Fabrice Tourre wrote:
> Dear list,
>
> I have a list of BED regions. Each region is 10 bp long. I want to
> count the hexamers and pentamers in these regions and get their
> z-scores. Are there any existing packages to do this?
>
> Thank you very much in advance.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319