[BioC] kmer and zscore calculation
Fabrice Tourre
fabrice.ciup at gmail.com
Thu Dec 19 19:45:14 CET 2013
Thank you very much. It is helpful.
On Thu, Dec 19, 2013 at 1:33 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> [Oops, forgot to Cc the list when I answered this. Sending it again...]
>
>
> Hi Fabrice,
>
> The oligonucleotideFrequency() function in the Biostrings
> package counts the number of occurrences of all possible 5-mers
> (use 'width=5') or 6-mers (use 'width=6'). You need to store
> your sequence(s) in a DNAString or DNAStringSet object first.
> On a DNAStringSet, the counts are returned in a matrix with 1
> row per sequence and 1 column per k-mer:
>
> library(Biostrings)
> library(hgu95av2probe)
> probes <- DNAStringSet(hgu95av2probe)
> count5 <- oligonucleotideFrequency(probes, width=5)
>
> Then:
>
> > dim(count5)
> [1] 201800   1024
> > count5[1:6, 1:10]
>      AAAAA AAAAC AAAAG AAAAT AAACA AAACC AAACG AAACT AAAGA AAAGC
> [1,]     0     0     0     0     0     0     0     0     0     0
> [2,]     0     0     0     0     0     0     0     0     0     0
> [3,]     0     0     0     0     0     0     0     0     0     0
> [4,]     0     0     0     0     0     0     0     0     0     0
> [5,]     0     0     0     0     0     0     0     0     0     0
> [6,]     0     0     0     0     0     0     0     0     0     0
>
> Maybe this function should have been called kmerFrequency()...
>
> Cheers,
> H.
>
>
> On 12/17/2013 10:28 AM, Fabrice Tourre wrote:
>>
>> Dear list,
>>
>> I have a list of BED regions. Each region is 10 bp long. I want to
>> count the hexamers and pentamers in these regions and get the
>> z-scores. Is there an existing package to do this?
>>
>> Thank you very much in advance.
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
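
The reply above covers the counting step but not the z-score part of the original question. The thread does not say which z-score is meant, so the following is only a sketch under one common assumption: each k-mer's observed count in the regions is compared with its expected count under a binomial model whose probabilities come from a background set of sequences. The helper name kmer_zscore, the pseudocount argument, and the toy matrices are all hypothetical; the function only assumes count matrices shaped like the output of oligonucleotideFrequency() on a DNAStringSet (1 row per sequence, 1 column per k-mer).

```r
# Hypothetical helper (not part of Biostrings): per-k-mer z-score under a
# binomial approximation.
# fg, bg: count matrices with one row per sequence and one column per k-mer,
# e.g. as returned by oligonucleotideFrequency() on a DNAStringSet.
kmer_zscore <- function(fg, bg, pseudocount = 1) {
  stopifnot(identical(colnames(fg), colnames(bg)))
  obs    <- colSums(fg)                  # observed count of each k-mer
  n      <- sum(obs)                     # total number of k-mers in the regions
  bg_tot <- colSums(bg) + pseudocount    # pseudocount avoids a zero probability
  p      <- bg_tot / sum(bg_tot)         # background proportion of each k-mer
  (obs - n * p) / sqrt(n * p * (1 - p))  # (observed - expected) / binomial sd
}

# Toy matrices with made-up counts, used only to show the call.
kmers <- c("AA", "AC", "AG", "AT")
fg <- matrix(c(5, 1, 0, 2,
               4, 0, 1, 3),
             nrow = 2, byrow = TRUE, dimnames = list(NULL, kmers))
bg <- matrix(c(10, 10, 10, 10),
             nrow = 1, dimnames = list(NULL, kmers))

z <- kmer_zscore(fg, bg)
print(round(z, 2))
```

With real data, fg would be the count matrix for the regions of interest and bg the count matrix for whatever background is appropriate (for example flanking or shuffled sequences), both computed with the same 'width'. Overlapping k-mers within a sequence are not independent, so the binomial standard deviation is only an approximation.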