[R] motif search
Jean lobry
lobry at biomserv.univ-lyon1.fr
Thu Dec 11 17:37:40 CET 2008
Dear Alessia,
> I am very new to R and wanted to know if there is a package that, given very
> long nucleotide sequences, searches and identifies short (7-10nt) motifs.. I
> would like to look for enrichment of certain motifs in genomic sequences.
>
> I tried using MEME (not an R package, I know), but the online version only
> allows sequences up to MAX 60000 nucleotides, and that's too short for my
> needs..
You may try this:
#
# Load the seqinr package:
#
library(seqinr)
#
# An example FASTA file that ships with seqinr and contains
# the complete genome sequence of Chlamydia trachomatis:
#
fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
#
# Import the sequence as a string of characters:
#
myseq <- read.fasta(fastafile, as.string = TRUE)
nchar(myseq) # 1042519, i.e. a sequence of about 1 Mb
#
# Look for motif "atatatat", with possible overlap:
#
words.pos("atatatat", myseq, extended = TRUE)
#
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
#
substr(myseq, 236501, 236501 + 8)
#
# Should be
# [1] "atatatata"
#
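Since you mention enrichment, here is a minimal sketch in base R (no seqinr needed) of the same overlapping search, followed by a very crude observed/expected ratio. The toy sequence and the names toyseq, motif, pos, expected and ratio are made up for illustration, and the "expected" count assumes independent bases drawn with the sequence's own base frequencies, which is only one of several possible null models.

```r
# Toy sequence, invented for illustration (not real genomic data)
toyseq <- "ggatatatatatccatatatatgg"
motif  <- "atatatat"

# A zero-width lookahead makes gregexpr() report overlapping matches too
hits <- gregexpr(paste0("(?=", motif, ")"), toyseq, perl = TRUE)[[1]]
pos  <- as.integer(hits)
pos  <- pos[pos > 0]   # gregexpr() returns -1 when there is no match
pos                    # 3 5 15

# Crude expectation, ASSUMING independent bases with the observed frequencies
bases    <- strsplit(toyseq, "")[[1]]
freq     <- table(bases) / length(bases)
nwindows <- nchar(toyseq) - nchar(motif) + 1
expected <- nwindows * prod(freq[strsplit(motif, "")[[1]]])

# Ratio > 1 suggests the motif is more frequent than this simple model predicts
ratio <- length(pos) / expected
```

On a real genome you would replace toyseq with the imported sequence string; a Markov-chain background would probably be a more realistic null model than independent bases.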
HTH,
Jean
--
Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo : +33 472 43 27 56 fax : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/