library(BSgenome.Rnorvegicus.UCSC.rn4) # load the rat genome (rn4)

fl <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get miR coords
gff <- read.table(fl) # as dataframe

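ids <- gff[,10] # ID attribute column (called 'ids' to avoid confusion with base::names())
nms <- gsub(";", "", gsub("\"", "", gsub("ID=\"", "", ids))) # nested gsub calls strip the ID=" prefix, the remaining double quotes and the semicolon, leaving only the quoted text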

gr <- GRanges(seqnames = Rle(paste('chr', gff[,1], sep='')),
              ranges = IRanges(gff[,4], end = gff[,5], names = nms),
              strand = Rle(gff[,7]))

seqs <- getSeq(Rnorvegicus, flank(gr, 200))
names(seqs) <- nms

It's much lighter on its feet than a loop, and a nice intro to the GenomicRanges package for me.

As a follow-up, I'm going to write out the seqs object as FASTA and use it in Clover for TFBS analysis.
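For the FASTA step, a minimal sketch assuming a recent Biostrings where writeXStringSet() is available (older releases used write.XStringSet() instead). The sequences, names and output file here are made up for illustration; in practice the seqs object would be passed in place of toy_seqs.

```r
library(Biostrings)

# Made-up stand-in for the 'seqs' object (names become the FASTA headers).
toy_seqs <- DNAStringSet(c("rno-mir-a" = "ACGTACGT", "rno-mir-b" = "TTGGCCAA"))

# Hypothetical output path; a temp file keeps the example self-contained.
out <- tempfile(fileext = ".fa")
writeXStringSet(toy_seqs, out) # default format is FASTA

readLines(out)
# ">rno-mir-a" "ACGTACGT" ">rno-mir-b" "TTGGCCAA"
```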

I was wondering whether the strand is taken into account when I get the flanking sequence, i.e. is the sequence returned read from the + or the - strand, as defined in the GRanges object?
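One way to check at least the flank() half of this is a toy GRanges with the same coordinates on both strands. The coordinates below are invented purely for the test. If flank() respects strand, the two flanks should land on opposite sides of the range.

```r
library(GenomicRanges)

# Two made-up ranges: identical coordinates, opposite strands.
toy <- GRanges(seqnames = "chr1",
               ranges = IRanges(start = c(1000, 1000), end = c(1100, 1100)),
               strand = c("+", "-"))

# start = TRUE is the default, i.e. flank the 5' end of each feature.
up <- flank(toy, 200)

start(up) # 800 1101
end(up)   # 999 1300
```

The + range gets the 200 bp before its start, the - range the 200 bp after its end, so the flank is upstream relative to the direction of transcription in both cases. Whether getSeq() then reads the - strand flank as its reverse complement could be checked the same way, by comparing getSeq(Rnorvegicus, ...) for a single range with its strand set to "+" and then to "-".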

I only ask because presumably that will affect my TFBS analysis, and if so I might want to reverse-complement all the sequences that are upstream of miRs transcribed from the - strand.
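If the flipping does turn out to be necessary, a sketch of how it could look with reverseComplement() from Biostrings. This is only appropriate if getSeq() has not already returned the - strand sequences reverse-complemented, which is worth checking first, otherwise they would be flipped twice. The sequences and strands below are made up; with the real data the logical index would be as.character(strand(gr)) == "-".

```r
library(Biostrings)

# Made-up stand-ins for 'seqs' and the strand column of 'gr'.
toy_seqs <- DNAStringSet(c(mirA = "AAACCC", mirB = "GGGTTA"))
toy_strand <- c("+", "-")

# Reverse-complement only the entries that sit on the - strand.
minus <- toy_strand == "-"
toy_seqs[minus] <- reverseComplement(toy_seqs[minus])

unname(as.character(toy_seqs))
# "AAACCC" "TAACCC"
```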


