[BioC] BLAST search sequence for species ID from R?

jos matejus matejus106 at googlemail.com
Mon Dec 14 12:02:02 CET 2009


Dear list members,

A colleague has asked me to help with a bioinformatics problem because
he knows I use R (although I don't usually use R for this type of
problem), and I was hoping someone might be able to point me in the
right direction. I have searched the mailing list archives and Googled
this query, but without success. Apologies in advance if the question
is not appropriate for this forum.

The background is that my colleague has a field sample containing
many species of related insects (same genus), for which he has
obtained a large amount of sequence data (from 454 sequencing). The
sequences are saved in a single FASTA file. He wants to query GenBank
to match each sequence in the FASTA file to a particular species (a
nucleotide BLAST search, I believe) and return the top-ranked match
for each sequence. He can do this manually via the web page, but he
will have many of these files in the future and is looking for a way
to automate the process (hence R). Ultimately he wants to be able to
restrict the BLAST search to a list of preselected accession numbers
or to sequences within the genus.
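
To make the goal concrete, the last step (keeping only the top-ranked
match for each sequence) might look like the sketch below. It is
written in Python rather than R purely for illustration, and it rests
on assumptions of mine: that the BLAST search has already been run
separately and its results saved in the standard 12-column tabular
format (what NCBI BLAST+ writes with -outfmt 6), and that "top ranked"
means the highest bit score. The function names read_fasta_ids and
top_hits and all of the example data are hypothetical.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular format.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_fasta_ids(handle):
    """Return sequence identifiers (first word of each '>' header) in file order."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def top_hits(handle):
    """Return {query id: best hit}, where 'best' is assumed to mean highest bit score."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, fields))
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical example data: made-up reads and subject identifiers.
example_fasta = ">read1 sample A\nACGTACGT\n>read2\nTTGACCA\n>read3\nGGGCCC\n"
example_blast = (
    "read1\tsubjX\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t360\n"
    "read1\tsubjY\t99.0\t200\t2\t0\t1\t200\t5\t204\t1e-105\t372\n"
    "read2\tsubjZ\t95.0\t180\t9\t0\t1\t180\t1\t180\t1e-80\t290\n"
)

query_ids = read_fasta_ids(io.StringIO(example_fasta))
hits = top_hits(io.StringIO(example_blast))

# One row per input sequence; None marks a sequence with no hit at all.
summary = [(q, hits[q]["sseqid"] if q in hits else None) for q in query_ids]
for query, subject in summary:
    print(query, subject or "no hit")
```

Starting from the identifiers in the FASTA file, rather than from the
BLAST output, means that sequences with no match still appear in the
summary instead of silently dropping out.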

As I am not familiar with this field, I was wondering whether anyone
knows of an existing function (or functions) that can do the job. I am
currently looking at the seqinr package to see whether it would fit
the bill, and at whether the Biostrings package would be appropriate.
However, the learning curve looks a little steep, and I wanted to make
sure I was going down the right road before investing a lot of time.

Also, is there a package that I can use to access the GenBank database
directly from within R to run the BLAST searches?

Many thanks in advance,
Jos


