[Bioc-sig-seq] BLAST from within R

Martin Morgan mtmorgan at fhcrc.org
Thu Sep 30 21:25:25 CEST 2010


On 09/30/2010 10:53 AM, Ivan Gregoretti wrote:
> Hello everybody,
> 
> How do you perform a BLAST alignment of 1 sequence within R?
> Any pointer would be appreciated.
> 
> The sequence in question is about 40kb and the pertinent genome is mm9.

Hi Ivan

One possibility comes from

  http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

where one could load XML

  library(XML)

then formulate a query

  baseUrl <- "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"
  query <- paste("QUERY=555&DATABASE=nr&HITLIST_SIZE=10&FILTER=L",
                 "&EXPECT=10&PROGRAM=blastn", sep="")
  url0 <- sprintf("%s?%s&CMD=Put", baseUrl, query)

Then submit the query and extract the return result id (RID) using XML
and XPATH queries (see http://www.w3.org/TR/xpath/, especially section 2.5)

  post <- htmlTreeParse(url0, useInternalNodes=TRUE)
  rid <- xpathApply(post, '//input[@name="RID"][@id="rid"]',
                    xmlAttrs)[[1]][["value"]]

and then finally retrieve the result

  url1 <- sprintf("%s?RID=%s&FORMAT_TYPE=XML&CMD=Get", baseUrl, rid)
  result <- xmlTreeParse(url1, useInternalNodes=TRUE)

what to do now depends on the info you want out, e.g.,

  qseq <- xpathApply(result, "//Hsp_qseq", xmlValue)
  hseq <- xpathApply(result, "//Hsp_hseq", xmlValue)
  midline <- xpathApply(result, "//Hsp_midline", xmlValue)
  for (i in 1:5)
      cat(qseq[[i]], hseq[[i]], midline[[i]], sep="\n")

Might be fun (though a bit of work, to cast the alignments onto common
reference coordinates) to use this in the new
Biostrings::DNAMultipleAlignment constructor.

Martin

> 
> Thank you,
> 
> Ivan
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list