[BioC] QuasR: how to use an indexed reference genome?

Michael Stadler michael.stadler at fmi.ch
Tue May 21 13:41:58 CEST 2013


Dear Paul and Tim,

It was our intention that lazy people (including ourselves) do not have
to worry about index generation. Contrary to what you write, Paul, it is
however not true that QuasR will build the genome index without
informing the user. In such cases, it says:

  alignment files missing - need to:
      create alignment index for the genome
      [...]

and gives you a countdown of 10 seconds, during which you can interrupt
the process if it is not what you intended. As mentioned before, we have
added a paragraph to the qAlign documentation detailing the storage
location of that index, which you can fully control.

The major reason for the current interface design is that QuasR must
retain a consistent state of your project, and prevent users from
unintentionally mixing incompatible genome sequence and aligner index.
This seems to be contrary to your expectations, maybe because current
aligners require genome indices explicitly. We believe that for most
(less experienced) users, it is safer and more comfortable to let QuasR
hide some of the technical details underneath its cover. We consciously
chose this design, which allows even users who don't know that the
genome needs to be indexed, or how this needs to be done, to safely use
QuasR.

This creates an additional layer of abstraction compared to a direct
analysis, e.g. on the command line. For users preferring to explicitly
specify *how* things should be done, rather than *what* should be done,
QuasR might not be the right tool.

- Michael and Dimos



On 18.05.2013 01:53, Tim Triche, Jr. wrote:
> same here, I would like to use QuasR more often, the reason I use
> Rsubread (instead of say gmapR or quasR) is simply that I know where my
> indexed genomes are and how to regenerate them if need be
> 
> this is a big deal for lazy people like me ;-)
> 
> 
> 
> On Fri, May 17, 2013 at 7:51 AM, Paul Shannon
> <paul.thurmond.shannon at gmail.com
> <mailto:paul.thurmond.shannon at gmail.com>> wrote:
> 
>     Thanks, Michael.
> 
>     May I cast my one vote, as a new user of Quasr, for an extra measure
>     of transparency?  I have lost a few days to my confusion.
> 
>     My reasoning -- which might be idiosyncratic, and which you may find
>     unconvincing -- is that if qAlign, in some invocations (but not all)
>     needs to spend a few hours creating an index, and writes it to a
>     place which is not entirely obvious (lib.loc), then I as a user am
>     mightily confused.  As I have been! :}
> 
>     I would have been much better off  if qAlign had told me:
> 
>       1) you specified a genome package by name for your reference
>       2) in order to use that, qAlign needs for it to be indexed
>       3) we cannot find an indexed version of that genome in your R
>     library or any lib.loc directory
>       4) please specify an alternative lib.loc, and explicit path, or…
>       5) create a new index for your genome using this command:
>             buildIndexAsPackage(genomePackageName, destinationDir)  # or
>     some such method call
>       6) you can then specify that newly-built index package in
>     subsequent calls to qAlign this way:
>             qAlign(samplesFile,
>     genome="~/path/to/genome-index-package.tar.gz")
> 
>      - Paul
> 
>     On May 17, 2013, at 6:59 AM, Michael Stadler wrote:
> 
>     > Hi Paul,
>     >
>     > Please find my answers below:
>     >
>     > On 17.05.2013 15:01, Paul Shannon wrote:
>     >> Hi Michael,
>     >>
>     >> Thanks for your quick and clarifying response.
>     >>
>     >> Since it is not possible to use pre-built indices from the bowtie
>     developers, I would be glad to have a small recipe (perhaps featured
>     prominently in the vignette?) which
>     >>
>     >>  1) explains the need for custom-built indices
>     >>  2) provides (perhaps) a standalone QuasR-specific command for
>     creating one
>     > It is actually much simpler than you expect: qAlign() creates the
>     index
>     > automatically if it does not yet exist. The index is then saved in a
>     > default location (as a new R package if your reference is a
>     BSgenome, or
>     > else in the same directory containing the fasta reference), and
>     will be
>     > automatically re-used when qAlign is called with the same reference.
>     >
>     >> My somewhat fuzzy grasp of the current approach is that
>     >>
>     >>  1) QuasR sees the string "BSgenome.Hsapiens.UCSC.hg19" on a call
>     to qAlign
>     >>  2) QuasR then spends a few hours building a new package with the
>     proper index
>     > Yes, this is described in section 5.3 of the vignette.
>     >
>     >>  3) and saves this package somewhere (I could not figure out where)
>     > This is described in the documentation to qAlign. I agree that it
>     would
>     > be better to have this all described more explicitly in a single
>     place,
>     > so I added a description to the qAlign documentation (available
>     shortly
>     > in the development branch).
>     >
>     > I hope this makes it all clear.
>     >
>     > Best,
>     > Michael
>     >
> 
>     _______________________________________________
>     Bioconductor mailing list
>     Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>     https://stat.ethz.ch/mailman/listinfo/bioconductor
>     Search the archives:
>     http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> 
> 
> -- 
> /A model is a lie that helps you see the truth./
> /
> /
> Howard Skipper
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

-- 
--------------------------------------------
Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax   : +41 61 697 3976
Mail  : michael.stadler at fmi.ch



More information about the Bioconductor mailing list