[Bioc-sig-seq] ShortRead sequences

Martin Morgan mtmorgan at fhcrc.org
Thu Feb 11 20:22:49 CET 2010


On 02/11/2010 10:15 AM, Droit Arnaud wrote:
> Hello Everyone,
> 
> We are developing a new package for ChiP-Seq analysis. We use the
ShorRead package to import the short read data, e.g.
> 
> data<readAligned("s_1_sequence.maq.map",type="MAQMap")
> 
> However, all data information are included in the R alignedRead
> object
(data : sequences, start, stop, strand, etc), but in ChIP-Seq we do not
really need the short read sequences, only their position and strand
information. Is there a more direct/efficient way to do it? We know that
we can convert/coerce the alignedRead into a RangedData or a similar
object but this is not very efficient. Our experience is that the
readAligned function can be very demanding both in memory and time when
reading large datafiles even with 32G of RAM (on a 64bit R version), and
we think that this is mostly due to the sequences being read.
> We are wondering if there is any available filters to include only
> the
start, the end and the strand or perhaps a different R object, R
function that we should use?
> 

Hi Arnaud --

There is no support in ShortRead for reading in only a portion of a MAQ
map file.

Rsamtools (svn only at the moment
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Rsamtools;
see http://wiki.fhcrc.org/bioc/SvnHowTo, but the intention is that it
will be available in the next release) can read specific sections of BAM
files, and samtools supports map -> bam conversion

> ?ScanBamParam
> scanBamWhat()
 [1] "qname"  "flag"   "rname"  "strand" "pos"    "qwidth" "mapq"   "cigar"
 [9] "mrnm"   "mpos"   "isize"  "seq"    "qual"

  param <- scanBamParam(what=c("rname", pos", "strand", "qwidth"),
                        flag=scanBamFlag(isUnmappedQuery=TRUE),
                        isSimpleCigar=TRUE)
  bam <- scanBam("file.bam", param=param)

this is a list of lists. The outer list is not relevant when there is no
'which' argument to scanBamParam, so bam[[1]] gives a list of the 'what'
vectors,

Are you anticipating calling something like 'coverage' as an immediate
first step? We have been thinking about implementing high-level command
like  coverage("file.bam", param=param), and it would be good to know
what the the use cases are.

Martin


> Thanks
> 
> Arnaud.
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list