[Bioc-sig-seq] gdapply, coverage, and widths for all chromosomes

Martin Morgan mtmorgan at fhcrc.org
Mon Oct 19 22:01:12 CEST 2009


Michael Lawrence wrote:
> On Mon, Oct 19, 2009 at 10:25 AM, Chris Seidel <seidel at phaget4.org> wrote:
> 
>> In using gdapply to get coverage over all chromosomes of a
>> GenomeDataList object:
>>
>> cov <- gdapply(extendedReads, coverage)

a different route to this point (<...> representing arguments you'd
provide) is

  aln <- readAligned(<...>) # or aln <- AlignedRead(<...>)
  library(BSgenome.Hsapiens.UCSC.hg18)
  wd <- seqlengths(Hsapiens)
  ## subset aln and wd so that they have matching names, e.g.,
  ## it might also be necessary to adjust names, e.g,. dropping '.fa'
  aln <- aln[chromosome(aln) %in% names(wd)]
  wd <- wd[names(wd) %in% chromosome(aln)]
  cvg <- coverage(aln, width=wd, extend=200L)

this also results in a list of Rle's (a SimpleRleList). See
?"AlignedRead-class" for details; this is a non-trivial operation, e.g.,
it maybe useful to separately manage strands.

Martin


>>
>> how do you supply the "width" argument to coverage? When I try handing
>> it a vector of chromosome lengths, it complains that it needs a single
>> value.
>>
>>
> This is the normal behavior of apply functions in R. The additional
> arguments are passed in whole to the function every time it is invoked. We
> don't have an "mapply" variant of gdapply, so you'll need to do some extra
> work, like keeping track of an index.
> 
> Perhaps we could make this easier using metadata? We know the genome of
> 'extendedReads' so 'coverage' could be smart and refer to the appropriate
> BSgenome package.
> 
> Michael
> 
> 
> 
>> -Chris
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list