[Bioc-sig-seq] potential bug either in strandFilter (ShortRead) or estimate.mean.fraglen (chipseq)
Martin Morgan
mtmorgan at fhcrc.org
Fri Mar 5 16:06:36 CET 2010
I'm not sure that this addresses your problem but if you are using the
devel version of short read then please update
--
Martin Morgan
On Mar 5, 2010, at 8:22 AM, Johannes Rainer <johannes.rainer at i-
med.ac.at> wrote:
> dear all,
>
> I already reported that I have a problem using the
> estimate.mean.fraglen
> function from the chipseq package using the method="correlation",
> finally I
> think I found the reason for the problem:
>
> short example: I filter my data set to unique reads on chromosomes
> 1-22, X
> and Y and filter also for reads on + and - strand.
>
>> chromFilter <- chromosomeFilter( "(^[1-9]|^X|^Y)" )
>> strFilter <- strandFilter( c("+", "-") )
>> theFilters <- compose( chromFilter, strFilter )
>> AlignedReads <- AlignedReads[[1]]
>> AlignedReads <- AlignedReads[ theFilters( AlignedReads ) ]
>> uFilter <- uniqueFilter( withSread=FALSE )
>> AlignedReads <- AlignedReads[ uFilter( AlignedReads ) ]
>> AlignedReads
> class: AlignedRead
> length: 17860091 reads; width: 32 36 cycles
> chromosome: 7 8 ... 1 15
> position: 131934390 3275112 ... 64185954 45913614
> strand: + + ... - -
> alignQuality: NumericQuality
> alignData varLabels: similar mismatch
>> unique( strand( AlignedReads ))
> [1] + -
> Levels: - + *
>
> this was actually the first surprise with the levels of strand being
> - + *
> although the object does only contain reads on + and - strand. this
> was then
> also the cause for the error in the estimate.mean.fraglen function:
>
>> Test <- estimate.mean.fraglen( AlignedReads, method="correlation" )
> Error in if (from > from0 || to < to0) stop("[from, to] smaller than
> support
> not implemented yet (but easy to add)") :
> missing value where TRUE/FALSE needed
> Error in unlist(lapply(splitData, function(y) { :
> error in evaluating the argument 'x' in selecting a method for
> function
> 'unlist'
>
> by looking at the source code of the chipseq package I finally
> realised that
> the 3 levels of the strand are the problem, since the
> estimate.mean.fraglen
> function defined for AlignedRead classes splits the reads based on the
> strand (and "split" with "drop=FALSE" (the default) will split the
> data set
> into 3 columns, one for each level):
>
> setMethod("estimate.mean.fraglen", "AlignedRead",
> function(x, method = c("SISSR", "coverage",
> "correlation"), ...) {
> splitData <-
> split(data.frame(strand = strand(x),
> start =
> ifelse(strand(x) == "-",
> position(x) + width(x) - 1L,
> position(x))),
> chromosome(x))
> unlist(lapply(splitData,
> function(y) {
> estimate.mean.fraglen(split(y
> [["start"]],
> ## splits reads
> y
> [["strand"]]),
> method =
> method, ...)
> }))
> })
>
>
>
> I suggest the following solutions:
> 1) either reduce the levels of strand( AlignedReads ) to only + and
> - after
> applying a strandFilter, or
> 2) use split with the option drop=TRUE in the estinmate.mean.fraglen
> function:
>
> ...
> function(y) {
> estimate.mean.fraglen(split(y
> [["start"]],
> y
> [["strand"]],
>
> drop=TRUE ),
> method =
> method, ...)
> }))
> ...
>
>
>
> best regards, jo
>
>
> --
> Johannes Rainer, PhD
> Bioinformatics Group,
> Division Molecular Pathophysiology,
> Biocenter, Medical University Innsbruck,
> Fritz-Pregl-Str 3/IV, 6020 Innsbruck, Austria
> and
> Tyrolean Cancer Research Institute
> Innrain 66, 6020 Innsbruck, Austria
>
> Tel.: +43 512 570485 13
> Email: johannes.rainer at i-med.ac.at
> johannes.rainer at tcri.at
> URL: http://bioinfo.i-med.ac.at
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list