[Bioc-sig-seq] potential bug either in strandFilter (ShortRead) or estimate.mean.fraglen (chipseq)

Fri Mar 5 16:06:36 CET 2010

I'm not sure that this addresses your problem but if you are using the  
devel version of short read then please update

-- 
Martin Morgan

On Mar 5, 2010, at 8:22 AM, Johannes Rainer <johannes.rainer at i- 
med.ac.at> wrote:

> dear all,
>
> I already reported that I have a problem using the  
> estimate.mean.fraglen
> function from the chipseq package using the method="correlation",  
> finally I
> think I found the reason for the problem:
>
> short example: I filter my data set to unique reads on chromosomes  
> 1-22, X
> and Y and filter also for reads on + and - strand.
>
>> chromFilter <- chromosomeFilter( "(^[1-9]|^X|^Y)" )
>> strFilter <- strandFilter( c("+", "-") )
>> theFilters <- compose( chromFilter, strFilter )
>> AlignedReads <- AlignedReads[[1]]
>> AlignedReads <- AlignedReads[ theFilters( AlignedReads ) ]
>> uFilter <- uniqueFilter( withSread=FALSE )
>> AlignedReads <- AlignedReads[ uFilter( AlignedReads ) ]
>> AlignedReads
> class: AlignedRead
> length: 17860091 reads; width: 32 36 cycles
> chromosome: 7 8 ... 1 15
> position: 131934390 3275112 ... 64185954 45913614
> strand: + + ... - -
> alignQuality: NumericQuality
> alignData varLabels: similar mismatch
>> unique( strand( AlignedReads ))
> [1] + -
> Levels: - + *
>
> this was actually the first surprise with the levels of strand being  
> - + *
> although the object does only contain reads on + and - strand. this  
> was then
> also the cause for the error in the  estimate.mean.fraglen function:
>
>> Test <- estimate.mean.fraglen( AlignedReads, method="correlation" )
> Error in if (from > from0 || to < to0) stop("[from, to] smaller than  
> support
> not implemented yet (but easy to add)") :
>  missing value where TRUE/FALSE needed
> Error in unlist(lapply(splitData, function(y) { :
>  error in evaluating the argument 'x' in selecting a method for  
> function
> 'unlist'
>
> by looking at the source code of the chipseq package I finally  
> realised that
> the 3 levels of the strand are the problem, since the  
> estimate.mean.fraglen
> function defined for AlignedRead classes splits the reads based on the
> strand (and "split" with "drop=FALSE" (the default) will split the  
> data set
> into 3 columns, one for each level):
>
> setMethod("estimate.mean.fraglen", "AlignedRead",
>          function(x, method = c("SISSR", "coverage",  
> "correlation"), ...) {
>              splitData <-
>                split(data.frame(strand = strand(x),
>                                 start =
>                                 ifelse(strand(x) == "-",
>                                        position(x) + width(x) - 1L,
>                                        position(x))),
>                      chromosome(x))
>              unlist(lapply(splitData,
>                            function(y) {
>                                estimate.mean.fraglen(split(y 
> [["start"]],
> ## splits reads
>                                                            y 
> [["strand"]]),
>                                                      method =  
> method, ...)
>                            }))
>           })
>
>
>
> I suggest the following solutions:
> 1) either reduce the levels of strand( AlignedReads ) to only + and  
> - after
> applying a strandFilter, or
> 2) use split with the option drop=TRUE in the estinmate.mean.fraglen
> function:
>
> ...
>                            function(y) {
>                                estimate.mean.fraglen(split(y 
> [["start"]],
>                                                            y 
> [["strand"]],
>                                                             
> drop=TRUE ),
>                                                      method =  
> method, ...)
>                            }))
> ...
>
>
>
> best regards, jo
>
>
> -- 
> Johannes Rainer, PhD
> Bioinformatics Group,
> Division Molecular Pathophysiology,
> Biocenter, Medical University Innsbruck,
> Fritz-Pregl-Str 3/IV, 6020 Innsbruck, Austria
> and
> Tyrolean Cancer Research Institute
> Innrain 66, 6020 Innsbruck, Austria
>
> Tel.:     +43 512 570485 13
> Email:  johannes.rainer at i-med.ac.at
>           johannes.rainer at tcri.at
> URL:   http://bioinfo.i-med.ac.at
>
>    [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing