[Bioc-sig-seq] trimLRPatterns; matching pattern from pos -1 gives solveUserSEW error

Harris A. Jaffee hj at jhu.edu
Wed Feb 16 23:46:58 CET 2011


Sorry, I retract my proposed fix.  It would break the opposite case:
	
	trimLRPatterns(subject="A", Lpattern="AC", max.Lmismatch=1)

I lean toward trusting the comments on .computeTrimStart  
and .computeTrimEnd,
meaning that the code should adhere to them -- perhaps as follows.

The last line of .computeTrimStart becomes

	pmin(pattern_length + 2L - ii, width(subject) + 1L)

and the last line of .computeTrimEnd becomes

	pmax(subject_width - pattern_length - 1L + ii, 0L)

No doubt I've still missed something.

On Feb 16, 2011, at 3:22 PM, Harris A. Jaffee wrote:
> This will be a whole lot of detail in lieu of a final fix (proposal  
> below).
>
> I don't believe the authors, certainly not I, envisioned your  
> situation, which I think I can
> abstract correctly here:
>
> 	library(Biostrings)
> 	trimLRPatterns(Rpattern="CA", subject="A", max.Rmismatch=1)	# same  
> as max.Rmismatch=c(-1,1)
>
> To have loaded ShortRead was irrelevant and perhaps confusing.
>
> Now, what should happen with the call above, from your point of view?
>
> The first test is for the whole Rpattern "ending at the right end  
> of the subject", thus extending
> off the left end of the subject by 1 letter as you say.  For a  
> match, we need to allow for at least
> 1 error, and that is sufficient because the rest of the Rpattern is  
> the same as the subject.  Note
> that this 1 goes in the *last* element of the max.Rmismatch  
> vector.  So, this test should succeed,
> the entire subject should get trimmed, and your result should be "".
>
> To gain insight, you have
>
> 	showMethods("trimLRPatterns", includeDefs=TRUE)
>
> That might lead you to make your trimLRPatterns call under
>
> 	debug(Biostrings:::.XStringSet.trimLRPatterns)
>
> The abort
>
>   Error in solveUserSEW(width(x), start = start, end = end, width =  
> width) :
>   solving row 1: 'allow.nonnarrowing' is FALSE and the supplied  
> start (0) is < 1
>
> clearly occurs in the call
>
> 	narrow(subject, start=0, end=-1)
>
> at the end of .XStringSet.trimLRPatterns, but who is at fault?
>
> Well, Biostrings:::.computeTrimEnd() returned the -1, violating  
> this comment:
>
> 	### 'subject' must be an XStringSet object of length != 0.
> 	### Returns an integer vector of the same length as 'subject'  
> where the i-th
> 	### value is guaranteed to be >= 0 and <= width(subject)[i].
>
> I think -1 is correct, and the comment is false [see my claim about  
> the authors].
>
> Then, just before the call to narrow(), we have this  
> in .XStringSet.trimLRPatterns:
>
>     ## For those invalid ranges where 'start > end + 1L', we  
> arbitrarily
>     ## decide to set the 'start' to 'end + 1' (another reasonable  
> choice
>     ## would have been to set the 'end' to 'start - 1').
>
> With end=-1, start=1 (from .computeTrimStart, correctly) gets  
> changed to end+1 = 0.
>
> In this situation, the other choice would have fixed your problem.   
> I don't think it
> would create any other problems.
>
> -Harris
>
> On Feb 16, 2011, at 8:12 AM, Ludo Pagie wrote:
>
>> Hi all,
>>
>> I'm trimming reads using an Rpattern which in some cases extends  
>> on the left side. If the pattern starts at exactly position -1  
>> trimLRPattern throws an error. If the pattern starts further  
>> 'upstream' it returns a zero length sequence, as expected:
>>
>> library(ShortRead)
>> # create a pattern
>> fragment <- paste(sample(c('A','C','G','T'), 
>> 10,replace=TRUE),collapse='')
>> # create some reads based on the pattern; for different reads the
>> # pattern extends either on the left, the right, or both sides
>> reads <- substring(fragment,1:4,7:10)
>>
>> # trim all reads; all reads should match the pattern fully and be
>> # trimmed from start to end
>> trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads),  
>> max.Rmismatch=0.5, ranges=TRUE, Rfixed=FALSE)
>>
>> # IRanges of length 4
>> #     start end width
>> # [1]     1   0     0
>> # [2]     0  -1     0
>> # [3]    -1  -2     0
>> # [4]    -2  -3     0
>>
>> # if I want the reads to be trimmed right away an error is thrown for
>> # the second read
>> trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[c 
>> (1,3,4)]), max.Rmismatch=0.5, ranges=FALSE)
>> #   A DNAStringSet instance of length 3
>> #     width seq
>> # [1]     0
>> # [2]     0
>> # [3]     0
>>
>> trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[2]),  
>> max.Rmismatch=0.5, ranges=FALSE)
>>
>> # Error in solveUserSEW(width(x), start = start, end = end, width =
>> # width) :
>> #   solving row 1: 'allow.nonnarrowing' is FALSE and the supplied  
>> start
>> # (0) is < 1
>>
>>
>> > sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: i686-pc-linux-gnu (32-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.8.2     Rsamtools_1.2.1     lattice_0.19-13
>> [4] Biostrings_2.18.2   GenomicRanges_1.2.2 IRanges_1.8.7
>> [7] multtest_2.6.0      Biobase_2.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.12.0     hwriter_1.3     MASS_7.3-9      splines_2.12.0
>> [5] survival_2.36-2 tools_2.12.0
>>
>>
>>
>> Where is this error coming from?
>>
>> Ludo
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list