[BioC] NucleR: processReads/solveUserSEW0 - Error

Martin Morgan mtmorgan at fhcrc.org
Mon Nov 7 19:52:52 CET 2011


On 11/07/2011 02:53 AM, Stefanie Ververs wrote:
> Hi Martin,
>
> I just ran my script with another dataset but got the same error
> (although on a different line).
> I read the description of readAligned and ScanBamParam, but I still
> don't get how to provide these parameters to readAligned. (I see that I
> can create an object/instance of ScanBamParam, but then? Is it something
> like "alignedReads <- readAligned(dir, pattern=filename, type="BAM",
> param=myScanBamParamObject)"?) There is no example and I am *really* new

yes, for example

   dir <- system.file("extdata", package="Rsamtools")
   param <- ScanBamParam(simpleCigar=TRUE, reverseComplement=TRUE)
   aln <- readAligned(dir, "ex1.bam$", type="BAM", param=param)

> to R, sorry ;) Then I would try parsing the file with more specific
> arguments.
>
> I got some more information about my file/dataset:
>
> 17709538 + 0 in total (QC-passed reads + QC-failed reads)
> 0 + 0 duplicates
> 120774 + 0 mapped (0.68%:-nan%)
> 17709538 + 0 paired in sequencing
> 8854769 + 0 read1
> 8854769 + 0 read2
> 120774 + 0 properly paired (0.68%:-nan%)
> 120774 + 0 with itself and mate mapped
> 0 + 0 singletons (0.00%:-nan%)
> 0 + 0 with mate mapped to a different chr
> 0 + 0 with mate mapped to a different chr (mapQ>=5)
>
>
> And the error is (this time) in row 120775:
> Error in solveUserSEW0(start = start, end = end, width = width) :
> solving row 120775: range cannot be determined from the supplied
> arguments (too many NAs)
> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call

Again it appears to be the last record. Can you provide a simpler 
example? For instance, from your original message you said that you 
input the data as

alignedReads <- readAligned(dir, pattern=filename, type="BAM")

but that the error occurred at

reads_pair = processReads(nucleosome_htseq, type="paired",
fragmentLen=fragment_len)

but there is no obvious connection between 'alignedReads' and the 
arguments to 'processReads'. Also, what is the output of

   idx = is.na(position(alignedReads)) | is.na(width(alignedReads))
   sum(idx)

? If sum(idx) != 0, then try using alignedReads[!idx]. And finally, after

   library(ShortRead)
   library(nucleR)

what is the result of

   sessionInfo()

?
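
As a toy illustration of the NA check above, here is a minimal sketch in
base R. The vectors 'pos' and 'wid' are hypothetical stand-ins for
position(alignedReads) and width(alignedReads); the values are made up
and are not taken from your data.

```r
# Hypothetical stand-ins for position(alignedReads) and width(alignedReads);
# the values are invented for illustration only.
pos <- c(100L, 250L, NA, 400L)
wid <- c(36L, 36L, NA, NA)

# A record cannot be turned into a range from start and width
# if either of the two values is NA.
idx <- is.na(pos) | is.na(wid)

sum(idx)      # number of problematic records
which(idx)    # their row numbers

# Keep only the complete records.
pos_ok <- pos[!idx]
wid_ok <- wid[!idx]
```

With a real AlignedRead object the same logical index can be used to
subset the object itself, i.e. alignedReads[!idx].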

>
>
>
> On 03.11.2011 14:39, Martin Morgan wrote:
>> On 11/03/2011 06:14 AM, Oscar Flores wrote:
>>> So this error happens here, no?
>>>
>>> res = RangedData(IRanges(start=position(ar),width=width(ar)),
>>> strand=strand(ar),space=ar@chromosome)
>>
>> better to use the accessor chromosome(ar). The error
>>
>> > Error in solveUserSEW0(start = start, end = end, width = width) :
>> > solving row 16512893: range cannot be determined from the supplied
>> > arguments (too many NAs)
>>
>> suggests that position(ar)[16512893] and / or width(ar)[16512893] is
>> NA. You could filter these out, e.g., ar[!is.na(position(ar)) &
>> !is.na(width(ar))] or identify why these are read in in the first
>> place using the 'param' argument as described on ?readAligned.
>>
>> Martin
>>
>>>
>>> If this is the case, the problem is not in nucleR; maybe there are some
>>> rows in a strange format in the AlignedRead (possibly due to the multiple
>>> format changes) that prevent the conversion to RangedData. Maybe I
>>> can detect them and skip those cases, but I would need to see what's
>>> happening in that odd case.
>>>
>>> Let me know if there's something I can do.
>>>
>>> Regards,
>>>
>>> Oscar
>>>
>>>
>>> On 03/11/2011 13:58, Stefanie Ververs wrote:
>>>> Hi Oscar,
>>>>
>>>> thanks for your quick answer - I think I would have contacted you, if
>>>> there were no answers on the bioconductor-mailinglist.
>>>>
>>>> I just tried the workaround as you suggested, but I got the same error
>>>> again:
>>>> Error in solveUserSEW0(start = start, end = end, width = width) :
>>>> solving row 16512893: range cannot be determined from the supplied
>>>> arguments (too many NAs)
>>>> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call
>>>>
>>>> I'll think about how to show you the data (it's hosted and processed
>>>> with Galaxy, so it might be possible to share it.)
>>>>
>>>> Regards,
>>>>
>>>> Steffi
>>>>
>>>> On 03.11.2011 13:16, Oscar Flores wrote:
>>>>> Hi Stefanie,
>>>>>
>>>>> I'm the developer of nucleR, so let's see if I can help you ;)
>>>>>
>>>>> After the processing, processReads converts the input data to a
>>>>> RangedData object for easier manipulation later, so this error occurs
>>>>> at the last step of the call, but the data could have been corrupted
>>>>> in previous steps. It's hard to tell what is happening without having
>>>>> a look at the input data, which I guess is huge...
>>>>>
>>>>> I would like to have a look at the raw data, but I know it is
>>>>> difficult to send it if it's not in a public repository. Maybe you can
>>>>> contact me directly about that (oflores at mmb.pcb.ub.es)...
>>>>>
>>>>> Meanwhile, if you want to try a workaround, you can directly convert
>>>>> the imported reads to RangedData format (which is the other format
>>>>> supported by processReads):
>>>>>
>>>>> (where "ar" is your imported AlignedRead object)
>>>>>
>>>>> res = RangedData(IRanges(start=position(ar),width=width(ar)),
>>>>> strand=strand(ar),space=ar@chromosome)
>>>>>
>>>>> reads_pair = processReads(res, type="paired",
>>>>> fragmentLen=fragment_len)
>>>>>
>>>>> This should work, but it would be nice to have a look at your data
>>>>> to fix a possible problem in the AlignedRead method.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Oscar
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>
>>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


