[BioC] GRanges nearest problem
Nishant Gopalakrishnan
ngopalak at fhcrc.org
Fri Apr 15 03:09:58 CEST 2011
Hi Arne,
Thank you for pointing out the error. I have checked in some changes to
fix this issue.
Nishant
On 04/14/2011 06:21 AM, Valerie Obenchain wrote:
> Hi Arne,
>
> Thanks for pointing out these bugs. I'll post again here when they
> have been fixed.
>
> Valerie
>
>
> On 04/13/11 05:29, Mueller, Arne wrote:
>> Hello,
>>
>> I've come across a problem with nearest() for GRanges: if the subject
>> of the nearest call contains strand information (+/-) and the query
>> does not (*), the method takes a long time to run and raises warnings:
>>
>> mm9.pro.gr and mm9.2ktiles.gr are both GRanges objects.
>>
>>> strand(mm9.pro.gr) = "-"
>>> strand(mm9.2ktiles.gr) = "*"
>>> system.time(nn<- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>    user  system elapsed
>>  27.150   0.002  27.416
>> There were 50 or more warnings (use warnings() to see the first 50)
>>> warnings()
>> Warning messages:
>> 1: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>> longer object length is not a multiple of shorter object length
>> 2: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>> longer object length is not a multiple of shorter object length
>> 3: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>> longer object length is not a multiple of shorter object length
>> 4: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>> longer object length is not a multiple of shorter object length
>> …
>>
>> I think that if a range in either the query or the subject is
>> non-stranded (*), the method should look for the nearest neighbor
>> ignoring the strand (at least that's my suggestion ;-).
>>
>> If I set the strand info of the subject to "*" the method runs fine:
>>
>>> strand(mm9.pro.gr) = "*"
>>> system.time(nn<- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>    user  system elapsed
>>   0.264   0.000   0.264
>>
>> If the query is "stranded" (+/-) and the subject isn't, the method
>> runs fine, too (though it takes longer than when both query and
>> subject are non-stranded, but I guess this can be expected):
>>
>>> system.time(nn<- nearest(mm9.pro.gr[1:5000], mm9.2ktiles.gr[1:5000]))
>>    user  system elapsed
>>   3.084   0.000   3.125
>>
>> Another odd behavior is that if the query contains sequence names not
>> contained in the subject, an error is raised – the other way around
>> works fine. Wouldn't it make sense to set the vector elements for
>> sequences found only in the query to NA?
>>
>> Kind regards,
>>
>> Arne
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>