[Bioc-devel] GenomicRanges::findOverlaps() ignoring chromosome information?

Kevin Rue-Albrecht kevin.rue at ucdconnect.ie
Fri Sep 19 14:18:50 CEST 2014


Hi all, here is the concluding email!

I found the problem in my code:
Everything was in the right place, except that I initialised the column
meant to store the chromosome name with NA values (DMRs without hits will
be left with this NA if the user requires all DMRs in the return value).
When I subsequently inserted the chromosome name for the DMRs hitting an
annotated gene, the character value was converted to a numeric value
because a column initialised with NA is of class "logical". This is where
the actual chromosome name was converted to a numeric value, often
different from the original chromosome name. When I subsequently prefixed
that value with "chr", converting that column to the class character, there
was no trace left of the undesired conversion.
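
A minimal sketch of how this kind of silent conversion can arise in base R.
It assumes the chromosome names were held as a factor (as seqnames usually
are once extracted from a GRanges object); that is an assumption, since the
actual code is only in the attachments. All variable names and values below
are hypothetical.

```r
# Hypothetical chromosome names stored as a factor (assumption).
# Levels are sorted as text: "10", "2", "X" -> codes 1, 2, 3.
gene_chr <- factor(c("10", "2", "X"))

# A column initialised with plain NA is a logical vector.
dmr_chr <- rep(NA, 4)
class(dmr_chr)                          # "logical"

# Assigning factor elements into it stores the integer level codes,
# not the labels: "10" becomes 1 and "X" becomes 3.
dmr_chr[c(1, 3)] <- gene_chr[c(1, 3)]
dmr_chr                                 # 1 NA 3 NA
class(dmr_chr)                          # "integer"

# Prefixing with "chr" turns the column into character and hides the problem.
paste0("chr", dmr_chr)                  # "chr1" "chrNA" "chr3" "chrNA"

# Safer: initialise with NA_character_ and convert the factor explicitly.
dmr_chr_ok <- rep(NA_character_, 4)
dmr_chr_ok[c(1, 3)] <- as.character(gene_chr[c(1, 3)])
dmr_chr_ok                              # "10" NA "X" NA
```

Initialising with NA_character_ keeps the column's class fixed from the
start, and as.character() ensures the labels rather than the level codes
are stored.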

Anyway, for those interested, I attach the two functions I wrote (and
corrected):

   - OverlapDmrs.Gene
      - Takes the output data.frame "dmrs" from bsseq, a GRanges object
      obtained from a UCSC gene track, and some optional arguments
      - Finds DMRs overlapping annotated genes, and returns a table with
      the coordinates and Ensembl identifier of that gene
   - OverlapDmrs.Cpg
      - Same as above, except that it expects a GRanges object from a UCSC
      CpG island track
      - Annotates with the coordinates of an overlapping CpG island

I also attached an example data.frame of DMRs obtained using bsseq, as
described in my first email. I believe all the code needed for testing is
there. Feel free to give me feedback on this.

Apologies for the spam and the relatively obvious mistake on my part.

Cheers
Kevin





On 19 September 2014 12:21, Kevin Rue-Albrecht <kevin.rue at ucdconnect.ie>
wrote:

> Hi again,
>
> Update on my issue, although I haven't found the source of the error yet.
> I have correct overlaps in one scenario, but not in another.  This suggests
> that the findOverlaps() command works as expected on my data, but in the
> second scenario I don't see where the error is yet. Let me explain:
>
>    - When I use my function OverlapDmrs.Gene with argument only.hits=TRUE,
>    all the hits make perfect sense
>       - Full command: dmrs_gene = OverlapDmrs.Gene(dmrs=dmrs,
>       gene_track=ensGene.asFeatures, only.hits=TRUE, prefix.chr=TRUE)
>    - When I use my function OverlapDmrs.Gene with argument only.hits=FALSE,
>    the correct DMRs are annotated with the right start and stop position, but
>    with an incorrect chromosome value (strangest thing is that chromosome 30
>    should not exist in *Bos taurus*, while some hits state this value in
>    the chromosome column)
>       - Full command: dmrs_gene.all = OverlapDmrs.Gene(dmrs=dmrs,
>       gene_track=ensGene.asFeatures, only.hits=FALSE, prefix.chr=T)
>
>
> ...
> Now that I wrote that "out loud", I just got an idea where to look for the
> source of the problem. Apologies for the spam, but if I find the solution,
> I'll definitely bring a conclusion to this thread.
>
> Kevin
>
>
>
>
>
>
> On 19 September 2014 10:12, Kevin Rue-Albrecht <kevin.rue at ucdconnect.ie>
> wrote:
>
>> Dear maintainer, Dear all,
>>
>> *Situation*
>> I have used the findOverlaps() function to annotate differentially
>> methylated regions (DMRs) obtained using the bsseq Bioconductor package in
>> the *Bos taurus* genome. (No, you won't steal my experimental design :-P
>> ).
>> I used the genome UMD3.1.75 as a reference for my analysis.
>>
>> *Problem*
>> The genes found to overlap the DMRs' genomic ranges are often on a
>> different chromosome than the DMR, although the start and end coordinates
>> of DMR and gene do overlap in all cases.
>> This leads me to believe that the chromosome information is ignored in
>> findOverlaps(). Is this the case, or am I using the function incorrectly?
>> Note that it does happen that a "true hit" is returned, i.e. the
>> overlapping gene is present on the same chromosome, with start and end
>> overlapping the coordinates of the DMR.
>>
>>
>> *Attached for your use/testing:*
>>
>>    - dmrs variable
>>    - script used to annotate dmrs with information about overlapping gene
>>       - Note that I have tried setting select to "arbitrary", "first" and
>>       "last", always with the same issue. I would prefer to get a single
>>       hit at this stage rather than filter afterwards, but the latter
>>       remains a possible option if necessary.
>>
>>
>> Any help / solution / feedback welcome !
>>
>> Best regards,
>> Kevin
>>
>> --
>> Kévin RUE-ALBRECHT
>> Wellcome Trust Computational Infection Biology PhD Programme
>> University College Dublin
>> Ireland
>> http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en
>>



-- 
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en

