[Bioc-sig-seq] coverage vectors and subseq

kirti prakash kirtiprakash3.14 at gmail.com
Wed Aug 18 17:15:02 CEST 2010


Hi Martin,

As per your explanations I should not get out of bound error. TSS are
from ensembl from chromosome 10 and histone modification data is also
from human and I am searching only +/- 50 even +/-10000 should not
give me out of bound error.

I don't know whether there is some concept of range for data types in
R like we have for data types in C.

I was also wondering if converting the  'integer' Rle of length* to
'numeric' Rle of length would be useful because may be integer vector
gets out of range. But again I don't know how to do the conversion
from integer to numeric.

Thank you,

Best regards,

Kirti Prakash

On Wed, Aug 18, 2010 at 5:05 PM, kirti prakash
<kirtiprakash3.14 at gmail.com> wrote:
> Hi Martin,
>
> Thanks again. I really like your way of explaining... even new comers
> like me are able to learn how data compression works. Its really a
> nice approach to break the problem in parts in order to understand it.
>
> I have some human histone modification data and I am trying to plot
> the tag densities.
>
> This is the complete code(from the above link) that I am trying to implement...
>
> aln.H3K4me1 <- readAligned("dirpath", type="Bowtie")
> cov.H3K4me1 = coverage(aln.H3K4me1)
>
> deltaConvolve <- function( coverage, winCentres, revStrand, left, right)
> {
>    result <- rep( 0, right - left + 1 )
>    for( i in seq(along=winCentres) ) {
>       v <- as.vector(seqselect( coverage, winCentres[i] - left,
>                                                 winCentres[i] + right ))
>       if( revStrand[i] )
>          v <- rev( v )
>       result <- result + v
>    }
>    result
>  }
>
> library("biomaRt")
> mart <- useMart( "ensembl", dataset = "hsapiens_gene_ensembl" )
> martReply <- getBM( attributes=c( "transcript_start",
> "transcript_end", "strand" ), filters="chromosome_name", values="10",
> mart=mart )
>
> tss <- ifelse( martReply$strand == 1, martReply$transcript_start,
> martReply$transcript_end )
>
> win = c(-50, 50)
> H3K4me3 <- deltaConvolve( cov.H3K4me1$chr10, tss, martReply$strand ==
> -1, left=win[1], right=win[2] )
>
> The code runs for a while but after couple of minutes it gives the
> above errors. so possibly some but not all values are out of bound.
> As clear from above I get TSS from ensembl. There are some 6000 TSS
> for chr10 and window size I have reduced to 101 so that it doesn't
> gets out of bound but still I get this error.
>
> Do I need to have some try-catch block?
>
> Thank you,
>
> Best regards,
>
> Kirti Prakash
>
>
> On Wed, Aug 18, 2010 at 4:01 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 08/17/2010 09:41 PM, kirti prakash wrote:
>>> Hi Martin,
>>>
>>> Thanks a lot for your help. Bioconductor is a great place as one gets
>>> the help from the best and its completely free. Its seems all one
>>> needs to do good research in bioinformatics is a computer, some data
>>> and Bioconductor.
>>>
>>> Its the same tutorial... the link you mentioned above . I cannot
>>> attach the tutorials due to the limit of mail size.
>>
>> I meant that if the document is available in a public place, then
>> provide a  URL to it so that those trying to help can better understand
>> what your problem is.
>>
>>>
>>> I did exactly as you told and most the things work fine except...
>>>
>>> Later in the tutorial they use deltaConvolve function...
>>>
>>> H3K4me1 <- deltaConvolve( cov.H3K4me1.chr10, tss, martReply$strand ==
>>> -1, left=win[1], right=win[2] )
>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>>  range index out of bounds
>>> Error in as.vector(seqselect(coverage, winCentres[i] + left, winCentres[i] +  :
>>>  error in evaluating the argument 'x' in selecting a method for
>>> function 'as.vector'
>>>
>>> Also you mention endoapply above... it also gave me same error...
>>>
>>> endoapply(cov.H3K4me1, seqselect, 123456-50, 123456+50)
>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>>  range index out of bounds
>>
>> One approach to solving problems is to try to simplify them. So let's
>> try to reproduce the error but with a simpler example. You're selecting
>> a subset of an Rle object
>>
>>>   seqselect(Rle(1:10, 10:1), 10, 20)
>> 'integer' Rle of length 11 with 3 runs
>>  Lengths: 1 9 1
>>  Values : 1 2 3
>>
>> The error says something about 'index out of bounds', which suggests
>>
>>>   seqselect(Rle(1:10, 10:1), 10, 2000)
>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>  range index out of bounds
>>>   seqselect(Rle(1:10, 10:1), -10, 20)
>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>  range index out of bounds
>>
>> Any hints to what your original problem is, and how to solve it?
>>
>> Martin
>>
>>>
>>>
>>> I found that both of these are exception handling errors but I again I
>>> don't know to fix them. It would be great if you can again help on
>>> this.
>>
>>>
>>> Thanks again,
>>>
>>> Best Regards,
>>>
>>> Kirti Prakash
>>>
>>> On Wed, Aug 18, 2010 at 7:33 AM, kirti prakash
>>> <kirtiprakash3.14 at gmail.com> wrote:
>>>> Hi Martin,
>>>>
>>>> Thanks a lot for your help. Bioconductor is a great place as one gets
>>>> the help from the best and its completely free. Its seems all one
>>>> needs to do good research in bioinformatics is a computer, some data
>>>> and Bioconductor.
>>>>
>>>> Its the same tutorial the link you mentioned above . In future I will
>>>> attach the pdfs of the tutorials also.
>>>>
>>>> I did exactly as you told and most the things work fine except...
>>>>
>>>> Later in the tutorial they use deltaConvolve function...
>>>>
>>>> H3K4me1 <- deltaConvolve( cov.H3K4me1.chr10, tss, martReply$strand ==
>>>> -1, left=win[1], right=win[2] )
>>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>>>  range index out of bounds
>>>> Error in as.vector(seqselect(coverage, winCentres[i] + left, winCentres[i] +  :
>>>>  error in evaluating the argument 'x' in selecting a method for
>>>> function 'as.vector'
>>>>
>>>> Also you mention endoapply above... it also gave me same error...
>>>>
>>>> endoapply(cov.H3K4me1, seqselect, 123456-50, 123456+50)
>>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>>>  range index out of bounds
>>>>
>>>>
>>>> I found that both of these are exception handling errors but I again I
>>>> don't know to fix them. It would be great if you can again help on
>>>> this.
>>>>
>>>> Thanks again,
>>>>
>>>> Best Regards,
>>>>
>>>> Kirti Prakash
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 17, 2010 at 7:47 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>>> kirti prakash <kirtiprakash3.14 at gmail.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was reading the material "EMBL course on Short Read analysis with
>>>>>> Bioconductor: An exercise with coverage vectors" by Simon Anders.
>>>>>
>>>>> It took me a second to find
>>>>>
>>>>>  http://bioconductor.org/help/course-materials/2009/EMBLJune09/Practicals/TSS/
>>>>>
>>>>> is that the tutorial you mean?
>>>>>
>>>>>>
>>>>>> I tried this...
>>>>>> aln <- readAligned("dirPath", pattern="sequence.map", type="Bowtie")
>>>>>> cov = coverage(aln)
>>>>>> cov
>>>>>> [[1]]
>>>>>> SimpleRleList of length 25
>>>>>> $chr1
>>>>>> 'integer' Rle of length 247188620 with 840106 runs
>>>>>>  Lengths:    463     36   6823     36 550058 ...    713     36   2034     36
>>>>>>  Values :      0      1      0      1      0 ...      0      1      0      2
>>>>>>
>>>>>> $chr10
>>>>>> 'integer' Rle of length 135373320 with 446681 runs
>>>>>>  Lengths: 88078    36  3880    36 12451 ...    20    50    36 22054    36
>>>>>>  Values :     0     1     0     1     0 ...     1     0     1     0     1
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>>
>>>>>>> cvg <- cov$chr10
>>>>>>> as.vector(subseq(cvg, 123456+50, 123456-50))
>>>>>
>>>>> I think this should be
>>>>>
>>>>>  seqselect(cvg$chr10, 123456+50, 123456-50)
>>>>>
>>>>> or, to subset all chromosomes,
>>>>>
>>>>>  endoapply(cvg, seqselect, 123456+50, 123456-50)
>>>>>
>>>>> (though it doesn't seem like you'd usually want to select consistent
>>>>> coordinates across chromosomes).
>>>>>
>>>>> To get here, I looked up the help page for Rle-class
>>>>>
>>>>>  > class?Rle
>>>>>
>>>>> This is with
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.11.1 Patched (2010-06-03 r52215)
>>>>> x86_64-unknown-linux-gnu
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>>  attached base packages:
>>>>>  [1] stats     graphics  grDevices utils     datasets  methods
>>>>>  base
>>>>>
>>>>>  other attached packages:
>>>>>  [1] ShortRead_1.6.2     Rsamtools_1.0.5     lattice_0.18-8
>>>>>  [4] Biostrings_2.16.9   GenomicRanges_1.0.5 IRanges_1.6.11
>>>>>
>>>>>  loaded via a namespace (and not attached):
>>>>>  [1] Biobase_2.8.0 grid_2.11.1   hwriter_1.2   tools_2.11.1
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>> Error in function (classes, fdef, mtable)  :
>>>>>>  unable to find an inherited method for function "subseq", for signature "list"
>>>>>> Error in as.vector(subseq(cvg, 123456 + 50, 123456 - 50)) :
>>>>>>  error in evaluating the argument 'x' in selecting a method for
>>>>>> function 'as.vector'
>>>>>>
>>>>>> I guess * 'integer' Rle of length* should be 'numeric' Rle of length
>>>>>> as per the booklet ... but I don't know how to fix it.
>>>>>>
>>>>>> I know I am making some stupid mistake. It would be great if anyone
>>>>>> can provide some help on this.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Kirti Prakash
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-sig-sequencing mailing list
>>>>>> Bioc-sig-sequencing at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>
>>>>> --
>>>>> Martin Morgan
>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> Location: Arnold Building M1 B861
>>>>> Phone: (206) 667-2793
>>>>>
>>>>
>>
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>



More information about the Bioc-sig-sequencing mailing list