[Bioc-sig-seq] coverage vectors and subseq

kirti prakash kirtiprakash3.14 at gmail.com
Wed Aug 18 16:05:50 CEST 2010


Hi Martin,

Thanks again. I really like your way of explaining... even new comers
like me are able to learn how data compression works. Its really a
nice approach to break the problem in parts in order to understand it.

I have some human histone modification data and I am trying to plot
the tag densities.

This is the complete code(from the above link) that I am trying to implement...

aln.H3K4me1 <- readAligned("dirpath", type="Bowtie")
cov.H3K4me1 = coverage(aln.H3K4me1)

deltaConvolve <- function( coverage, winCentres, revStrand, left, right)
{
    result <- rep( 0, right - left + 1 )
    for( i in seq(along=winCentres) ) {
       v <- as.vector(seqselect( coverage, winCentres[i] - left,
                                                 winCentres[i] + right ))
       if( revStrand[i] )
          v <- rev( v )
       result <- result + v
    }
    result
 }

library("biomaRt")
mart <- useMart( "ensembl", dataset = "hsapiens_gene_ensembl" )
martReply <- getBM( attributes=c( "transcript_start",
"transcript_end", "strand" ), filters="chromosome_name", values="10",
mart=mart )

tss <- ifelse( martReply$strand == 1, martReply$transcript_start,
martReply$transcript_end )

win = c(-50, 50)
H3K4me3 <- deltaConvolve( cov.H3K4me1$chr10, tss, martReply$strand ==
-1, left=win[1], right=win[2] )

The code runs for a while but after couple of minutes it gives the
above errors. so possibly some but not all values are out of bound.
As clear from above I get TSS from ensembl. There are some 6000 TSS
for chr10 and window size I have reduced to 101 so that it doesn't
gets out of bound but still I get this error.

Do I need to have some try-catch block?

Thank you,

Best regards,

Kirti Prakash


On Wed, Aug 18, 2010 at 4:01 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 08/17/2010 09:41 PM, kirti prakash wrote:
>> Hi Martin,
>>
>> Thanks a lot for your help. Bioconductor is a great place as one gets
>> the help from the best and its completely free. Its seems all one
>> needs to do good research in bioinformatics is a computer, some data
>> and Bioconductor.
>>
>> Its the same tutorial... the link you mentioned above . I cannot
>> attach the tutorials due to the limit of mail size.
>
> I meant that if the document is available in a public place, then
> provide a  URL to it so that those trying to help can better understand
> what your problem is.
>
>>
>> I did exactly as you told and most the things work fine except...
>>
>> Later in the tutorial they use deltaConvolve function...
>>
>> H3K4me1 <- deltaConvolve( cov.H3K4me1.chr10, tss, martReply$strand ==
>> -1, left=win[1], right=win[2] )
>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>  range index out of bounds
>> Error in as.vector(seqselect(coverage, winCentres[i] + left, winCentres[i] +  :
>>  error in evaluating the argument 'x' in selecting a method for
>> function 'as.vector'
>>
>> Also you mention endoapply above... it also gave me same error...
>>
>> endoapply(cov.H3K4me1, seqselect, 123456-50, 123456+50)
>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>  range index out of bounds
>
> One approach to solving problems is to try to simplify them. So let's
> try to reproduce the error but with a simpler example. You're selecting
> a subset of an Rle object
>
>>   seqselect(Rle(1:10, 10:1), 10, 20)
> 'integer' Rle of length 11 with 3 runs
>  Lengths: 1 9 1
>  Values : 1 2 3
>
> The error says something about 'index out of bounds', which suggests
>
>>   seqselect(Rle(1:10, 10:1), 10, 2000)
> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>  range index out of bounds
>>   seqselect(Rle(1:10, 10:1), -10, 20)
> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>  range index out of bounds
>
> Any hints to what your original problem is, and how to solve it?
>
> Martin
>
>>
>>
>> I found that both of these are exception handling errors but I again I
>> don't know to fix them. It would be great if you can again help on
>> this.
>
>>
>> Thanks again,
>>
>> Best Regards,
>>
>> Kirti Prakash
>>
>> On Wed, Aug 18, 2010 at 7:33 AM, kirti prakash
>> <kirtiprakash3.14 at gmail.com> wrote:
>>> Hi Martin,
>>>
>>> Thanks a lot for your help. Bioconductor is a great place as one gets
>>> the help from the best and its completely free. Its seems all one
>>> needs to do good research in bioinformatics is a computer, some data
>>> and Bioconductor.
>>>
>>> Its the same tutorial the link you mentioned above . In future I will
>>> attach the pdfs of the tutorials also.
>>>
>>> I did exactly as you told and most the things work fine except...
>>>
>>> Later in the tutorial they use deltaConvolve function...
>>>
>>> H3K4me1 <- deltaConvolve( cov.H3K4me1.chr10, tss, martReply$strand ==
>>> -1, left=win[1], right=win[2] )
>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>>  range index out of bounds
>>> Error in as.vector(seqselect(coverage, winCentres[i] + left, winCentres[i] +  :
>>>  error in evaluating the argument 'x' in selecting a method for
>>> function 'as.vector'
>>>
>>> Also you mention endoapply above... it also gave me same error...
>>>
>>> endoapply(cov.H3K4me1, seqselect, 123456-50, 123456+50)
>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>>  range index out of bounds
>>>
>>>
>>> I found that both of these are exception handling errors but I again I
>>> don't know to fix them. It would be great if you can again help on
>>> this.
>>>
>>> Thanks again,
>>>
>>> Best Regards,
>>>
>>> Kirti Prakash
>>>
>>>
>>>
>>>
>>> On Tue, Aug 17, 2010 at 7:47 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>> kirti prakash <kirtiprakash3.14 at gmail.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was reading the material "EMBL course on Short Read analysis with
>>>>> Bioconductor: An exercise with coverage vectors" by Simon Anders.
>>>>
>>>> It took me a second to find
>>>>
>>>>  http://bioconductor.org/help/course-materials/2009/EMBLJune09/Practicals/TSS/
>>>>
>>>> is that the tutorial you mean?
>>>>
>>>>>
>>>>> I tried this...
>>>>> aln <- readAligned("dirPath", pattern="sequence.map", type="Bowtie")
>>>>> cov = coverage(aln)
>>>>> cov
>>>>> [[1]]
>>>>> SimpleRleList of length 25
>>>>> $chr1
>>>>> 'integer' Rle of length 247188620 with 840106 runs
>>>>>  Lengths:    463     36   6823     36 550058 ...    713     36   2034     36
>>>>>  Values :      0      1      0      1      0 ...      0      1      0      2
>>>>>
>>>>> $chr10
>>>>> 'integer' Rle of length 135373320 with 446681 runs
>>>>>  Lengths: 88078    36  3880    36 12451 ...    20    50    36 22054    36
>>>>>  Values :     0     1     0     1     0 ...     1     0     1     0     1
>>>>> .
>>>>> .
>>>>> .
>>>>>
>>>>>> cvg <- cov$chr10
>>>>>> as.vector(subseq(cvg, 123456+50, 123456-50))
>>>>
>>>> I think this should be
>>>>
>>>>  seqselect(cvg$chr10, 123456+50, 123456-50)
>>>>
>>>> or, to subset all chromosomes,
>>>>
>>>>  endoapply(cvg, seqselect, 123456+50, 123456-50)
>>>>
>>>> (though it doesn't seem like you'd usually want to select consistent
>>>> coordinates across chromosomes).
>>>>
>>>> To get here, I looked up the help page for Rle-class
>>>>
>>>>  > class?Rle
>>>>
>>>> This is with
>>>>
>>>>> sessionInfo()
>>>> R version 2.11.1 Patched (2010-06-03 r52215)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>>  attached base packages:
>>>>  [1] stats     graphics  grDevices utils     datasets  methods
>>>>  base
>>>>
>>>>  other attached packages:
>>>>  [1] ShortRead_1.6.2     Rsamtools_1.0.5     lattice_0.18-8
>>>>  [4] Biostrings_2.16.9   GenomicRanges_1.0.5 IRanges_1.6.11
>>>>
>>>>  loaded via a namespace (and not attached):
>>>>  [1] Biobase_2.8.0 grid_2.11.1   hwriter_1.2   tools_2.11.1
>>>>
>>>> Martin
>>>>
>>>>
>>>>> Error in function (classes, fdef, mtable)  :
>>>>>  unable to find an inherited method for function "subseq", for signature "list"
>>>>> Error in as.vector(subseq(cvg, 123456 + 50, 123456 - 50)) :
>>>>>  error in evaluating the argument 'x' in selecting a method for
>>>>> function 'as.vector'
>>>>>
>>>>> I guess * 'integer' Rle of length* should be 'numeric' Rle of length
>>>>> as per the booklet ... but I don't know how to fix it.
>>>>>
>>>>> I know I am making some stupid mistake. It would be great if anyone
>>>>> can provide some help on this.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Kirti Prakash
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-sig-sequencing mailing list
>>>>> Bioc-sig-sequencing at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
>>>> --
>>>> Martin Morgan
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>>>>
>>>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>



More information about the Bioc-sig-sequencing mailing list