[Bioc-sig-seq] Plotting Coverage for inhouse genome

Martin Morgan mtmorgan at fhcrc.org
Mon Dec 7 04:14:29 CET 2009


Hi Abhi --

Pratap, Abhishek wrote:
> Hi Patrick
> 
> Trying few more things I was atleast able to deal with selecting only
> one Chr with filtered reads.
> 
> ned_chr <- chromosome(aln) == "K-10_OM_revised.fasta"
> filtered <- alignData(aln)[["filtering"]] == "Y"
> aln_chr_1_filt <- aln[ned_chr & filtered]
> 
> levels(chromosome(aln_chr_1_filt) )
> [1] "K-10_OM_revised.fasta"
> 
> 
> But the problem with coverage functions still remains the same.
> 
> 
> cov_rLe <- coverage(x=aln_chr_1_filt, start=1, end=4832581, shift=0L,
> width=NULL,weight=1L,coords="leftmost",extend=0L);
> 
> Error: UserArgumentMismatch
>   'names(shift)' (or 'names(start)') mismatch with
> 'levels(chromosome(x))'
>   see ?"AlignedRead-class"
> In addition: Warning message:
> UserArgumentMismatch
>   use 'shift'/'width'  instead of 'start'/'end'

I wasn't sure whether this was ever resolved for you.

The first thing is to choose between EITHER shift/width OR start/end,
not both. Better to use shift/width, as it is the way this will be done
in the future.

Second, shift/width needs to be a named vector, with names matching
those in 'aln'.

This is illustrated in the example on ?"coverage,AlignedRead-method",
where we see

  cvg <- coverage(aln, shift=c(ChrA=10L))

so for your example above you might try

  coverage(aln_chr_1_filt, width=c("K-10_OM_revised.fasta"=4832581L))

The 'L' is a way of making sure 4832581 is an integer, instead of a real
number. No need to specify shift, weight, coords or extend, as these are
all the default values. The quotation marks around K-10_OM_revised.fasta
are required because "K-10_OM_revised.fasta" without quotes would not be
a valid name (the - would be parsed as a minus sign, and R would
complain about a syntax error).

Hope that helps.

Martin



> 
> 
> -Abhi
> 
> 
> 
> 
> 
> -----Original Message-----
> From: bioc-sig-sequencing-bounces at r-project.org
> [mailto:bioc-sig-sequencing-bounces at r-project.org] On Behalf Of Pratap,
> Abhishek
> Sent: Wednesday, December 02, 2009 7:53 PM
> To: Patrick Aboyoun
> Cc: bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] Plotting Coverage for inhouse genome
> 
> Hi Patrick
> 
> Thanks for letting me know how to search doc efficiently. Picking up on
> the coverage method, I tried the following. Also I would like to use
> only one chr, in this case (K-10_OM_revised.fasta *). How can I change
> the aln object to just one one specific chromosome. I tired few of my
> beginner tricks (aln$ K-10_OM_revised.fasta,
> aln[[K-10_OM_revised.fasta]] but dint work. I guess I am not using right
> method on aln object.
> 
> cov_rLe <- coverage(x=aln, start=1, end=4832581, shift=0L,
> width=NULL,weight=1L,coords="leftmost",extend=0L);
> 
> Error: UserArgumentMismatch
>   'names(shift)' (or 'names(start)') mismatch with
> 'levels(chromosome(x))'
>   see ?"AlignedRead-class"
> In addition: Warning message:
> UserArgumentMismatch
>   use 'shift'/'width'  instead of 'start'/'end'
> 
> 
> Here 4832581 is the length of the genome.
> 
> 
> *
> Levels(chromosome(aln))  ======== Just to clear the confusion about the
> chromosome name. 
> 175] "7:0:1"                 "7:1:0"                 "7:7:1"
> 
> [178] "8:0:0"                 "K-10_OM_revised.fasta" "NM"
> 
> [181] "QC"                   
> 
> Cheers!
> -Abhi
> 
> 
> 
> -----Original Message-----
> From: Patrick Aboyoun [mailto:paboyoun at fhcrc.org] 
> Sent: Wednesday, December 02, 2009 6:19 PM
> To: Pratap, Abhishek
> Cc: Michael Lawrence; bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] Plotting Coverage for inhouse genome
> 
> Abhi,
> To answer your first question, there is a coverage method for 
> AlignedRead objects in the ShortRead package. S4 methods can be hard to 
> find in R's help pages, particularly if the methods for a generic are 
> stored across multiple packages. When I'm unsure what is around, I use 
> the showMethods function to get a list of available methods for an S4 
> generic:
> 
>  > suppressMessages(library(ShortRead))
>  > showMethods("coverage")
> Function: coverage (package IRanges)
> x="AlignedRead"
> x="AlignedXStringSet0"
> x="IRanges"
> x="MaskCollection"
> x="MaskedXString"
> x="MIndex"
> x="numeric"
> x="PairwiseAlignedFixedSubject"
> x="PairwiseAlignedFixedSubjectSummary"
> x="RangedData"
> x="RangesList"
> x="Views"
> 
>  From there I use a help command of the form 
> help("<<generic>>,<<signature>>-method")
> 
>  > help("coverage,AlignedRead-method")
> 
> In that man page you will find information on how to use coverage for an
> 
> AlignedRead object.
> 
> Once you have the coverage vectors (in the form of an RleList), you can 
> use the approach Michael outlined for plotting.
> 
> 
> Patrick
> 
> 
> 
> Michael Lawrence wrote:
>> On Wed, Dec 2, 2009 at 2:31 PM, Pratap, Abhishek
>> <APratap at som.umaryland.edu>wrote:
>>
>>   
>>> Hi All
>>>
>>> After taking an exciting BioC course at FredHutch recently, I am
> trying
>>> to dive into BioC world to make more sense of NGS data. I would also
>>> like to thank the entire BioC team present there. You guys were
> simply
>>> great. I appreciate you all patiently answering tons of questions and
>>> taught some of the cool things we can do with BioC packages.
>>>
>>> For my current problem I would like to generate coverage plots for a
>>> small bacterial genome. I have the reads imported as a AlignedRead
>>> object and I am not sure how to proceed next. The help for the
>>> "coverage" method didn't get me very far. I am looking for some
> pointers
>>> to help me resolve the following questions.
>>>
>>> 1. How to create a RLE list from the AlignRead object using coverage
>>> method ?
>>> 2. Do I need to pack my reference genome as BSgenome data package in
>>> order to do this ? If yes I just have the sequence and no group II
> mask
>>> file.
>>> 3. Finally how to plot a RLE list.
>>>
>>>
>>>     
>> This is a pretty difficult question to answer. Do you want to see all
>> chromosomes at once (like a karyotype plot)? An overview of entire
>> chromosomes? Or do you want to explore the data in an interactive
> genome
>> browser?
>>
>> There are many packages in Bioconductor for plotting this sort of
> data,
>> including GenomeGraphs and HilbertVis. For genome browsers, there's
>> rtracklayer and others.
>>
>> For example, with rtracklayer:
>> session <- browserSession()
>> session$coverage <- as(coverage, "RangedData")
>>
>>
>>
>>   
>>> Thanks!
>>> -Abhi
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>     
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>   
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list