[Bioc-sig-seq] read length distribution from ShortReadQ or AlignedRead

Martin Morgan mtmorgan at fhcrc.org
Sun Aug 22 23:34:52 CEST 2010


On 08/22/2010 02:01 PM, joseph wrote:
> Thanks Martin. Very useful.
> can you plot multiple lanes with different colors in one graph?

Hi Joseph --

densityplot (and many other parts of lattice) wants a data.frame where
the data being plotted are in some columns, and indicator variables
about the groups to which the data belong are in another. It has the
convenience function make.groups for coercing individual vectors into
the desired format. So

## make.groups returns a data frame with columns 'data' and 'which'
df = make.groups(w1=width(reads1), w2=width(reads2))
densityplot(~data, group=which, data=df, plot.points=FALSE)

sometimes it'll be convenient to df = do.call(make.groups, lst), where
lst is a list of 'width' vectors, e.g., from the result of lapply.

The lattice equivelent of hist() is barchart, as in
barchart(table(width(reads))). I think you would have success with

  df1 = as.data.frame(table(width(reads1)))
  df2 = as.data.frame(table(width(reads2)))
  df = merge(df1, df2, by="Var1", all=TRUE)
  barchart(Freq.x + Freq.y ~ Var1, df)

or more elaborate ways of constructing 'df' so that one column is the
'width' and other columns are the corresponding counts in each read.

Martin


> 
> 
> 
> 
> ________________________________
> From: Martin Morgan <mtmorgan at fhcrc.org>
> To: joseph <jdsandjd at yahoo.com>
> Cc: Steve Lianoglou <mailinglist.honeypot at gmail.com>; 
> bioc-sig-sequencing at r-project.org
> Sent: Sun, August 22, 2010 12:55:08 PM
> Subject: Re: [Bioc-sig-seq] read length distribution from ShortReadQ or 
> AlignedRead
> 
> On 08/22/2010 11:32 AM, joseph wrote:
>> Perfect. Thanks.
> 
> a little tangential, but plot(table(width(reads))) avoids the arbitrary
> histogram bins when the number of unique values of width(reads) is
> small, and library(lattice); densityplot(width(reads),
> plot.points=FALSE) avoids the arbitrary choice of bin widths when the
> number of unique values of width(reads) is large.
> 
> Martin
> 
>>
>>
>>
>>
>> ________________________________
>> From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
>>
>> Cc: bioc-sig-sequencing at r-project.org
>> Sent: Sun, August 22, 2010 10:59:37 AM
>> Subject: Re: [Bioc-sig-seq] read length distribution from ShortReadQ or 
>> AlignedRead
>>
>> Sorry .. didn't CC to bioc (like I told you to do :-) ...
>>
>> On Sun, Aug 22, 2010 at 1:58 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>> Hi Joseph,
>>>
>>> Don't forget to hit "Reply All" when responding to BioC emails, so
>>> that the help stays on the list ..
>>>
>>
>>>> Thanks Steve.
>>>> how do I get a table of the actual values?
>>>
>>> The `width` function returns a vector of the same length as your
>>> ShortRead object (which is the number of reads you have there), so
>>> assuming your ShortRead object is still called `reads`:
>>>
>>> R> read.length <- width(reads)
>>>
>>> That `read.length` vector has the length of all the reads. You can
>>> then manipulate it as you would any other vector in R. It sounds like
>>> you might want the `table` or `tabulate` function, or something
>>> similar.
>>>
>>> Also note that the `hist` function returns an "invisible" object you
>>> can work with. If you do something like this:
>>>
>>> R> h <- hist(width(reads))
>>>
>>> `h` might look something like this:
>>>
>>> R> h
>>> $breaks
>>> [1] 30 40
>>>
>>> $counts
>>> [1] 256
>>>
>>> $intensities
>>> [1] 0.1
>>>
>>> $density
>>> [1] 0.1
>>>
>>> $mids
>>> [1] 35
>>>
>>> $xname
>>> [1] "width(rfq)"
>>>
>>> $equidist
>>> [1] TRUE
>>>
>>> attr(,"class")
>>> [1] "histogram"
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  | Memorial Sloan-Kettering Cancer Center
>>>  | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>
>>
>>
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list