[Bioc-sig-seq] unique reads count

Martin Morgan mtmorgan at fhcrc.org
Thu Jan 28 02:29:00 CET 2010


On 01/27/2010 02:57 PM, joseph wrote:
> If I understand this correctly, !duplicated does not count unique reads:
> !duplicated(c("A", "B", "B", "C"))
> [1]  TRUE  TRUE FALSE  TRUE
> 
> in your example, there are only two uniques (not 3 as counted by !duplicated).

Well, it depends on what your definition of 'duplicate' is, but yes, I
think we agree.

> Is it correct to say that the number of unique reads is given by nReads when nOccurrences=1, using the tables function?

Again, 'unique' needs a definition (see unique(c("A", "B", "B", "C")), for
instance!) but yes, nReads in the row where nOccurrences is 1 is the
number of reads that occur exactly once.
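
To make the competing definitions concrete, here is a minimal base R sketch
on the toy vector from this thread. It does not use ShortRead; the
dist_tab data frame is only an illustrative stand-in for what
tables()$distribution reports, and all variable names are made up for
this example.

```r
# Toy vector from this thread; base R only, no ShortRead needed.
x <- c("A", "B", "B", "C")

# Distinct values: first occurrences only, so "B" is counted once.
n_distinct <- sum(!duplicated(x))   # same as length(unique(x))

# Values that occur exactly once: "A" and "C".
counts <- table(x)
n_singleton <- sum(counts == 1)

# Redundant copies: every occurrence after the first.
n_dup <- sum(duplicated(x))

# Stand-in for tables()$distribution (an assumption, not the ShortRead code):
# how many distinct values (nReads) occur nOccurrences times.
dist_tab <- as.data.frame(table(as.vector(counts)), stringsAsFactors = FALSE)
names(dist_tab) <- c("nOccurrences", "nReads")
dist_tab$nOccurrences <- as.integer(dist_tab$nOccurrences)

# All three counts can be recovered from the distribution alone.
stopifnot(
  n_distinct == sum(dist_tab$nReads),
  n_singleton == dist_tab$nReads[dist_tab$nOccurrences == 1],
  n_dup == sum((dist_tab$nOccurrences - 1) * dist_tab$nReads)
)
```

The same identity applied to the example(srduplicated) distribution quoted
below gives (1-1)*239 + (2-1)*7 + (3-1)*1 = 9, which matches
sum(srduplicated(sread(rfq))). So sum(!srduplicated(...)) counts distinct
sequences, while the nOccurrences=1 row counts only the sequences seen
exactly once, and the two differ whenever any sequence is repeated.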

Martin

> 
> 
> ________________________________
> From: Martin Morgan <mtmorgan at fhcrc.org>
> To: joseph <jdsandjd at yahoo.com>
> Cc: bioc-sig-sequencing at r-project.org
> Sent: Wed, January 27, 2010 1:06:55 PM
> Subject: Re: [Bioc-sig-seq] unique reads count
> 
> Hi Joseph --
> 
> On 01/27/2010 12:33 PM, joseph wrote:
>> Hello
>> I have a ShortReadQ object: 
>>> rfq
>> class: ShortReadQ
>> length: 16115723 reads; width: 34 cycles
>>
>> I used the negation of the result from srduplicated to count the unique reads:
>>> sum(!srduplicated(sread(rfq)))
>> [1] 4545719
>>
>> But also I looked at the frequency with which each read occurs using the tables function:
>>> head(tables(rfq_s_3_mel)$distribution)
>>   nOccurrences  nReads
>> 1            1 4022038
>> 2            2  255649
> 
> srduplicated behaves like 'duplicated': it returns TRUE when an element
> has already been seen
> 
>> duplicated(c("A", "B", "B", "C"))
> [1] FALSE FALSE  TRUE FALSE
> 
> There's one duplicate, the second 'B'.
> 
> After example(srduplicated) I have
> 
>> tables(sread(rfq))$distribution
>   nOccurrences nReads
> 1            1    239
> 2            2      7
> 3            3      1
>> sum(srduplicated(sread(rfq)))
> [1] 9
> 
> there are 7 reads that are the second of two reads, and 2 reads that are
> the second and third of three reads.
> 
> Martin
> 
>>
>> I expected that for nOccurrences=1, the nReads should be the same as what I got with !srduplicated.
>>
>> Can anybody explain why I got different counts?
>> Thank you
>> Joseph Dhahbi
>>
>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


