[BioC] Rsubread : featureCounts PEReadsReordering=TRUE deletes unpaired reads?

Mon Nov 11 01:59:32 CET 2013

Dear Ryan,

featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them. 

Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it.

In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done.

Also note that this problem only occurs for paired-end data, but not for single-end read data.

Best regards,
Wei

On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote:

> Hello,
> 
> I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output:
> 
> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ...
>   Resort the input file ...
>   2718714 unpaired reads were ignored in reordering.
>   Total number of fragments is : 1653582
>   Number of successfully assigned fragments is : 1529491 (92.5%)
>   Running time : 3.39 minutes
> 
> In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them?
> 
> -Ryan Thompson
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}