[BioC] countOverlaps gives an empty SummarizeExperiments

Rafael Viana [guest] guest at bioconductor.org
Tue Apr 1 14:54:38 CEST 2014


Hi expert from the list,

I am a newcomer trying to perform Arabidopsis RNASeq  with DESeq 
I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far

Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
the condition has length > 1 and only the first element will be used

The file is created but empty

>CountOverlaps_output
class: SummarizedExperiment 
dim: 33602 1 
exptData(0):
assays(1): counts
rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410
rowData metadata column names(0):
colnames(1): reads
colData names(2): object records

as I understand exptData is empty and there the counts should appear 


my .bams are  paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as 

>Col_2 <- readGAlignmentPairsFromBam("Col_2.bam") 
>Col_2

GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns:
               seqnames strand   :           ranges  --           ranges
                  <Rle>  <Rle>   :        <IRanges>  --        <IRanges>
       [1]         Chr1      +   :     [3656, 3745]  --     [3667, 3755]
       [2]         Chr1      +   :     [3659, 3748]  --     [3666, 3754]
       ...          ...    ... ...              ... ...              ...
[13746672] mitochondria      +   : [366403, 366492]  -- [366505, 366593]
[13746673] mitochondria      -   : [366648, 366737]  -- [366557, 366645]
---
seqlengths:
         Chr1         Chr2         Chr3 ...  chloroplast mitochondria
     30427671     19698289     23459830 ...       154478       366924

The reference genome  TAIR10 comes form biomart 
Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate

>library(TxDb.Athaliana.BioMart.plantsmart19)
>TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene") 
# Added correct Chr sizes and the same names than the samples
>seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924)  #chr (1, 2, 3, 4, 5, C, M)
>seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria")

> TAIR10_GRL
GRangesList of length 33602:
$AT1G01010 
GRanges with 6 ranges and 2 metadata columns:
      seqnames       ranges strand |   exon_id     exon_name
         <Rle>    <IRanges>  <Rle> | <integer>   <character>
  [1]     Chr1 [3631, 3913]      + |         1 AT1G01010-E.1
  [2]     Chr1 [3996, 4276]      + |         2 AT1G01010-E.2
  [3]     Chr1 [4486, 4605]      + |         3 AT1G01010-E.3
  [4]     Chr1 [4706, 5095]      + |         4 AT1G01010-E.4
  [5]     Chr1 [5174, 5326]      + |         5 AT1G01010-E.5
  [6]     Chr1 [5439, 5899]      + |         6 AT1G01010-E.6
...
<33601 more elements>
---
seqlengths:
         Chr1         Chr2         Chr3         Chr4         Chr5  chloroplast mitochondria
     30427671     19698289     23459830     18585056     26975502       154478       366924


I have tried the following commands to get the summarizedExperiment object:


>col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union",  singleEnd=F)

as a previous Arabidopsis user asked on the forum, and realized,  I also tried to make ignore.strand=T without getting a different output
>col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T,  singleEnd=F)


I have also tried, with the same empty list output
>fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam",
         "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam")
>BFL <- BamFileList(fls, index = character())
>SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F)

When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor
it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated


Kind regards




 -- output of sessionInfo(): 

R version 3.0.3 (2014-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] easyRNASeq_1.8.7       ShortRead_1.20.0       edgeR_3.4.2           
 [4] limma_3.18.9           biomaRt_2.18.0         genomeIntervals_1.18.0
 [7] intervals_0.14.0       multicore_0.1-7        DESeq_1.14.0          
[10] lattice_0.20-27        locfit_1.5-9.1         Biobase_2.22.0        
[13] Rsamtools_1.14.2       Biostrings_2.30.1      GenomicRanges_1.14.4  
[16] XVector_0.2.0          IRanges_1.20.6         BiocGenerics_0.8.0    

loaded via a namespace (and not attached):
 [1] annotate_1.40.1      AnnotationDbi_1.24.0 bitops_1.0-6        
 [4] DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0  
 [7] grid_3.0.3           hwriter_1.3          latticeExtra_0.6-26 
[10] LSD_2.5              RColorBrewer_1.0-5   RCurl_1.95-4.1      
[13] RSQLite_0.11.4       splines_3.0.3        stats4_3.0.3        
[16] survival_2.37-7      XML_3.98-1.1         xtable_1.7-3        
[19] zlibbioc_1.8.0      


--
Sent via the guest posting facility at bioconductor.org.



More information about the Bioconductor mailing list