[BioC] countOverlaps gives an empty SummarizeExperiments

Tue Apr 1 19:16:13 CEST 2014

On 04/01/2014 05:54 AM, Rafael Viana [guest] wrote:
>
> Hi expert from the list,
>
> I am a newcomer trying to perform Arabidopsis RNASeq  with DESeq
> I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far
>
> Warning message:
> In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
> the condition has length > 1 and only the first element will be used
>
> The file is created but empty
>
>> CountOverlaps_output
> class: SummarizedExperiment
> dim: 33602 1
> exptData(0):
> assays(1): counts
> rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410
> rowData metadata column names(0):
> colnames(1): reads
> colData names(2): object records
>
> as I understand exptData is empty and there the counts should appear

Hi Rafael --

the counts are actually in the 'assays', e.g., assays(CountOverlaps_output)[[1]] 
(short-cut when there's just a single assay -- assay(CountOverlaps_output)).

See the 'summarizeOverlaps' vignette

http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html

or (for the next release, scheduled Apr 14)

http://bioconductor.org/packages/devel/bioc/html/GenomicAlignments.html

and the help page for the class ?"SummarizedExperiment-class".

'exptData' is meant for overall data pertinent to the experiment, e.g., a 
publication, the lab & date where the data came from, etc.

Martin

>
>
> my .bams are  paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as
>
>> Col_2 <- readGAlignmentPairsFromBam("Col_2.bam")
>> Col_2
>
> GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns:
>                 seqnames strand   :           ranges  --           ranges
>                    <Rle>  <Rle>   :        <IRanges>  --        <IRanges>
>         [1]         Chr1      +   :     [3656, 3745]  --     [3667, 3755]
>         [2]         Chr1      +   :     [3659, 3748]  --     [3666, 3754]
>         ...          ...    ... ...              ... ...              ...
> [13746672] mitochondria      +   : [366403, 366492]  -- [366505, 366593]
> [13746673] mitochondria      -   : [366648, 366737]  -- [366557, 366645]
> ---
> seqlengths:
>           Chr1         Chr2         Chr3 ...  chloroplast mitochondria
>       30427671     19698289     23459830 ...       154478       366924
>
> The reference genome  TAIR10 comes form biomart
> Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate
>
>> library(TxDb.Athaliana.BioMart.plantsmart19)
>> TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene")
> # Added correct Chr sizes and the same names than the samples
>> seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924)  #chr (1, 2, 3, 4, 5, C, M)
>> seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria")
>
>> TAIR10_GRL
> GRangesList of length 33602:
> $AT1G01010
> GRanges with 6 ranges and 2 metadata columns:
>        seqnames       ranges strand |   exon_id     exon_name
>           <Rle>    <IRanges>  <Rle> | <integer>   <character>
>    [1]     Chr1 [3631, 3913]      + |         1 AT1G01010-E.1
>    [2]     Chr1 [3996, 4276]      + |         2 AT1G01010-E.2
>    [3]     Chr1 [4486, 4605]      + |         3 AT1G01010-E.3
>    [4]     Chr1 [4706, 5095]      + |         4 AT1G01010-E.4
>    [5]     Chr1 [5174, 5326]      + |         5 AT1G01010-E.5
>    [6]     Chr1 [5439, 5899]      + |         6 AT1G01010-E.6
> ...
> <33601 more elements>
> ---
> seqlengths:
>           Chr1         Chr2         Chr3         Chr4         Chr5  chloroplast mitochondria
>       30427671     19698289     23459830     18585056     26975502       154478       366924
>
>
> I have tried the following commands to get the summarizedExperiment object:
>
>
>> col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union",  singleEnd=F)
>
> as a previous Arabidopsis user asked on the forum, and realized,  I also tried to make ignore.strand=T without getting a different output
>> col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T,  singleEnd=F)
>
>
> I have also tried, with the same empty list output
>> fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam",
>           "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam")
>> BFL <- BamFileList(fls, index = character())
>> SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F)
>
> When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor
> it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated
>
>
> Kind regards
>
>
>
>
>   -- output of sessionInfo():
>
> R version 3.0.3 (2014-03-06)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] easyRNASeq_1.8.7       ShortRead_1.20.0       edgeR_3.4.2
>   [4] limma_3.18.9           biomaRt_2.18.0         genomeIntervals_1.18.0
>   [7] intervals_0.14.0       multicore_0.1-7        DESeq_1.14.0
> [10] lattice_0.20-27        locfit_1.5-9.1         Biobase_2.22.0
> [13] Rsamtools_1.14.2       Biostrings_2.30.1      GenomicRanges_1.14.4
> [16] XVector_0.2.0          IRanges_1.20.6         BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.40.1      AnnotationDbi_1.24.0 bitops_1.0-6
>   [4] DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0
>   [7] grid_3.0.3           hwriter_1.3          latticeExtra_0.6-26
> [10] LSD_2.5              RColorBrewer_1.0-5   RCurl_1.95-4.1
> [13] RSQLite_0.11.4       splines_3.0.3        stats4_3.0.3
> [16] survival_2.37-7      XML_3.98-1.1         xtable_1.7-3
> [19] zlibbioc_1.8.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793