[BioC] countOverlaps gives an empty SummarizeExperiments
Martin Morgan
mtmorgan at fhcrc.org
Tue Apr 1 19:16:13 CEST 2014
On 04/01/2014 05:54 AM, Rafael Viana [guest] wrote:
>
> Hi expert from the list,
>
> I am a newcomer trying to perform Arabidopsis RNASeq with DESeq
> I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far
>
> Warning message:
> In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
> the condition has length > 1 and only the first element will be used
>
> The file is created but empty
>
>> CountOverlaps_output
> class: SummarizedExperiment
> dim: 33602 1
> exptData(0):
> assays(1): counts
> rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410
> rowData metadata column names(0):
> colnames(1): reads
> colData names(2): object records
>
> as I understand exptData is empty and there the counts should appear
Hi Rafael --
the counts are actually in the 'assays', e.g., assays(CountOverlaps_output)[[1]]
(short-cut when there's just a single assay -- assay(CountOverlaps_output)).
See the 'summarizeOverlaps' vignette
http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html
or (for the next release, scheduled Apr 14)
http://bioconductor.org/packages/devel/bioc/html/GenomicAlignments.html
and the help page for the class ?"SummarizedExperiment-class".
'exptData' is meant for overall data pertinent to the experiment, e.g., a
publication, the lab & date where the data came from, etc.
Martin
>
>
> my .bams are paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as
>
>> Col_2 <- readGAlignmentPairsFromBam("Col_2.bam")
>> Col_2
>
> GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns:
> seqnames strand : ranges -- ranges
> <Rle> <Rle> : <IRanges> -- <IRanges>
> [1] Chr1 + : [3656, 3745] -- [3667, 3755]
> [2] Chr1 + : [3659, 3748] -- [3666, 3754]
> ... ... ... ... ... ... ...
> [13746672] mitochondria + : [366403, 366492] -- [366505, 366593]
> [13746673] mitochondria - : [366648, 366737] -- [366557, 366645]
> ---
> seqlengths:
> Chr1 Chr2 Chr3 ... chloroplast mitochondria
> 30427671 19698289 23459830 ... 154478 366924
>
> The reference genome TAIR10 comes form biomart
> Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate
>
>> library(TxDb.Athaliana.BioMart.plantsmart19)
>> TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene")
> # Added correct Chr sizes and the same names than the samples
>> seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924) #chr (1, 2, 3, 4, 5, C, M)
>> seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria")
>
>> TAIR10_GRL
> GRangesList of length 33602:
> $AT1G01010
> GRanges with 6 ranges and 2 metadata columns:
> seqnames ranges strand | exon_id exon_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] Chr1 [3631, 3913] + | 1 AT1G01010-E.1
> [2] Chr1 [3996, 4276] + | 2 AT1G01010-E.2
> [3] Chr1 [4486, 4605] + | 3 AT1G01010-E.3
> [4] Chr1 [4706, 5095] + | 4 AT1G01010-E.4
> [5] Chr1 [5174, 5326] + | 5 AT1G01010-E.5
> [6] Chr1 [5439, 5899] + | 6 AT1G01010-E.6
> ...
> <33601 more elements>
> ---
> seqlengths:
> Chr1 Chr2 Chr3 Chr4 Chr5 chloroplast mitochondria
> 30427671 19698289 23459830 18585056 26975502 154478 366924
>
>
> I have tried the following commands to get the summarizedExperiment object:
>
>
>> col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", singleEnd=F)
>
> as a previous Arabidopsis user asked on the forum, and realized, I also tried to make ignore.strand=T without getting a different output
>> col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T, singleEnd=F)
>
>
> I have also tried, with the same empty list output
>> fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam",
> "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam")
>> BFL <- BamFileList(fls, index = character())
>> SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F)
>
> When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor
> it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated
>
>
> Kind regards
>
>
>
>
> -- output of sessionInfo():
>
> R version 3.0.3 (2014-03-06)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] easyRNASeq_1.8.7 ShortRead_1.20.0 edgeR_3.4.2
> [4] limma_3.18.9 biomaRt_2.18.0 genomeIntervals_1.18.0
> [7] intervals_0.14.0 multicore_0.1-7 DESeq_1.14.0
> [10] lattice_0.20-27 locfit_1.5-9.1 Biobase_2.22.0
> [13] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4
> [16] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.40.1 AnnotationDbi_1.24.0 bitops_1.0-6
> [4] DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0
> [7] grid_3.0.3 hwriter_1.3 latticeExtra_0.6-26
> [10] LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1
> [13] RSQLite_0.11.4 splines_3.0.3 stats4_3.0.3
> [16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-3
> [19] zlibbioc_1.8.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list