[BioC] easyRNASeq with Ensembl GRCh37 help
Aki Hoji
akh22 at pitt.edu
Mon Sep 16 20:17:27 CEST 2013
Hi,
I've been trying to generate output for DESeq2 with easyRNASeq. The input is a BAM file generated by TopHat2/Bowtie2 with Ensembl GRCh37.72, which is part of Illumina's iGenomes package. I followed the easyRNASeq overview and the examples posted on the BioC mailing list, and ran the following:
testcount<-easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes="auto", readLength=100L, annotationMethod="gtf", annotationFile="Ensemble.gtf", count="exons", outputFormat="DESeq", filenames="4673Bsorted.bam")
Then I got this error:
Checking arguments...
Fetching annotations...
Read 2280612 records
Error in easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
The number of conditions: 0 did not correspond to the number of samples: 1
In addition: Warning messages:
1: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
2: In .Method(..., deparse.level = deparse.level) :
number of columns of result is not a multiple of vector length (arg 1)
3: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
There are 966272 features/exons defined in your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
4: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.
As far as I can tell, I am not enforcing the UCSC chromosome convention, and chr.sizes can be set to "auto" because a BAM file is used. I am stuck at this point, and any help or pointers would be much appreciated.
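For reference, the naming difference that warnings 1 and 4 seem to refer to can be sketched in base R. Ensembl annotations name chromosomes "1", "X", "MT", whereas the UCSC convention uses "chr1", "chrX", "chrM". The helper `ensembl_to_ucsc` below is a hypothetical name used only for illustration; it is not part of easyRNASeq, and I am assuming the package's "correction" does something along these lines.

```r
# Hypothetical helper (not part of easyRNASeq): map Ensembl-style
# chromosome names ("1", "X", "MT") to UCSC-style names ("chr1", "chrX", "chrM").
ensembl_to_ucsc <- function(chrom) {
  chrom <- as.character(chrom)
  # Leave names that already carry the "chr" prefix untouched.
  has_prefix <- grepl("^chr", chrom)
  out <- ifelse(has_prefix, chrom, paste0("chr", chrom))
  # UCSC names the mitochondrial genome "chrM"; Ensembl uses "MT".
  out[out == "chrMT"] <- "chrM"
  out
}

# Example chromosome names as they would appear in an Ensembl GTF.
ensembl_names <- c("1", "2", "X", "Y", "MT")
ucsc_names <- ensembl_to_ucsc(ensembl_names)
print(ucsc_names)
```

This only illustrates the two conventions; it does not address the error about the number of conditions.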
Thanks.
AH
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] easyRNASeq_1.6.0 ShortRead_1.18.0 latticeExtra_0.6-26
[4] RColorBrewer_1.0-5 Rsamtools_1.12.4 DESeq_1.12.1
[7] lattice_0.20-23 locfit_1.5-9.1 BSgenome_1.28.0
[10] GenomicRanges_1.12.5 Biostrings_2.28.0 IRanges_1.18.3
[13] edgeR_3.2.4 limma_3.16.7 biomaRt_2.16.0
[16] Biobase_2.20.1 genomeIntervals_1.16.0 BiocGenerics_0.6.0
[19] intervals_0.14.0 BiocInstaller_1.10.3
loaded via a namespace (and not attached):
[1] annotate_1.38.0 AnnotationDbi_1.22.6 bitops_1.0-6
[4] DBI_0.2-7 genefilter_1.42.0 geneplotter_1.38.0
[7] grid_3.0.1 hwriter_1.3 RCurl_1.95-4.1
[10] RSQLite_0.11.4 splines_3.0.1 stats4_3.0.1
[13] survival_2.37-4 tools_3.0.1 XML_3.95-0.2
[16] xtable_1.7-1 zlibbioc_1.6.0