[BioC] EasyRNAseq - error in building count table - Error in unlist

Thu Sep 12 17:38:55 CEST 2013

Hej Felix!

This is strange. Would you be able to share the data offline with me so that I can try to reproduce the issue? If it's possible I would create a drop box folder.

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On Sep 12, 2013, at 4:36 PM, Felix Frey [guest] wrote:

> 
> Hello,
> I have sequencing data of maize which I aligned to the reference sequence (The maize sequence ZmB73_RefGen_v2.tar.gz, downloaded from:
> # http://ftp.maizesequence.org/current/assembly/) using tophat (v2.0.3), bowtie2 (2.0.0.6) and samtools (0.1.18.0).
> I want to make a count table with easyRNAseq (1.4.2) to use for EdgeR.
> Some months ago, I did following analysis with easyRNASeq:
> 
> chr.map <- data.frame(  from=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10"),
>  to=c("1","2","3","4","5","6","7","8","9","10"))
> ensembl <- useMart("ENSEMBL_MART_PLANT",dataset="zmays_eg_gene")
> exon.annotation <- getBM(c("ensembl_gene_id",
>                            "ensembl_transcript_id",
>                            "chromosome_name",
>                            "ensembl_gene_id",
>                            "ensembl_exon_id",
>                            "exon_chrom_start",
>                            "exon_chrom_end"),
>                            mart=ensembl,
>                            filters="chromosome_name",
>                            values=chrm)
> exon.annotation$chromosome <- paste("chr",exon.annotation$chromosome_name,sep="")
> exon.range <- RangedData(IRanges(
>    start=exon.annotation$exon_chrom_start,
>    end=exon.annotation$exon_chrom_end),
>     space=exon.annotation$chromosome,
>     strand=exon.annotation$strand,
>     transcript=exon.annotation$ensembl_transcript_id,
>     gene=exon.annotation$ensembl_gene_id,
>     exon=exon.annotation$ensembl_exon_id,
>     universe = "NULL")
> 
> files <- c(	"accepted_hits_12a_sorted.bam",
> 		"accepted_hits_13a_sorted.bam",...)
> 
> rnaSeq <- easyRNASeq(filesDirectory="/EASYRNASEQ/countdata",
>                          gapped=F,
>                          validity.check=TRUE,
>                          chr.map=chr.map,
>                          organism="custom",
>                          annotationMethod="env",
>                          annotationObject=exon.range,
>                          count="genes",
> 			  filenames=files,
> 			  summarization="geneModels",
> 			  outputFormat="RNAseq")
> 
> I got an output gene count table.
> Now, I wanted to repeat the analysis, but leaving out the non-protein-coding genes, which I wanted to achieve by introducing the "biotype" and filtering by "protein-coding" in the exon.annotation object.
> To check the analysis i performed it again like some months before and I got an error message, which I do not understand completely:
> 
> 
> OUTPUT:
> 
>> head( exon.annotation)
>  ensembl_gene_id strand ensembl_transcript_id chromosome_name
> 1   GRMZM2G439951      1     GRMZM2G439951_T01               5
> 2   GRMZM2G094632      1     GRMZM2G094632_T02               5
> 3   GRMZM2G094632      1     GRMZM2G094632_T01               5
> 4   GRMZM2G094632      1     GRMZM2G094632_T01               5
> 5   GRMZM2G095090      1     GRMZM2G095090_T01               5
> 6   GRMZM2G095094      1     GRMZM2G095094_T01               5
>  ensembl_gene_id.1   ensembl_exon_id exon_chrom_start exon_chrom_end
> 1     GRMZM2G439951 GRMZM2G439951_E01         39542856       39544325
> 2     GRMZM2G094632 GRMZM2G094632_E01         39435878       39436448
> 3     GRMZM2G094632 GRMZM2G094632_E03         39435878       39436317
> 4     GRMZM2G094632 GRMZM2G094632_E02         39436730       39437134
> 5     GRMZM2G095090 GRMZM2G095090_E01         39427195       39427706
> 6     GRMZM2G095094 GRMZM2G095094_E01         39424253       39427105
>    gene_biotype chromosome
> 1 protein_coding       chr5
> 2 protein_coding       chr5
> 3 protein_coding       chr5
> 4 protein_coding       chr5
> 5 protein_coding       chr5
> 6 protein_coding       chr5
> 
>> head( exon.range)
> RangedData with 6 rows and 5 value columns across 10 spaces
>     space                 ranges |    strand        transcript
>  <factor>              <IRanges> | <integer>       <character>
> 1     chr1 [301409811, 301409969] |        -1 AC185612.3_FGT006
> 2     chr1 [301362751, 301363304] |        -1 GRMZM5G884466_T03
> 3     chr1 [301353831, 301354022] |        -1 GRMZM5G884466_T03
> 4     chr1 [301353664, 301353745] |        -1 GRMZM5G884466_T03
> 5     chr1 [301353366, 301353557] |        -1 GRMZM5G884466_T03
> 6     chr1 [301353147, 301353260] |        -1 GRMZM5G884466_T03
>              gene                   exon        biotype
>       <character>            <character>    <character>
> 1 AC185612.3_FG006 AC185612.3_FG006.exon1 protein_coding
> 2    GRMZM5G884466      GRMZM5G884466_E12 protein_coding
> 3    GRMZM5G884466      GRMZM5G884466_E03 protein_coding
> 4    GRMZM5G884466      GRMZM5G884466_E04 protein_coding
> 5    GRMZM5G884466      GRMZM5G884466_E08 protein_coding
> 6    GRMZM5G884466      GRMZM5G884466_E10 protein_coding
>> 
> 
>> rnaSeq_new <- easyRNASeq(filesDirectory="/projects/irg/grp_stich/personal_folders/Felix/TranscriptomeProfiling/RNASeq/RNASeq/tuxedo/tophat/EASYRNASEQ/countdata",
> +                           gapped=F,
> +                           validity.check=TRUE,
> +                           chr.map=chr.map,
> +                           organism="custom",
> +                           annotationMethod="env",
> +                           annotationObject=exon.range,
> +                           count="genes",
> +                           filenames=files,
> +                           summarization="geneModels",
> +                           outputFormat="RNAseq")
> Checking arguments...
> Fetching annotations...
> Computing gene models...
> Summarizing counts...
> Processing accepted_hits_12a_sorted.bam
> Updating the read length information.
> The reads are of 95 bp.
> Warning message:
> In easyRNASeq(filesDirectory = "/projects/irg/grp_stich/personal_folders/Felix/TranscriptomeProfiling/RNASeq/RNASeq/tuxedo/tophat/EASYRNASEQ/countdata",  :
>  There are 8770 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
> Error in unlist(aggregate(readCoverage(obj)[names(geneModel(obj))[gm.sel]],  :
>  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in .Call2("Rle_getStartEndRunAndOffset", x, start, end, PACKAGE = "IRanges") :
>  'x' values larger than vector length 'sum(width)'
>> 
> 
> 
> 
> 
> 
> Could you help me with this problem?
> 
> Best,
> Felix
> 
> --
> Felix Frey
> 
> Max Planck Institut für Pflanzenzüchtungsforschung
> Carl-von-Linné-Weg 10
> D-50829 Köln
> +49 (0) 221-5062 405
> 
> -- output of sessionInfo(): 
> 
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] easyRNASeq_1.6.0       ShortRead_1.18.0       latticeExtra_0.6-26
> [4] RColorBrewer_1.0-5     Rsamtools_1.12.4       DESeq_1.12.1
> [7] lattice_0.20-15        locfit_1.5-9.1         BSgenome_1.28.0
> [10] GenomicRanges_1.12.5   Biostrings_2.28.0      IRanges_1.18.3
> [13] edgeR_3.2.4            limma_3.16.7           biomaRt_2.16.0
> [16] Biobase_2.20.1         genomeIntervals_1.16.0 BiocGenerics_0.6.0
> [19] intervals_0.14.0
> 
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.22.6 DBI_0.2-7            RCurl_1.95-4.1
> [4] RSQLite_0.11.4       XML_3.98-1.1         annotate_1.38.0
> [7] bitops_1.0-6         genefilter_1.42.0    geneplotter_1.38.0
> [10] grid_3.0.1           hwriter_1.3          splines_3.0.1
> [13] stats4_3.0.1         survival_2.37-4      tools_3.0.1
> [16] xtable_1.7-1         zlibbioc_1.6.0
>> 
> 
> 
> --
> Sent via the guest posting facility at bioconductor.org.