[BioC] DEXSeq package - read.HTSeqCount function error

Tue Jan 29 18:49:03 CET 2013

Dear Matteo Carrara,

Could you add to this e-mail the first 10 and last 10 lines of your 
input files (both counts and annotation files produced by the python 
scripts)?
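
Something along these lines in R would print them. This is only a minimal sketch: `show_ends` is a hypothetical helper name, and the file names are the ones from your call below, so adjust them to your own paths.

```r
# Hypothetical helper: collect and print the first and last n lines of a text file.
show_ends <- function(path, n = 10) {
  lines <- readLines(path)
  ends <- list(head = head(lines, n), tail = tail(lines, n))
  cat(ends$head, sep = "\n")
  cat("[...]\n")
  cat(ends$tail, sep = "\n")
  invisible(ends)
}

# File names as used in the original post (assumed to be in the working directory).
for (f in c("./wt_mapped.counts", "./flattened_mm9.gtf")) {
  if (file.exists(f)) {
    cat("==", f, "==\n")
    show_ends(f)
  }
}
```

Note that `readLines()` loads the whole file into memory, which should be fine for count files and flattened annotations of typical size.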

On an unrelated note, I noticed that you have only one sample. What 
exactly do you want to do with DEXSeq in this case?
Note that DEXSeq is designed to test for differences in exon usage 
between different conditions with replicates.

Best wishes,
Alejandro Reyes

> Hello,
>
> I have been trying to learn how to perform a differential expression
> analysis of RNA-seq data using the DEXSeq package lately and I encountered
> an unexpected behaviour in the function read.HTSeqCount: the function fails
> to load the file obtained from the python script "dexseq_count.py" with the
> following error message:
>
> Error in strsplit(rownames(dcounts), ":") : non-character argument
>
> I would really appreciate any pointers that might help me correct my code
> or my input files.
>
> Here is what I have done:
> - downloaded the mm9 GTF gene set from www.ensembl.org and ran the script
> "dexseq_prepare_annotation.py"
> - mapped my raw RNA-seq reads to the mm9 genome using TopHat, converting
> the output to sorted SAM format
> - ran the script "dexseq_count.py" using the "flattened" GTF and the SAM
> file obtained before
> - loaded the dataset in R using the function read.HTSeqCount() as follows:
>
> --------------------------------------
>> library(DEXSeq)
>> wt<-read.HTSeqCount("./wt_mapped.counts", "WT",
> flattenedfile="./flattened_mm9.gtf")
>
>
> Error in strsplit(rownames(dcounts), ":") : non-character argument
> --------------------------------------
>
> As far as I could understand, the "pasilla" package, used for the examples
> in the vignette, provided a counts file under the name
> "pasilla_gene_counts.tsv". Loading that file, however, results in the same
> error message.
>
> All I could do was pinpoint the source of the error in the code of the
> function, although that did not help me find a solution or a
> workaround.
> After creating the data frame "dcounts" storing the counts and setting the
> row names, that same data frame is subset:
>
>      dcounts <- dcounts[substr(rownames(dcounts), 1, 1) != "_", ]
>
> This code, however, changes the object dcounts in such a way that the
> "rownames()" function returns NULL. The next statement is then bound to
> fail, since it requires rownames(dcounts) to be a character vector:
>
>      genesrle <- sapply(strsplit(rownames(dcounts), ":"), "[[", 1)
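
One possible explanation, offered as a sketch rather than a confirmed diagnosis: if only one sample is loaded, "dcounts" has a single column, and in R subsetting a one-column data frame (or matrix) with `[rows, ]` drops the result to a plain vector unless `drop = FALSE` is given. A plain vector has no row names, so rownames() returns NULL and strsplit() fails. The IDs and counts below are made up for illustration; only the two statements quoted above are taken from the function.

```r
# Toy one-column count table (made-up IDs and counts, not real data).
dcounts <- data.frame(
  WT = c(5L, 0L, 12L, 3L),
  row.names = c("geneA:001", "geneA:002", "geneB:001", "_ambiguous")
)

keep <- substr(rownames(dcounts), 1, 1) != "_"

# Default subsetting: with one column, the result is dropped to a vector.
dropped <- dcounts[keep, ]
is.null(rownames(dropped))  # the row names are gone

# strsplit(NULL, ":") is what raises "non-character argument".
err <- tryCatch(strsplit(rownames(dropped), ":"), error = function(e) e)
print(conditionMessage(err))

# With drop = FALSE the data frame and its row names survive the subsetting.
kept <- dcounts[keep, , drop = FALSE]
genesrle <- sapply(strsplit(rownames(kept), ":"), "[[", 1)
print(genesrle)
```

If this is indeed the cause, loading more than one sample would avoid the dropped dimension, which the method needs anyway.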
>
> I am running R 2.15.2  and DEXSeq  1.4.0 from Bioconductor version 2.11,
> but I was able to reproduce this on the devel version of R (2013-01-22
> r61734) using DEXSeq_1.5.6 from Bioconductor version 2.12.
>
> ---------------------------------
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] BiocInstaller_1.8.3 DEXSeq_1.4.0        Biobase_2.18.0
> [4] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.14.0 hwriter_1.3    RCurl_1.95-3   statmod_1.4.16 stringr_0.6.2
> [6] tools_2.15.2   XML_3.95-0.1
> --------------------------------
>
> Thank you in advance for any help you can provide.
> Best Regards,