[BioC] Error Using DESeq with HTSeq-Count
Devon Ryan
dpryan at dpryan.com
Mon Jan 27 16:28:14 CET 2014
Hi Veronica,
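The SAM file that htseq-count optionally writes (with the -o option) is mostly for debugging; it is not the count table. Its records carry a varying number of optional tags, which would explain why R complains that lines don't have 24 elements. What you need to load instead are the counts that htseq-count prints to standard output (the screen, unless redirected).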
If your original command was something of the form:
samtools view alignments.bam | htseq-count -o alignment.htseq.sam - something.gff
then simply do instead:
samtools view alignments.bam | htseq-count - something.gff > alignment.counts
The alignment.counts file would then be appropriate for loading into R.
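As a quick sanity check before loading into R, something like the following could confirm that a file is a two-column count table rather than the SAM debugging output. This is only a sketch: the function name is_count_table is made up for illustration, and it assumes htseq-count's usual tab-separated output of one feature ID and one count per line.

```shell
# Hypothetical helper (not part of htseq-count or samtools).
# Succeeds only if the file is non-empty and every line has exactly
# two tab-separated fields, as expected of a count table.
is_count_table() {
    awk -F '\t' 'NF != 2 { bad = 1 } END { exit (bad || NR == 0) }' "$1"
}

# Usage, with the file name from the command above:
#   is_count_table alignment.counts && echo "looks like a count table"
```

A SAM file fails this check because each record has at least 11 tab-separated fields, so a mix-up of the two outputs shows up before R is involved.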
@Michael et al., maybe you could update the htseq web page and documentation to make this more explicit. I've seen a number of people (mostly on SEQanswers) run into this same misunderstanding.
Regards,
Devon
____________________________________________
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Tel: +49 (0)178 298-6067
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn, Germany
On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote:
> Hi Mike,
>
> Thank you for the response; sorry I didn't include that information.
>
> I have pasted three records from the htseq-count output SAM file:
>
> PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314
> 50 51M = 23803885 622
> GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA
> T`cccc^YacbL`TTTa\TTbbbYYcL\c`^cYcc_c^`cc]b]Y\ccbY^ AS:i:0 XN:i:0
> XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature
> PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177
> 50 12M279N28M875N11M = 56553900 1868
> GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG
> cc`\\dd^dT^ccca`Y`]YYa```TccTab\YL_a`YK`]LK\_]U]]UY AS:i:0 XM:i:0
> XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6
> PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900
> 50 33M94N18M = 56552177 -1868
> GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG
> a_\L`]TaTTbZU[LZTaRYaTYb`YIZSTZ_]UYQS]]\\`T[``TbT^b AS:i:-6 XM:i:1
> XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6
>
> In R, I have a data frame that looks like this:
>> table
>   samplename    filename condition
> 1      NML-1 NML-1-htout       NML
> 2      NML-2 NML-2-htout       NML
> 3      LMP-1 LMP-1-htout       LMP
> 4      LMP-2 LMP-2-htout       LMP
>
> Then I call the following in R:
> cds = newCountDataSetFromHTSeqCount(table, directory=".")
>
>
> Veronica
>
>
> On Mon, Jan 27, 2014 at 9:39 AM, Michael Love
> <michaelisaiahlove at gmail.com> wrote:
>
>> Hi Veronica,
>>
>> Could you paste the head of the htseq count table file? Maybe there is
>> some clue as to what is going wrong.
>>
>> Also it's a good idea to include all your code (command line and R) to
>> help package maintainers diagnose what might be going on.
>>
>> Mike
>> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I was trying to use DESeq with the count table obtained from HTSeq-count.
>>> When I used the function "newCountDataSetFromHTSeqCount", it gave me an
>>> error complaining some lines of the count table do not have 24 columns.
>>>
>>> The error message looks like:
>>>
>>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>>> na.strings, :
>>> line 5 did not have 24 elements"
>>>
>>> I checked the HTSeq-count results: it is not just a couple of lines that
>>> lack 24 columns; most of them do. So I can't simply skip those lines.
>>>
>>> Is there anything wrong with the HTSeq-count results? Could anybody give
>>> me any suggestions?
>>>
>>> Thank you in advance,
>>> Veronica
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>