[BioC] DEXSeq for 1 gene

Tue May 28 09:20:22 CEST 2013

Hi Wolfgang
Although I have 20 samples, only 11 are under the conditions I'm
interested in this analysis. Not sure why I mentioned 20 in the first
place sorry. Of those 11, 2 are giving odd results from top hat so I
haven't included them yet until I figure out what is wrong. So I have 9 to
work with

> counts(ecs)
K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
XLOC_000001:E001 1 0 1 0 1 0 0 0 0
XLOC_000001:E002 1 0 0 0 2 1 0 1 1
XLOC_000001:E003 0 0 0 0 3 0 0 0 0
XLOC_000001:E004 1 0 1 0 1 0 0 1 0
XLOC_000001:E005 1 3 3 2 3 0 0 0 0
XLOC_000001:E006 0 0 0 0 0 0 0 0 0
XLOC_000001:E007 0 1 2 2 2 0 3 2 0
XLOC_000001:E008 2 1 2 0 0 2 0 1 0
XLOC_000001:E009 1 2 0 0 1 0 0 2 0
XLOC_000001:E010 4 0 1 1 0 0 0 1 0
XLOC_000001:E011 0 0 0 0 0 1 0 0 0
XLOC_000001:E012 0 0 1 1 0 2 0 2 0
XLOC_000001:E013 1 1 3 2 2 2 1 3 0
XLOC_000001:E014 3 3 1 2 2 1 3 0 1
XLOC_000001:E015 1 2 1 2 4 2 0 0 3
XLOC_000001:E016 3 5 2 2 7 1 0 5 0
XLOC_000001:E017 1 2 0 2 2 2 1 3 1
XLOC_000001:E018 0 4 3 1 6 3 2 2 1
XLOC_000001:E019 2 2 3 2 2 2 0 5 3
XLOC_000001:E020 3 3 2 0 3 1 2 4 2
XLOC_000001:E021 2 1 2 4 3 1 0 3 1
XLOC_000001:E022 2 0 1 4 3 1 0 2 1
XLOC_000001:E023 0 0 3 1 0 2 0 2 1
XLOC_000001:E024 3 1 0 1 0 0 0 1 1
XLOC_000001:E025 0 1 0 2 2 2 2 0 0
XLOC_000001:E026 0 0 2 2 1 2 0 1 0
XLOC_000001:E027 6 1 2 3 0 0 0 4 1
XLOC_000001:E028 7 2 3 3 0 4 1 5 0
XLOC_000001:E029 5 4 1 5 0 5 5 3 1
XLOC_000001:E030 0 1 1 0 3 0 0 0 0
XLOC_000001:E031 1 1 0 0 1 0 0 1 0

The low read count I put down to only looking at a single gene. Is my data
a loss cost for the dexseq pipeline?

Thanks for your help

Eamonn

Dr Eamonn Mallon
Lecturer in Evolutionary Biology
Adrian 220
Biology Department
University of Leicester

http://www2.le.ac.uk/departments/biology/people/mallon

On 26/05/2013 13:12, "Wolfgang Huber" <whuber at embl.de> wrote:

>Dear Eamonn
>
>from your initial post -i.e. the output of head(counts(ecs))- it looks
>like your counts are very low? This would make estimateSizeFactors
>produce lots of NA values, but even if it did not, it would mean that
>you'd have essentially no statistical power to detect anything useful
>from such low counts.
>
>You mentioned "20 bumblebee samples" but the table whose head you show
>only has 9?
>
>Can you post the full output of 'counts(ecs)' and make sure its content
>is plausible based on your understanding of the data?
>
>	Best wishes
>	Wolfgang
>
> 
>On May 23, 2013, at 2:03 pm, Eamonn Mallon <ebm3 at leicester.ac.uk> wrote:
>
>> Hi Alejandro,
>> Thanks very much for getting back to me.
>> 
>> I think I got the second parameter idea from DESeq. I see I am wrong.
>>so I ran
>> 
>> > ecs<-estimateSizeFactors(ecs)
>> > sizeFactors(ecs)
>> K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
>> NA  NA  NA  NA  NA  NA  NA  NA  NA
>> 
>> Is there any way to get DEXSeq to estimate size factors with my small
>>dataset?
>> 
>> If you are only interested in a single gene maybe you could visualize
>> directly the counts per exon using the function plotDEXSeq.
>> 
>> I'm not sure how to do this I tried
>> 
>> >plotDEXSeq(ecs,"XLOC_000001", fitExpToVar="condition",
>>norCounts=FALSE, expression=FALSE, splicing=FALSE,
>>displayTranscripts=FALSE, legend=TRUE)
>> 
>> Error in plotDEXSeq(ecs, "XLOC_000001", fitExpToVar = "condition",
>>norCounts = FALSE,  :
>>  Please estimate sizeFactors first
>> 
>> I guess this is because modelFrameForGene requires sizefactors.
>> 
>> Any ideas?
>> 
>> Eamonn
>> 
>> 
>> On 22/05/13 18:07, Alejandro Reyes wrote:
>>> Dear Eamonn,
>>> 
>>> Thanks for your interest in DEXSeq.
>>> 
>>> In your size factor estimation, where did you get your second parameter
>>> from? (check ?estimateSizeFactors) There is no need to specify anything
>>> else than the ExonCountSet object and that is the reason you are
>>>getting
>>> the error message.
>>> 
>>> If you are only interested in a single gene maybe you could visualize
>>> directly the counts per exon using the function plotDEXSeq.
>>> 
>>> Best regards,
>>> Alejandro Reyes
>>> 
>>> 
>>>> Dear All,
>>>> I have RNA-seq data for 20 bumblebee samples divided into treatment
>>>>and control. I am interested in differential exon usage. Unfortunately
>>>>the bumblebee genome is not at the stage where I can do a complete
>>>>DEXSeq analysis genome wide. I decided to look at a single gene where
>>>>I have a decent gene model. Using TopHat2 and the python scripts
>>>>included in DEXSeq I was able to produce an ExonCountSet object.
>>>> 
>>>>> head(counts(ecs))
>>>>                   K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
>>>> XLOC_000001:E001   1   0   1   0   1   0   0   0   0
>>>> XLOC_000001:E002   1   0   0   0   2   1   0   1   1
>>>> XLOC_000001:E003   0   0   0   0   3   0   0   0   0
>>>> XLOC_000001:E004   1   0   1   0   1   0   0   1   0
>>>> XLOC_000001:E005   1   3   3   2   3   0   0   0   0
>>>> XLOC_000001:E006   0   0   0   0   0   0   0   0   0
>>>> 
>>>> As I expected, its all come crashing down round my ears at the
>>>>analysis stage
>>>> 
>>>>> sizeFactors(ecs)
>>>> K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
>>>>   NA  NA  NA  NA  NA  NA  NA  NA  NA
>>>> 
>>>> I tried using the shorth value
>>>> 
>>>>> 
>>>>>ecs<-estimateSizeFactors(ecs,(locfunc=genefilter::shorth((counts(ecs))
>>>>>,tie.action="min")))
>>>> Error in .local(object, ...) : unused argument (0.2)
>>>> 
>>>> I understand that my main problem is that the DEXSeq analysis should
>>>>be genome wide (my data has too few counts).
>>>> 
>>>> Is there a way to use DEXSeq to ONLY look at a single gene? If not
>>>>can anyone think of a statistical way to analyse my data (cross tabs?
>>>>How should I normalise?)
>>>> 
>>>> Thanks for your time
>>>> 
>>>> Eamonn
>>>> 
>>>> 
>>>>> sessionInfo()
>>>> R version 3.0.0 (2013-04-03)
>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>> 
>>>> locale:
>>>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>> 
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>>>methods   base
>>>> 
>>>> other attached packages:
>>>> [1] genefilter_1.42.0  DEXSeq_1.6.0       Biobase_2.20.0
>>>>BiocGenerics_0.6.0
>>>> 
>>>> loaded via a namespace (and not attached):
>>>>   [1] annotate_1.38.0      AnnotationDbi_1.22.5 biomaRt_2.16.0
>>>>Biostrings_2.28.0    bitops_1.0-5         DBI_0.2-7
>>>>   [7] GenomicRanges_1.12.3 hwriter_1.3          IRanges_1.18.1
>>>>RCurl_1.95-4.1       Rsamtools_1.12.3     RSQLite_0.11.3
>>>> [13] splines_3.0.0        statmod_1.4.17       stats4_3.0.0
>>>>stringr_0.6.2        survival_2.37-4      tools_3.0.0
>>>> [19] XML_3.95-0.2         xtable_1.7-1         zlibbioc_1.6.0
>>>> Dr Eamonn Mallon
>>>> Lecturer in Evolutionary Biology
>>>> Adrian 220
>>>> Biology Department
>>>> University of Leicester
>>>> 
>>>> http://www2.le.ac.uk/departments/biology/people/mallon
>>>> 
>>>> 
>>>> 	[[alternative HTML version deleted]]
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>