[BioC] use of voom function with attract package

Fri Apr 25 01:25:32 CEST 2014

Hi Emmanouela,

I don't know of an off-the-shelf function to do something similar to 
findSynexprs but using precision weights.  I guess one would do 
hiearchical clustering of the genes, and would define the distance measure 
to be a Pearson correlation calculation but using weights.  Not hard to 
do, but not yet ready off-the-shelf as far as I know.

Best wishes
Gordon

On Thu, 24 Apr 2014, Emmanouela Repapi wrote:

> Dear Gordon and Ryan,
>
> Thank you both very much for your replies, using the F statistic from 
> the lmFit and the geneSetTest seems like the right way to go for what I 
> need to do.

> On a similar note, do you have a function in mind for finding the genes 
> with similar expression profiles within a specific pathway (as a 
> replacement of the other function of attract, findSynexprs)?
>
> Thank you again for your help.
>
> Best wishes,
> Emmanouela
>
> On 18 Apr 2014, at 02:16, Gordon K Smyth <smyth at wehi.EDU.AU> wrote:
>
>> Dear Emmanouela,
>>
>> The limma package is designed to fit linear models, and it can compute t-statistics and F-statistics faster than making your own loop to lm(). If you want F-statistics for distinguishing the cell types, why not:
>>
>>  fit <- lmFit(anal_voom, design)
>>  fit <- eBayes(fit[,-1])
>>
>> Then the F-statistics will be in fit$F.
>>
>> If you want to know whether a particular KEGG pathway tends to have larger F-statistics, you could also use:
>>
>>  geneSetTest(index, fit$F)
>>
>> where index selects genes in the pathway.  If there are only two cell types, a better way would be:
>>
>>  camera(anal_voom, index, design)
>>
>> With camera, index could be a list of index vectors for all the KEGG pathways at once.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Tue, 15 Apr 2014 09:44:42 -0700 (PDT)
>>> From: "Emmanouela Repapi [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, emmanouela.repapi at imm.ox.ac.uk
>>> Subject: [BioC] use of voom function with attract package
>>>
>>>
>>> Dear Bioconductor,
>>>
>>> I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly.
>>>
>>> The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene:
>>>   fstat <- apply(dat.detect.wkegg, 1, function(y, x) {
>>>       anova(lm(y ~ x))[[4]][1]}, x = group)
>>> where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column.
>>> (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. )
>>>
>>> What I want to do is change the above to:
>>>
>>> counts_data <- DGEList(counts=rnaseq,group=celltype)
>>> counts_data_norm <- calcNormFactors(counts_data)
>>>
>>> design <- model.matrix( ~ celltype)
>>> anal_voom <- voom(counts_data_norm, design)
>>> dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E)))
>>> voom_weights <- as.list(as.data.frame(t(anal_voom$weights)))
>>>
>>> fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]},
>>> 	dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype))
>>>
>>> Is this the way to go with using the weights from voom, or am I getting this very wrong?
>>>
>>> Many thanks in advance for your reply!
>>>
>>> Best wishes,
>>> Emmanouela
>>>
>>>
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> sessionInfo()
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_GB.ISO-8859-1       LC_NUMERIC=C                    LC_TIME=en_GB.ISO-8859-1        LC_COLLATE=en_GB.ISO-8859-1     LC_MONETARY=en_GB.ISO-8859-1    LC_MESSAGES=en_GB.ISO-8859-1
>>> [7] LC_PAPER=C                      LC_NAME=C                       LC_ADDRESS=C                    LC_TELEPHONE=C                  LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] attract_1.14.0       GOstats_2.28.0       graph_1.40.1         Category_2.28.0      GO.db_2.10.1         Matrix_1.1-3         cluster_1.15.2       annotate_1.40.1      org.Mm.eg.db_2.10.1
>>> [10] KEGG.db_2.10.1       RSQLite_0.11.4       DBI_0.2-7            AnnotationDbi_1.24.0 Biobase_2.22.0       BiocGenerics_0.8.0   edgeR_3.4.2          limma_3.18.13
>>>
>>> loaded via a namespace (and not attached):
>>> [1] AnnotationForge_1.4.4 genefilter_1.44.0     grid_3.0.1            GSEABase_1.24.0       IRanges_1.20.7        lattice_0.20-29       RBGL_1.38.0           splines_3.0.1
>>> [9] stats4_3.0.1          survival_2.37-7       tcltk_3.0.1           tools_3.0.1           XML_3.98-1.1          xtable_1.7-3
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the addressee.
>> You must not disclose, forward, print or use it without the permission of the sender.
>> ______________________________________________________________________
>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}