[BioC] DEseq for sample clustering

Wolfgang Huber whuber at embl.de
Thu Nov 10 23:57:59 CET 2011


Dear Linn

each sample's data corresponds to a column in the data matrix of a
CountDataSet, and it seems that your question boils down to how to
1. subset columns of a matrix
2. compute an average vector from sets of columns of a matrix.

For 1., you can do something like

library(Biobase)
library(pasilla)
data(pasillaGenes)
s1 = pasillaGenes[, pasillaGenes$type=="single-read"]
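
The same column-subsetting idea can be tried without any Bioconductor
packages. The following is only a minimal sketch on a plain matrix; the
objects 'counts' and 'type' and all numbers in them are made up for
illustration and are not part of the pasilla data.

```r
# Toy stand-in for a count matrix: 4 genes x 4 samples (made-up numbers).
counts <- matrix(1:16, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

# Hypothetical per-sample annotation, one entry per column of 'counts'.
type <- c("single-read", "paired-end", "single-read", "paired-end")

# A logical vector in the column position keeps only the matching samples;
# drop = FALSE keeps the result a matrix even if one column remains.
single <- counts[, type == "single-read", drop = FALSE]
colnames(single)
```

Subsetting a CountDataSet with '[' follows the same pattern, with the
annotation taken from its phenotype data instead of a separate vector.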

For 2., see the 'ave' function in the 'stats' package, or more pedestrian:

sp = with(pData(pasillaGenes),
        split(seq(along=condition), condition))
mn = do.call(cbind,
   lapply(sp, function(i)
      rowMeans(vsd[,i,drop=FALSE])))

where 'vsd' is the data after variance stabilising transformation as 
described in the vignette.
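
To get one dendrogram node per condition, the averaged matrix can then be
passed to dist and hclust. The following is a minimal sketch using base R
only; here 'vsd' is a random toy matrix and 'condition' a made-up factor,
both standing in for the real transformed data and sample annotation.

```r
set.seed(1)

# Toy stand-in for 'vsd': 50 genes x 6 samples of made-up values.
vsd <- matrix(rnorm(300), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))

# Hypothetical condition labels, one per column of 'vsd'.
condition <- factor(c("A", "A", "B", "B", "C", "C"))

# Column indices grouped by condition, then one mean vector per group.
sp <- split(seq_along(condition), condition)
mn <- sapply(sp, function(i) rowMeans(vsd[, i, drop = FALSE]))

# dist() compares rows, so transpose to get distances between conditions.
d  <- dist(t(mn))
hc <- hclust(d)

# The dendrogram now has one leaf per condition rather than per sample.
plot(hc)
```

Whether averaging before computing distances is appropriate depends on
how similar the replicates are; clustering the individual samples first
is a useful check.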

	Best wishes
	Wolfgang


Nov/10/11 1:45 PM, Linn Fagerberg [guest] scripsit:
>
> I have used the functions described in the DEseq package information
> for clustering and heatmap visualization of RNA-seq data with great
> results. However I am a bit confused whether I may be able to use the
> conds argument for my count dataset. When I have replicate samples I
> would like to get only the ones specified in the conds vector as the
> nodes in the dendrogram of the heatmap. Is this possible to do using
> methods in the DEseq package or do I have to calculate average values
> for the replicates manually before I obtain the distances?
>
> -- output of sessionInfo():
>
>
>
> -- Sent via the guest posting facility at bioconductor.org.


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


