[BioC] DEseq for sample clustering
Wolfgang Huber
whuber at embl.de
Thu Nov 10 23:57:59 CET 2011
Dear Linn,
each sample's data corresponds to a column in the data matrix of a
CountDataSet, and it seems that your question boils down to how to
1. subset columns of a matrix
2. compute an average vector from sets of columns of a matrix.
For 1., you can do something like
library(Biobase)
library(pasilla)
data(pasillaGenes)
s1 = pasillaGenes[, pasillaGenes$type=="single-read"]
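The same column subsetting can be tried without any Bioconductor packages. The following is only a minimal sketch on a toy matrix: the names 'counts', 'type' and 'single' and all the values are made up for illustration and are not part of the pasilla data.

```r
# Toy stand-in for a count matrix: 4 genes (rows) x 4 samples (columns).
# All names and values here are hypothetical.
counts <- matrix(1:16, nrow = 4,
                 dimnames = list(paste0("gene", 1:4),
                                 paste0("sample", 1:4)))

# Hypothetical per-sample annotation, one entry per column of 'counts'.
type <- c("single-read", "paired-end", "single-read", "paired-end")

# Logical indexing on the columns; drop = FALSE keeps the result a matrix
# even if only one column is selected.
single <- counts[, type == "single-read", drop = FALSE]
```

Here 'single' keeps all rows and only those columns whose annotation is "single-read".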
For 2., see the 'ave' function in the 'stats' package, or more pedestrian:
sp = with(pData(pasillaGenes),
          split(seq(along=condition), condition))
mn = do.call(cbind,
             lapply(sp, function(i)
                    rowMeans(vsd[,i,drop=FALSE])))
where 'vsd' is the data after variance stabilising transformation as
described in the vignette.
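To get a dendrogram with one node per condition, the averaged matrix can then be passed on to the usual distance and clustering functions. This is a rough sketch under assumptions, not the DESeq way of doing it: 'vsd_toy' is random data standing in for a plain numeric matrix of transformed values, and the condition labels are invented.

```r
set.seed(1)

# Hypothetical transformed data: 10 genes x 6 samples, random values only.
vsd_toy <- matrix(rnorm(60), nrow = 10,
                  dimnames = list(paste0("gene", 1:10),
                                  paste0("s", 1:6)))

# Hypothetical condition labels, two replicates per condition.
condition <- factor(c("A", "A", "B", "B", "C", "C"))

# Column indices grouped by condition.
sp <- split(seq_along(condition), condition)

# One column of per-gene means for each condition.
mn <- do.call(cbind,
              lapply(sp, function(i)
                     rowMeans(vsd_toy[, i, drop = FALSE])))

# dist() works on rows, so transpose to get distances between conditions.
d  <- dist(t(mn))
hc <- hclust(d)
```

plot(hc) then draws a dendrogram whose leaves are the conditions rather than the individual replicates; the same 'mn' matrix could also be handed to a heatmap function.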
Best wishes
Wolfgang
On Nov/10/11 1:45 PM, Linn Fagerberg [guest] wrote:
>
> I have used the functions described in the DEseq package information
> for clustering and heatmap visualization of RNA-seq data with great
> results. However I am a bit confused whether I may be able to use the
> conds argument for my count dataset. When I have replicate samples I
> would like to get only the ones specified in the conds vector as the
> nodes in the dendrogram of the heatmap. Is this possible to do using
> methods in the DEseq package or do I have to calculate average values
> for the replicates manually before I obtain the distances?
>
> -- output of sessionInfo():
>
>
>
> -- Sent via the guest posting facility at bioconductor.org.
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber