[BioC] merging two sets of genes
Robert Gentleman
rgentlem at fhcrc.org
Tue Dec 27 23:40:30 CET 2005
Hi,
I think that the problem is that the arrays are not the same - and
then life is much harder. There are some papers on it (G. Parmigiani et
al. have produced MergeMaid, as one option). I have done some work on
this problem with Wolfgang Huber and Markus Ruschhaupt (you can find
the technical report under the Bioconductor publications link, I hope).
It is not so simple to match across different arrays where different
probes were used. You can take the expedient of mapping to some common
set of IDs and matching on those (there is some code for this in the
GeneMeta and GeneMetaEx packages, if I recall correctly). But just
because two probes map to the same Entrez Gene ID (for example) does not
mean that the same thing was measured; hence MergeMaid and similar tools.
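The "map to common IDs and match" expedient can be sketched as follows. This is a minimal Python illustration, not the GeneMeta code. It assumes each array comes with a plain probe-to-gene mapping; the dicts and probe names below are hypothetical stand-ins for what Bioconductor annotation packages would supply. It also assumes that several probes for one gene are simply averaged, which is only one of several possible choices. As the paragraph above warns, a shared ID does not guarantee that the same quantity was measured.

```python
from statistics import mean


def collapse_to_common_ids(expr, probe_to_gene):
    """Collapse {probe_id: [value per sample]} to {gene_id: [value per sample]}.

    Probes without a mapping are dropped; several probes mapping to the same
    gene are averaged sample by sample (an assumed, not a prescribed, rule).
    """
    by_gene = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        by_gene.setdefault(gene, []).append(values)
    # zip(*rows) walks the probes of one gene column by column (per sample).
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in by_gene.items()}


def match_on_common_ids(expr_a, map_a, expr_b, map_b):
    """Return both gene-level tables restricted to the gene IDs they share."""
    a = collapse_to_common_ids(expr_a, map_a)
    b = collapse_to_common_ids(expr_b, map_b)
    shared = sorted(a.keys() & b.keys())
    return {g: a[g] for g in shared}, {g: b[g] for g in shared}


# Hypothetical toy data: two arrays with different probe IDs.
expr_a = {"a_1": [1.0, 2.0], "a_2": [3.0, 4.0], "a_3": [5.0, 6.0]}
map_a = {"a_1": "1017", "a_2": "1017", "a_3": "7157"}
expr_b = {"b_1": [10.0, 20.0], "b_2": [30.0, 40.0]}
map_b = {"b_1": "1017", "b_2": "672"}
gene_a, gene_b = match_on_common_ids(expr_a, map_a, expr_b, map_b)
```

Here only gene "1017" is present on both arrays, so each returned table has a single row. Array A's two probes for that gene are averaged to [2.0, 3.0].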
And if this is correct, that is, if different things were measured, then
combining the data is contraindicated. Instead, some of the tools for
synthesizing experiments, such as meta-analysis or the more general
random effects models, will be needed. Just because you can jam the raw
data or the processed data together does not mean that it is sensible
to do so.
And finally, even if the arrays are identical, unless they were all
essentially done at the same time under very similar conditions, I would
still take the approach in the paragraph above and use a random effects
model.
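For a single gene, a random effects combination across studies can be sketched as below. The thread does not specify an estimator or an effect measure. This Python sketch therefore swaps in the DerSimonian-Laird moment estimator of the between-study variance tau^2, which is a common choice, purely for illustration. Each study contributes an effect estimate (for example, a per-study standardized mean difference) and its within-study variance. The pooled estimate weights each study by 1/(v_i + tau^2).

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling of per-study effects for one gene.

    effects[i] is study i's effect estimate and variances[i] its
    within-study variance. Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    mu = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2
```

Consider two studies reporting effects 0.0 and 2.0, each with variance 1.0. The sketch gives tau^2 = 1.0 and a pooled effect of 1.0 with standard error 1.0. If the studies agree exactly, tau^2 falls to 0 and the result matches a fixed-effect analysis.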
best wishes
Robert
Seth Falcon wrote:
> On 26 Dec 2005, kfbargad at ehu.es wrote:
>
>
>>Dear list,
>>
>>I have two sets of genes from the same experiment,
>>
>>
>>>PinC
>>
>>Expression Set (exprSet) with
>>1310 genes
>>8 samples
>>phenoData object with 2 variables and 8 cases
>>varLabels
>>FileName: read from file
>>Target: read from file
>>
>>>PinS
>>
>>Expression Set (exprSet) with
>>2891 genes
>>8 samples
>>phenoData object with 2 variables and 8 cases
>>varLabels
>>FileName: read from file
>>Target: read from file
>>
>>
>>How can I merge these two sets? I tried union() on two vectors
>>created from the probe IDs, but it failed. Any hints?
>
>
> One approach would be to create a new exprSet object manually using
> the data from PinC and PinS. Basically, create a new phenoData object
> with the data for all 16 cases, and a new expression matrix with 16
> columns (assuming the two original exprSets represent disjoint sets of
> samples).
>
> Thinking out loud, is this a common enough operation to warrant a
> method for exprSets? I could imagine c() being defined on exprSets
> such that if the phenoData columns are the same and the "sample ids"
> as given by the rownames of phenoData/colnames of exprs are disjoint,
> then do the obvious thing, else error.
>
> + seth
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
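The c() semantics Seth proposes can be sketched outside R. The following is a Python illustration built on a hypothetical ExprSet class. It is neither Bioconductor's exprSet nor any existing combine method. It binds columns only if the phenoData columns match and the sample IDs are disjoint. Note that PinC and PinS have different numbers of genes (1310 vs 2891), so their rows would have to be matched before any column-binding. The sketch raises an error in that case rather than silently taking a union or an intersection of the features.

```python
from dataclasses import dataclass


@dataclass
class ExprSet:
    """Hypothetical stand-in for an exprSet (not the Bioconductor class).

    exprs[i][j] is the value for feature i in sample j; pheno maps each
    sample ID to its row of covariates, ordered as in varlabels.
    """
    feature_ids: list
    sample_ids: list
    exprs: list
    varlabels: list
    pheno: dict


def combine(a, b):
    """Column-bind two ExprSets under the rules proposed in the thread.

    Raises ValueError unless the phenoData columns match, the sample IDs are
    disjoint, and both sets cover the same (assumed unique) feature IDs.
    """
    if a.varlabels != b.varlabels:
        raise ValueError("phenoData columns differ")
    if set(a.sample_ids) & set(b.sample_ids):
        raise ValueError("sample IDs overlap")
    if set(a.feature_ids) != set(b.feature_ids):
        raise ValueError("feature IDs differ; match the rows first")
    # Look up b's rows by feature ID so the result follows a's row order.
    row_in_b = {f: i for i, f in enumerate(b.feature_ids)}
    exprs = [a.exprs[i] + b.exprs[row_in_b[f]] for i, f in enumerate(a.feature_ids)]
    return ExprSet(
        feature_ids=list(a.feature_ids),
        sample_ids=list(a.sample_ids) + list(b.sample_ids),
        exprs=exprs,
        varlabels=list(a.varlabels),
        pheno={**a.pheno, **b.pheno},
    )
```

The rows are aligned by feature ID rather than by position. Two sets whose probes appear in a different order would otherwise be silently mis-combined.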
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org