[BioC] merging two sets of genes

Tue Dec 27 23:40:30 CET 2005

Hi,
  I think the problem is that the arrays are not the same, and then 
life is much harder. There are some papers on this (G. Parmigiani et 
al. have produced MergeMaid, as one option). I have done some work on 
this problem with Wolfgang Huber and Markus Ruschhaupt (you can find 
the technical report under the Bioconductor publications link, I hope).
  It is not so simple to match across different arrays where different 
probes were used. You can take the expedient of mapping to some common 
set of IDs and matching on those (there is some code in packages 
GeneMeta and GeneMetaEx, if I recall correctly), but just because two 
probes map to the same Entrez gene id (for example) does not mean that 
the same thing was measured - whence MergeMaid and similar tools.
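
To make the "map to common IDs and match" expedient concrete, here is a 
minimal sketch in plain Python. It is not the GeneMeta code: the probe 
IDs, the annotation tables, the expression values, and the choice of the 
median to collapse several probes onto one gene are all assumptions made 
up for illustration.

```python
from statistics import median

def collapse_to_genes(expr, probe_to_gene):
    """Collapse probe-level rows to gene-level rows.

    expr maps probe ID -> list of per-sample values; probe_to_gene maps
    probe ID -> gene ID. Probes sharing a gene are combined with the
    per-sample median (an illustrative choice, not a recommendation).
    """
    by_gene = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # probe has no annotation: drop it
            continue
        by_gene.setdefault(gene, []).append(values)
    # zip(*rows) walks the samples; take the median over probes in each
    return {g: [median(col) for col in zip(*rows)] for g, rows in by_gene.items()}

# Hypothetical data: two platforms with different probes and sample counts
expr_a = {"a_101": [7.1, 7.3], "a_102": [6.9, 7.5], "a_200": [4.0, 4.2]}
map_a = {"a_101": "1017", "a_102": "1017", "a_200": "7157"}
expr_b = {"b_9": [8.0, 8.4, 8.1], "b_12": [5.5, 5.1, 5.3]}
map_b = {"b_9": "1017", "b_12": "672"}

genes_a = collapse_to_genes(expr_a, map_a)
genes_b = collapse_to_genes(expr_b, map_b)

# Only genes present on both platforms can be matched at all
common = sorted(set(genes_a) & set(genes_b))
print(common)
```

A shared ID in `common` only says that the annotation agrees; it does 
not show that the probes on the two platforms measured the same thing.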

  And if that is the case here, then combining them is contraindicated, 
and some of the tools for synthesizing experiments, such as 
meta-analysis or the more general random effects models, will be needed. 
Just because you can jam either the raw data or the processed data 
together does not mean that it is sensible to do so.

  And finally, even if the arrays are identical, unless they were all 
done at essentially the same time under very similar conditions, I would 
still take the approach in the paragraph above and use a random effects 
model.
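
As a rough illustration of what a random effects synthesis looks like 
for a single gene, here is a minimal sketch using the DerSimonian-Laird 
moment estimator. That estimator is one common choice and is swapped in 
here as an assumption; it is not necessarily what any particular package 
does. The function name and the per-experiment effect estimates and 
variances are hypothetical.

```python
def random_effects(effects, variances):
    """DerSimonian-Laird random effects estimate for one gene.

    effects:   per-experiment effect estimates (e.g. log fold changes)
    variances: their within-experiment sampling variances
    Returns (pooled effect, its standard error, between-experiment variance).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two experiments with matching variances")

    # Fixed effects (inverse-variance) estimate, used to measure heterogeneity
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and the moment estimate of between-experiment variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Re-weight with the between-experiment variance added to each study
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2

# Hypothetical numbers for one gene measured in three experiments
pooled, se, tau2 = random_effects([0.8, 1.4, 0.3], [0.04, 0.09, 0.05])
print(pooled, se, tau2)
```

When the experiments disagree by more than their sampling variances 
explain, `tau2` is positive and the pooled standard error grows, which 
is the behaviour you lose if you simply jam the data together.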

  best wishes
    Robert

Seth Falcon wrote:
> On 26 Dec 2005, kfbargad at ehu.es wrote:
> 
> 
>>Dear list,
>>
>>I have two sets of genes from the same experiment,
>>
>>
>>>PinC
>>
>>Expression Set (exprSet) with 
>>1310 genes
>>8 samples
>>phenoData object with 2 variables and 8 cases
>>varLabels
>>FileName: read from file
>>Target: read from file
>>
>>>PinS
>>
>>Expression Set (exprSet) with 
>>2891 genes
>>8 samples
>>phenoData object with 2 variables and 8 cases
>>varLabels
>>FileName: read from file
>>Target: read from file
>>
>>
>>How can I merge these two sets? I tried union() on two vectors
>>created from the probe IDs but failed. Any hints?
> 
> 
> One approach would be to create a new exprSet object manually using
> the data from PinC and PinS.  Basically, create a new phenoData object
> with the data for all 16 cases, and a new expression matrix with 16
> columns (assuming the two original exprSets represent disjoint sets of
> samples).
> 
> Thinking out loud, is this a common enough operation to warrant a
> method for exprSets?  I could imagine c() being defined on exprSets
> such that if the phenoData columns are the same and the "sample ids"
> as given by the rownames of phenoData/colnames of exprs are disjoint,
> then do the obvious thing, else error.
> 
> + seth
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org