[BioC] merging two sets of genes

Robert Gentleman rgentlem at fhcrc.org
Wed Dec 28 14:01:01 CET 2005


Hi,
  thanks for the clarification. Then it depends on whether you want to 
use the union or the intersection of the probes you selected in the two 
different ways.
union and intersect, applied to geneNames(PinS) and geneNames(PinC),
should get you somewhere close; you might also want to consider match
and %in%, depending on just how you want to select.
After that, you will need to create a matrix with the combined
expression values and use it as input in a call to new. The vignettes
for Biobase should demonstrate how to make an exprSet from a matrix,
but please ask if anything is not clear.
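A minimal sketch of the set logic above, written in Python rather than R so it runs standalone. The probe IDs and values are made up, and the dictionaries pin_s and pin_c are hypothetical stand-ins for the expression data of PinS and PinC (probe ID mapped to one value per sample). The helpers union_ids, intersect_ids, match_ids and combine_rows are also hypothetical; the first three only mimic R's union(), intersect() and match(), and a list comprehension plays the role of %in%. Because both subsets come from the same eset, a probe present in both has identical values, so each row of the combined matrix can be taken from whichever subset contains it.

```python
# Toy stand-ins for two probe subsets selected from the SAME eset:
# each maps probe ID -> expression values for the same samples.
# All IDs and values below are made up for illustration.
pin_s = {
    "p1": [5.1, 5.3, 4.9],
    "p2": [7.0, 7.2, 6.8],
    "p3": [3.3, 3.1, 3.4],
}
pin_c = {
    "p2": [7.0, 7.2, 6.8],
    "p4": [9.5, 9.1, 9.8],
}

def union_ids(a, b):
    """Like R's union(): elements of a, then new elements of b, no duplicates."""
    seen = dict.fromkeys(a)        # dicts preserve insertion order
    seen.update(dict.fromkeys(b))
    return list(seen)

def intersect_ids(a, b):
    """Like R's intersect(): elements of a that also occur in b."""
    b_set = set(b)
    return [x for x in dict.fromkeys(a) if x in b_set]

def match_ids(x, table):
    """Like R's match(), but 0-based: position of each x in table, or None."""
    pos = {}
    for i, t in enumerate(table):
        pos.setdefault(t, i)       # first occurrence wins, as in R
    return [pos.get(v) for v in x]

def combine_rows(a, b, ids):
    """Build the combined expression rows for the chosen probe IDs.

    Both subsets come from one eset, so a shared probe has identical
    values in a and b; take it from whichever subset contains it.
    """
    return [(pid, a[pid] if pid in a else b[pid]) for pid in ids]

ids_s, ids_c = list(pin_s), list(pin_c)
all_ids = union_ids(ids_s, ids_c)
common = intersect_ids(ids_s, ids_c)
in_c = [pid in pin_c for pid in ids_s]   # like ids_s %in% ids_c
combined = combine_rows(pin_s, pin_c, all_ids)

print(all_ids)                  # ['p1', 'p2', 'p3', 'p4']
print(common)                   # ['p2']
print(in_c)                     # [False, True, False]
print(match_ids(ids_c, ids_s))  # [1, None]
```

Passing common instead of all_ids to combine_rows gives the intersection rather than the union; in R the selected rows would then form the matrix handed to new, as described above.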

best wishes
   Robert

kfbargad at ehu.es wrote:
> Dear Seth and Robert,
> 
> I apologise, but I didn't make myself clear.
> 
> PinS and PinC come from the same experiment, i.e. the same eset. It is 
> just that I followed two different approaches to the analysis and now 
> I want to continue working with the union of these two lists. So I am 
> not intending to match across different arrays.
> 
> Hope this explains my question
> 
> David
> 
> 
>>Hi,
>>  I think that the problem is that the arrays are not the same, and
>>then life is much harder. There are some papers on it (G. Parmigiani
>>et al. have produced MergeMaid, as one option). I have done some work
>>on this problem with Wolfgang Huber and Markus Ruschhaupt (you can
>>find the technical report under the Bioconductor publications link, I
>>hope).
>>  It is not so simple to match across different arrays where
>>different probes were used. You can take the expedient of mapping to
>>some common set of IDs and matching on those (there is some code in
>>packages GeneMeta and GeneMetaEx, if I recall correctly), but just
>>because they map to the same Entrez gene ID (for example) does not
>>mean that the same thing was measured; whence MergeMaid and similar
>>tools.
>>
>>  And if this is correct, then combining them is contra-indicated,
>>and some of the tools for synthesizing experiments, such as
>>meta-analysis or the more general random effects models, will be
>>needed. Just because you can jam either the raw data or the processed
>>data together does not mean that it is sensible to do so.
>>
>>And finally, even if the arrays are identical, unless they were all
>>essentially done at the same time under very similar conditions, I
>>would still take the approach in the paragraph above and use a random
>>effects model.
>>
>>  best wishes
>>    Robert
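The expedient Robert mentions, mapping probes from two different array types to a common set of IDs and matching on those, can be sketched in Python with toy data. The probe names, the gene IDs G1 to G3 and the dictionaries platform_a_map and platform_b_map are all made up; real mappings would come from annotation data. The sketch only shows the mechanics of matching and flags genes hit by several probes, one simple symptom of the caveat that a shared ID does not mean the same thing was measured; it does not address whether the measurements are comparable.

```python
from collections import defaultdict

# Hypothetical probe -> gene ID annotations for two different array types.
platform_a_map = {"a_101": "G1", "a_102": "G1", "a_103": "G2"}
platform_b_map = {"b_9": "G1", "b_10": "G3", "b_11": "G2"}

def probes_by_gene(mapping):
    """Invert probe -> gene into gene -> sorted list of probes."""
    out = defaultdict(list)
    for probe, gene in mapping.items():
        out[gene].append(probe)
    return {g: sorted(p) for g, p in out.items()}

genes_a = probes_by_gene(platform_a_map)
genes_b = probes_by_gene(platform_b_map)

# Genes represented on both platforms: the only basis for matching.
shared = sorted(set(genes_a) & set(genes_b))

# Genes with several probes on either platform: the shared ID alone
# does not say which probe should be paired with which.
ambiguous = [g for g in shared if len(genes_a[g]) > 1 or len(genes_b[g]) > 1]

for g in shared:
    print(g, genes_a[g], genes_b[g])
print("ambiguous:", ambiguous)
```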
>>
>>
>>Seth Falcon wrote:
>>
>>>On 26 Dec 2005, kfbargad at ehu.es wrote:
>>>
>>>
>>>
>>>>Dear list,
>>>>
>>>>I have two sets of genes from the same experiment,
>>>>
>>>>
>>>>
>>>>>PinC
>>>>
>>>>Expression Set (exprSet) with 
>>>>1310 genes
>>>>8 samples
>>>>phenoData object with 2 variables and 8 cases
>>>>varLabels
>>>>FileName: read from file
>>>>Target: read from file
>>>>
>>>>
>>>>>PinS
>>>>
>>>>Expression Set (exprSet) with 
>>>>2891 genes
>>>>8 samples
>>>>phenoData object with 2 variables and 8 cases
>>>>varLabels
>>>>FileName: read from file
>>>>Target: read from file
>>>>
>>>>
>>>>How can I merge these two sets? I tried union() on two vectors
>>>>created from the probe IDs but failed. Any hints?
>>>
>>>
>>>One approach would be to create a new exprSet object manually using
>>>the data from PinC and PinS. Basically, create a new phenoData object
>>>with the data for all 16 cases, and a new expression matrix with 16
>>>columns (assuming the two original exprSets represent disjoint sets of
>>>samples).
>>>
>>>Thinking out loud, is this a common enough operation to warrant a
>>>method for exprSets? I could imagine c() being defined on exprSets
>>>such that if the phenoData columns are the same and the "sample ids"
>>>as given by the rownames of phenoData/colnames of exprs are disjoint,
>>>then do the obvious thing, else error.
>>>
>>>+ seth
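Seth's proposed rule can be sketched as follows. This is not an existing Biobase method: ToySet is a hypothetical stand-in for an exprSet, with its sample IDs playing the role of the colnames of exprs and the rownames of phenoData, and combine_samples and pheno_vars are hypothetical functions. The sketch combines two sets only when their phenoData variables agree and their sample IDs are disjoint, and raises an error otherwise; requiring identical probe IDs is an extra assumption added here, since a column-bind needs matching rows. The file names and values in the usage lines are made up.

```python
from dataclasses import dataclass

@dataclass
class ToySet:
    """Hypothetical stand-in for an exprSet (not the Biobase class).

    samples: ordered sample IDs (colnames of exprs, rownames of phenoData)
    pheno:   sample ID -> {variable name: value}
    exprs:   probe ID -> one value per sample, in the order of samples
    """
    samples: list
    pheno: dict
    exprs: dict

def pheno_vars(s):
    """Return the single set of phenoData variable names used by s."""
    keys = {frozenset(v) for v in s.pheno.values()}
    if len(keys) != 1:
        raise ValueError("inconsistent phenoData variables")
    return keys.pop()

def combine_samples(x, y):
    """Column-bind x and y following the rule Seth sketches."""
    if pheno_vars(x) != pheno_vars(y):
        raise ValueError("phenoData variables differ")
    if set(x.samples) & set(y.samples):
        raise ValueError("sample IDs are not disjoint")
    # Extra assumption: both sets must hold exactly the same probes.
    if set(x.exprs) != set(y.exprs):
        raise ValueError("probe IDs differ")
    pheno = {**x.pheno, **y.pheno}
    exprs = {pid: x.exprs[pid] + y.exprs[pid] for pid in x.exprs}
    return ToySet(x.samples + y.samples, pheno, exprs)

# Usage with made-up samples, file names and values.
a = ToySet(["s1", "s2"],
           {"s1": {"FileName": "a1.CEL", "Target": "ctrl"},
            "s2": {"FileName": "a2.CEL", "Target": "trt"}},
           {"p1": [1.0, 2.0], "p2": [3.0, 4.0]})
b = ToySet(["s3"],
           {"s3": {"FileName": "b1.CEL", "Target": "ctrl"}},
           {"p1": [5.0], "p2": [6.0]})

ab = combine_samples(a, b)
print(ab.samples)      # ['s1', 's2', 's3']
print(ab.exprs["p1"])  # [1.0, 2.0, 5.0]

try:
    combine_samples(a, a)  # same samples twice, so it is rejected
except ValueError as err:
    print("error:", err)   # error: sample IDs are not disjoint
```

Raising an error rather than silently renaming or dropping samples mirrors the "else error" in the suggestion: anything other than the obvious case is left for the user to resolve.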
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor at stat.math.ethz.ch
>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>
>>
>>-- 
>>Robert Gentleman, PhD
>>Program in Computational Biology
>>Division of Public Health Sciences
>>Fred Hutchinson Cancer Research Center
>>1100 Fairview Ave. N, M2-B876
>>PO Box 19024
>>Seattle, Washington 98109-1024
>>206-667-7700
>>rgentlem at fhcrc.org
>>
>>
