[Bioc-devel] oligo package, SNP+expr data

Sean Davis sdavis2 at mail.nih.gov
Tue Feb 6 19:35:59 CET 2007


On Tuesday 06 February 2007 12:57, Jeff Gentry wrote:
> Hello ...
>
> I'll put a disclaimer up front that I'm extremely new to the world of SNP
> data, so some of this might be overly naive.  I've been looking at the
> oligo package with an eye to put it to use to represent datasets involving
> SNP and expression data over the same samples.
>
> As background:  We have an application which currently has as its main
> purpose the ability to store multiple gene expression datasets in a
> database and then provides a web front end which allows users to select
> probes/samples across multiple datasets fairly quickly using generic
> queries ("all probes involved w/ the apoptosis pathway", "any sample that
> is ER+", etc).  The database is populated using ExpressionSet objects,
> which was also the model used for designing the DB tables.
>
> What we'd like to do now is to allow SNP data to reside in this database as
> well, and then provide ways to interact with it.  The dataset that started
> this ball rolling has both SNP and expression data for the same samples,
> and the investigators would like to be able to tie this information
> together - so beyond just having SNP and expression data supported, we'd
> also like to provide mechanisms for linking these.
>
> On the SNP side of the data, at least at the moment, we'd like to be able
> to represent Affy call information as well as copy number or an intensity
> value.
>
> To get the ball rolling, I took a look at the oligo package to get a sense
> for what containers it currently had, and how they worked.  There were
> four in particular that caught my eye:
>
> - The SnpCallSet:  Looks to essentially be an eSet object, but
> replacing the expression matrix with a matrix of the calls
>
> - The SnpCopyNumberSet: Same, but with copy #
>
> - oligoSnpSet: A container which would hold both calls & copy # (correct?)
>
> - SnpQSet: This I'm not sure what it represents, but is the output of the
> snprma() functionality
>
> For starters, I was looking for confirmation that the above information is
> actually correct (or not) :)  After that, I'm looking to start moving
> towards some form of unified container -> it looks like this oligoSnpSet
> would hold the information we desire on the SNP side, and then perhaps a
> new class which contains both the ExpressionSet and the
> oligoSnpSet?  Other ideas on how to model this type of dataset?

I've thought about this a bit, but have never settled on a general framework 
for solving the problem.  The same issues come up with mapping between 
methylation data, ChIP-chip data, SNP data, CGH data, expression data, and 
others that most folks don't want to think about.  What I've come closest to 
settling on is a "mapper object" that sits between the two classes 
representing the different datatypes.  The "mapper object" gives a mapping 
between samples and features in the two classes, as they are likely to be 
many-to-many in general, particularly on the feature side.  This "mapper 
object" could be pretty simple, perhaps as simple as 2*(n-1) data frames 
(where n is the number of mapped classes)--one set for mapping features and 
one for mapping samples, each based on the featureNames and sampleNames, 
respectively.  An initialize method would simply check the integrity of the 
supplied mappings against the supplied classes.  There would be some API 
issues to work out, particularly if there are more than 2 classes (SNP, CGH, 
expression, for example) involved.  Thoughts?
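
To make the two-class case concrete, here is a rough sketch of what such a 
mapper might look like. It is written in Python rather than R purely to show 
the shape of the idea, and every name in it (Dataset, Mapper, features_for, 
samples_for) is made up for illustration; none of it is part of oligo or 
Biobase. Lists of (left, right) name pairs stand in for the data frames, and 
the constructor plays the role of the initialize method that checks the 
mappings against the supplied classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an eSet-like container; only names matter here."""
    feature_names: tuple
    sample_names: tuple


class Mapper:
    """Sits between two datasets and holds many-to-many name mappings."""

    def __init__(self, left, right, feature_map, sample_map):
        self.left = left
        self.right = right
        # Each map is a list of (left_name, right_name) pairs.
        self.feature_map = list(feature_map)
        self.sample_map = list(sample_map)
        # Integrity check: every mapped name must exist in its dataset.
        self._check(self.feature_map, left.feature_names,
                    right.feature_names, "feature")
        self._check(self.sample_map, left.sample_names,
                    right.sample_names, "sample")

    @staticmethod
    def _check(pairs, left_names, right_names, kind):
        left_set, right_set = set(left_names), set(right_names)
        for left_name, right_name in pairs:
            if left_name not in left_set:
                raise ValueError(f"unknown left {kind}: {left_name}")
            if right_name not in right_set:
                raise ValueError(f"unknown right {kind}: {right_name}")

    def features_for(self, left_feature):
        """All right-hand features mapped to one left-hand feature."""
        return [r for l, r in self.feature_map if l == left_feature]

    def samples_for(self, left_sample):
        """All right-hand samples mapped to one left-hand sample."""
        return [r for l, r in self.sample_map if l == left_sample]


# Toy data: two SNPs, three expression probes, two samples on each side.
snp = Dataset(("SNP_A-1", "SNP_A-2"), ("s1_snp", "s2_snp"))
expr = Dataset(("probe_1", "probe_2", "probe_3"), ("s1_expr", "s2_expr"))

mapper = Mapper(
    snp, expr,
    feature_map=[("SNP_A-1", "probe_1"),
                 ("SNP_A-1", "probe_2"),
                 ("SNP_A-2", "probe_2")],
    sample_map=[("s1_snp", "s1_expr"), ("s2_snp", "s2_expr")],
)
```

With more than two classes, one option would be to keep one such mapper per 
pair against a chosen reference class, which is where the 2*(n-1) count comes 
from; whether that is the right API is exactly the open question.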

Sean


