[Bioc-devel] oligo package, SNP+expr data

Jeff Gentry jgentry at jimmy.harvard.edu
Tue Feb 6 18:57:56 CET 2007

Hello ...

I'll put a disclaimer up front that I'm extremely new to the world of SNP
data, so some of this might be overly naive.  I've been looking at the
oligo package with an eye toward using it to represent datasets that have
both SNP and expression data for the same samples.

As background:  We have an application whose main purpose is to store
multiple gene expression datasets in a database and to provide a web
front end which lets users select probes/samples across multiple
datasets fairly quickly using generic queries ("all probes involved w/
the apoptosis pathway", "any sample that is ER+", etc).  The database is
populated using ExpressionSet objects, and ExpressionSet was also the
model used for designing the DB tables.

What we'd like to do now is to allow SNP data to reside in this database as
well, and then provide ways to interact with it.  The dataset that started
this ball rolling has both SNP and expression data for the same samples,
and the investigators would like to be able to tie this information
together - so beyond just having SNP and expression data supported, we'd
also like to provide mechanisms for linking these.  

On the SNP side of the data, at least at the moment, we'd like to be able
to represent Affy call information as well as copy number or an intensity

To get the ball rolling, I took a look at the oligo package to get a sense
for what containers it currently had, and how they worked.  There were
four in particular that caught my eye:

- The SnpCallSet:  Looks to essentially be an eSet object, but 
replacing the expression matrix with a matrix of the calls

- The SnpCopyNumberSet: Same, but with copy #

- oligoSnpSet: A container which would hold both calls & copy # (correct?)

- SnpQSet: I'm not sure what this represents, but it is the output of the
snprma() functionality

For starters, I was looking for confirmation that the above information is
actually correct (or not) :)  After that, I'm looking to start moving
towards some form of unified container -> it looks like this oligoSnpSet
would hold the information we desire on the SNP side, and then perhaps a
new class which contains both the ExpressionSet and the
oligoSnpSet?  Other ideas on how to model this type of dataset?
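
To make the "new class which contains both" idea a bit more concrete,
here is a rough sketch of the shape I have in mind.  It is written in
Python rather than R purely to keep it independent of any package's real
API: every name in it (AssayMatrix, SnpExprSet, column, sample) is made up
for illustration and is not part of oligo or Biobase.  It also assumes
that all three matrices list the samples in the same order and that calls
and copy number cover the same SNPs, which may not hold for real data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssayMatrix:
    """Hypothetical features x samples matrix with named rows and columns."""
    feature_ids: List[str]
    sample_ids: List[str]
    values: List[list]  # values[i][j] is feature i measured in sample j

    def __post_init__(self):
        # Reject matrices whose shape disagrees with their labels.
        if len(self.values) != len(self.feature_ids):
            raise ValueError("expected one row per feature")
        for row in self.values:
            if len(row) != len(self.sample_ids):
                raise ValueError("expected one column per sample")

    def column(self, sample_id: str) -> Dict[str, object]:
        """Return {feature_id: value} for one sample."""
        j = self.sample_ids.index(sample_id)
        return {f: row[j] for f, row in zip(self.feature_ids, self.values)}


@dataclass
class SnpExprSet:
    """Hypothetical container pairing expression data with SNP data."""
    expression: AssayMatrix   # stands in for the ExpressionSet part
    calls: AssayMatrix        # stands in for the genotype calls
    copy_number: AssayMatrix  # stands in for the copy number estimates

    def __post_init__(self):
        # Assumption: linking is by identical, identically ordered samples.
        ref = self.expression.sample_ids
        for name in ("calls", "copy_number"):
            if getattr(self, name).sample_ids != ref:
                raise ValueError(f"{name} samples do not match expression samples")
        # Assumption: calls and copy number describe the same SNPs.
        if self.calls.feature_ids != self.copy_number.feature_ids:
            raise ValueError("calls and copy_number must cover the same SNPs")

    def sample(self, sample_id: str) -> Dict[str, Dict[str, object]]:
        """Collect everything known about one sample across all three parts."""
        return {
            "expression": self.expression.column(sample_id),
            "calls": self.calls.column(sample_id),
            "copy_number": self.copy_number.column(sample_id),
        }
```

The only real design decision here is that the sample identifiers are
checked once, at construction time, so that anything downstream (such as
loading the DB tables) can rely on the SNP and expression parts lining up.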
