[Bioc-devel] BioC 2.5: Added scanDates slot to Biobase's eSet class
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Thu Jun 18 22:15:53 CEST 2009
I am adding my support to Laurent: I think scanDate is simply another
column in the phenotype info, indeed something I always put in, if I
have it available (well, actually I am usually more interested in prep
date). Putting in a new slot seems counterintuitive to me.
Kasper
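
A minimal sketch of the "just another phenoData column" approach argued for above, assuming the Biobase package is installed. The toy matrix, the sample names, and the column names "treatment" and "scan_date" are made up for illustration; only ExpressionSet, AnnotatedDataFrame, sampleNames, "$" and "[" are existing Biobase functionality.

```r
# Sketch only: assumes Biobase is installed; all data below are invented.
library(Biobase)

# Toy expression matrix: 3 features x 4 samples.
exprs_mat <- matrix(
  as.numeric(1:12), nrow = 3,
  dimnames = list(paste0("probe", 1:3), paste0("s", 1:4))
)

# Sample annotation with the scan date stored as an ordinary column.
# "treatment" and "scan_date" are illustrative names, not reserved words.
pd <- data.frame(
  treatment = c("control", "treated", "control", "treated"),
  scan_date = as.Date(c("2003-08-29", "2003-08-29",
                        "2004-01-23", "2004-01-23")),
  row.names = colnames(exprs_mat)
)

# Column descriptions, one row per column of pd.
meta <- data.frame(
  labelDescription = c("Treatment group",
                       "Date when scanning the hybridized microarray"),
  row.names = names(pd)
)

eset <- ExpressionSet(
  assayData = exprs_mat,
  phenoData = AnnotatedDataFrame(data = pd, varMetadata = meta)
)

# The standard "[" method subsets the samples, and the scan date column
# follows along because it lives in phenoData.
keep <- eset$scan_date > as.Date("2003-12-31") & eset$treatment == "control"
late_controls <- eset[, keep]
sampleNames(late_controls)
```

Because the date is a regular column, no extra accessor or slot is needed for subsetting; whether that outweighs a dedicated slot is the design question discussed in the thread below.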
On Jun 18, 2009, at 12:07 , Patrick Aboyoun wrote:
> Laurent,
> The scan dates were singled out originally because we have
> encountered data sets at the Hutch that appear to have a scan date
> effect and wanted a location to store this information so it can be
> included in the analysis. As you mentioned, there are other
> variables that could be important as well and shouldn't be ignored.
>
> Given that you have been actively working towards a solution of
> managing array metadata, you can help create a design that can be
> implemented in the Biobase package. Martin Morgan is currently
> leading this effort and we can start a dialog off-list (so as not to
> spam the rest of the developers with minutiae) with those who are
> interested to hammer out a solution to this problem. I think once
> the requirements are formally expressed, we can easily put together
> a design that meets the user's needs.
>
>
> Patrick
>
>
>
> Laurent Gautier wrote:
>>
>> Patrick,
>>
>> The conceptual distinction you want to make can be seen as
>> artificial.
>>
>> When you start introducing "arrayData" as a separated entity, you
>> will soon have to introduce "samplepreparationData" (what
>> extraction protocol was used, were there any biopsies, etc...),
>> "imageAnalysisData" (you know grid alignment, spot segmentation).
>> Is it reasonable to add a slot each time ? Moreover, those
>> categories can probably also be broken down into subcategories.
>> Finally, what is making the scanning date so important ?
>> Wouldn't the version of the software used, or the scanner, or the
>> scanner settings, or the name of the person who performed the
>> scanning be of relevance ?
>>
>> One route would be to construct an initial AnnotatedDataFrame and
>> populate it with whatever you fancy from the raw-data files (scan
>> date, software, etc...). I have been going this way with my homebrew
>> infrastructure, and it has so far proved quite expressive.
>> Reserved words are not necessarily very limiting (if
>> sufficiently specific, say "array_scan_date" and the associated
>> varMetaData = "Date when scanning the hybridized microarray"), and
>> I'd think better to carefully design and document what is happening
>> when one is trying to add another column with the same name rather
>> than rely on security-through-obscurity with mangled names.
>>
>>
>>
>> L.
>>
>>
>>
>>
>>
>> Patrick Aboyoun wrote:
>>> Laurent,
>>> As you mentioned the existing phenoData infrastructure could be
>>> used to house information like scan dates, scanner model, and
>>> scanning software version, but this information is not
>>> conceptually phenotype data, and adding it to an
>>> AnnotatedDataFrame comes with the limitation of using reserved
>>> words (maybe name mangled like .__ScanDates__?) for column names
>>> in the AnnotatedDataFrame.
>>>
>>> The internal discussion we have been having about making this more
>>> general is to add a different slot (candidate name arrayData) to
>>> eSet (and remove the scanDates slot) that would house the type
>>> of information we have been discussing in a combination of
>>> dedicated slots like scanDates and a catch-all AnnotatedDataFrame
>>> slot for less universal data. This design would separate the array
>>> data from the phenotype data, and having dedicated slots for
>>> important information like scan dates would avoid having to manage
>>> special columns in an AnnotatedDataFrame.
>>>
>>> As you rightly point out we need to ensure we support a rich suite
>>> of functionality like "[", subset, etc., but this can all be
>>> handled through methods for the eSet class.
>>>
>>> Keep in mind that this recent change is just a first step, not a
>>> final design, and with your help and input from the rest of the
>>> BioC developer community, we can ensure we end up with a
>>> sufficiently useful microarray data infrastructure.
>>>
>>> Cheers,
>>> Patrick
>>>
>>>
>>> Quoting Laurent Gautier <laurent at cbs.dtu.dk>:
>>>
>>>> Patrick,
>>>>
>>>> There are indeed always several ways to address needs, and my comment
>>>> is mostly pointing at the fact that creating yet another slot is not
>>>> necessary since one can currently store such data in phenoData (in
>>>> a column named... say "scan_date").
>>>>
>>>> I would in fact describe as overbuilding the approach that adds a new
>>>> (and exclusive) slot when improving the existing infrastructure could
>>>> perfectly answer the needs. So today it's "scanDates", and next could
>>>> be "scannerModel", or "scanningSoftwareVersion".
>>>>
>>>> I have been a little unclear (even to myself) in my comment about using
>>>> "[", so here are more details. *If* the extract operator were made to
>>>> evaluate expressions the way the function subset() does, or in fact if
>>>> a subset method were implemented for eSet objects, storing all
>>>> information in phenoData would make such things nice:
>>>>
>>>> # silly example: only get the control data scanned in the future:
>>>> eset[, scan_date > date() & treatment == "control"]
>>>> # same with subset:
>>>> subset(eset, , scan_date > date() & treatment == "control")
>>>>
>>>> # a little longer to write
>>>> eset[, scanDates(eset) > date() & pData(eset)$treatment == "control"]
>>>>
>>>>
>>>> If for some reason a distinction between phenoData and
>>>> like-phenoData-but-can't-be-the-same is needed, please do consider the
>>>> creation of an AnnotatedDataFrame that contains all of them.
>>>>
>>>>
>>>>
>>>> L.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Patrick Aboyoun wrote:
>>>>> Laurent,
>>>>> We had some immediate need for scan date information and rather
>>>>> than overbuild a system for managing metadata that we may or
>>>>> may not need, we opted to start simply and then build up as
>>>>> appropriate. There have been some internal discussions about
>>>>> managing other metadata along with scan dates, but nothing else
>>>>> has bubbled to the top yet. Your thoughts and design can help
>>>>> speed up this process. The class versioning system in Biobase
>>>>> supports iterative development and we can make further changes
>>>>> once we lock a design in place. One editorial comment I have is
>>>>> that lots of designs are possible for a given need and, for
>>>>> example, the current class properly subsets the scanDates
>>>>> information using "[" despite not being stored in the phenoData
>>>>> (AnnotatedDataFrame) slot.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Patrick
>>>>>
>>>>>
>>>>> Quoting Laurent Gautier <laurent at cbs.dtu.dk>:
>>>>>
>>>>>> Hi Patrick,
>>>>>>
>>>>>> Storing the scan dates is indeed useful, and it is nice to
>>>>>> have it offered at the parsing stage.
>>>>>> However, my first comment would be "does it justify a new slot" in
>>>>>> eSet?
>>>>>>
>>>>>> I have been storing scan dates for quite some time now, but opted for
>>>>>> having them in the phenoData as it made more sense to me, both from an
>>>>>> implementation standpoint and from a practical standpoint (as standard
>>>>>> extraction of an eset-subset on columns with the "[" operator works).
>>>>>>
>>>>>> If having something specific for scan dates is really wished,
>>>>>> would it make sense to have that by extending AnnotatedDataFrame?
>>>>>>
>>>>>> In my opinion, the stage at which the data are extracted (in that
>>>>>> case when parsing the files coming out of the image analysis) should
>>>>>> not dictate where the data are stored.
>>>>>> In fact, it might make for a nice(r) workflow if the function
>>>>>> reading raw array data could return an eSet-inheriting instance and a
>>>>>> phenoData with information such as dates and file names. I am working
>>>>>> on a workflow that is in fact getting much more data from the header (I
>>>>>> suppose that I'd contribute it when I have enough time to wrap it up).
>>>>>>
>>>>>>
>>>>>> Just a few thoughts,
>>>>>>
>>>>>>
>>>>>>
>>>>>> L.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Patrick Aboyoun wrote:
>>>>>>> Dear Bioconductor developers,
>>>>>>> The Biocore group has just committed a change to the BioC 2.5
>>>>>>> code line (Biobase version 2.5.3) to support the use of
>>>>>>> microarray scan date in statistical analyses by adding a
>>>>>>> scanDates slot to Biobase's eSet class. This information can
>>>>>>> be retrieved and set using the new scanDates and
>>>>>>> scanDates<- functions, respectively. The scanDates slot is
>>>>>>> designed to hold a character vector of length = # of
>>>>>>> samples, with one character element for each sample. (See
>>>>>>> help(scanDates) for more information.)
>>>>>>>
>>>>>>> In this first round of check-ins we have added affy support
>>>>>>> of this new slot to functions like ReadAffy and we will be
>>>>>>> working towards adding this information to other microarray
>>>>>>> platforms as well.
>>>>>>>
>>>>>>> This change involved bumping the eSet version number from
>>>>>>> 1.1.0 to 1.2.0 in the Biobase class definition. In order to
>>>>>>> minimize the impact of this change, the Biobase methods
>>>>>>> support both the current eSet version 1.2.0 as well as old
>>>>>>> 1.1.0 serialized objects so updateObject will not be
>>>>>>> required to be performed on eSet-derived objects prior to
>>>>>>> use in other functions. We have also tested and version-bumped
>>>>>>> (and patched where needed) the following packages
>>>>>>> that create eSet-derived classes to minimize any package
>>>>>>> build issues: ACME, beadarray, beadarraySNP, cellHTS2,
>>>>>>> CGHbase, codelink, crlmm, GeneRegionScan, GGBase, maDB,
>>>>>>> oligoClasses, ontoTools, puma, rMAT, SNPchip, and spkTools.
>>>>>>>
>>>>>>> Below is a demonstration of the new functionality. If you
>>>>>>> encounter any issues related to this change, please e-mail
>>>>>>> this list so the community can monitor the change.
>>>>>>>
>>>>>>> - The Biocore Team
>>>>>>>
>>>>>>>
>>>>>>> > suppressMessages(library(affy))
>>>>>>> > example(ReadAffy)
>>>>>>>
>>>>>>> RdAffy> if(require(affydata)){
>>>>>>> RdAffy+ celpath <- system.file("celfiles", package="affydata")
>>>>>>> RdAffy+ fns <- list.celfiles(path=celpath,full.names=TRUE)
>>>>>>> RdAffy+
>>>>>>> RdAffy+ cat("Reading files:\n",paste(fns,collapse="\n"),"\n")
>>>>>>> RdAffy+ ##read a binary celfile
>>>>>>> RdAffy+ abatch <- ReadAffy(filenames=fns[1])
>>>>>>> RdAffy+ ##read a text celfile
>>>>>>> RdAffy+ abatch <- ReadAffy(filenames=fns[2])
>>>>>>> RdAffy+ ##read all files in that dir
>>>>>>> RdAffy+ abatch <- ReadAffy(celfile.path=celpath)
>>>>>>> RdAffy+ }
>>>>>>> Loading required package: affydata
>>>>>>> Reading files:
>>>>>>> /Library/Frameworks/R.framework/Versions/2.10/Resources/library/affydata/celfiles/binary.cel
>>>>>>> /Library/Frameworks/R.framework/Versions/2.10/Resources/library/affydata/celfiles/text.cel
>>>>>>> > scanDates(abatch)
>>>>>>>          binary.cel            text.cel
>>>>>>> "01/23/04 14:30:57" "08/29/03 15:12:30"
>>>>>>> > sessionInfo()
>>>>>>> R version 2.10.0 Under development (unstable) (2009-06-12 r48755)
>>>>>>> i386-apple-darwin9.6.0
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] affydata_1.11.6 affy_1.23.2 Biobase_2.5.3
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] affyio_1.13.3 preprocessCore_1.7.4 tools_2.10.0
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at stat.math.ethz.ch mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>>
>>>