[Bioc-devel] BioC 2.5: Added scanDates slot to Biobase's eSet class

Patrick Aboyoun paboyoun at fhcrc.org
Thu Jun 18 17:44:47 CEST 2009


Laurent,
As you mentioned, the existing phenoData infrastructure could be used
to house information like scan dates, scanner model, and scanning
software version. However, this information is not conceptually
phenotype data, and adding it to an AnnotatedDataFrame comes with the
limitation of reserving words (perhaps name-mangled, like
.__ScanDates__?) for column names in the AnnotatedDataFrame.

The internal discussion we have been having about making this more
general is to add a different slot (candidate name arrayData) to eSet
(and remove the scanDates slot). The new slot would house the type of
information we have been discussing in a combination of dedicated
slots like scanDates and a catch-all AnnotatedDataFrame slot for less
universal data. This design would separate the array data from the
phenotype data, and having dedicated slots for important information
like scan dates would avoid having to manage special columns in an
AnnotatedDataFrame.
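
To make the shape of that idea concrete, here is a rough sketch only, not
a committed design. The class name ArrayDataSketch, the slot name "other",
and the scannerModel values are all made up for illustration, and a plain
data.frame stands in for the catch-all AnnotatedDataFrame so the example
runs without Biobase:

```r
library(methods)

# Hypothetical container, NOT part of Biobase: one dedicated slot for
# scan dates plus a catch-all table for less universal array metadata.
# A plain data.frame stands in for an AnnotatedDataFrame here.
setClass("ArrayDataSketch",
         representation(scanDates = "character", other = "data.frame"),
         validity = function(object) {
             # One row of catch-all metadata per sample.
             if (nrow(object@other) != length(object@scanDates))
                 "'other' must have one row per scanDates element"
             else TRUE
         })

ad <- new("ArrayDataSketch",
          scanDates = c(binary.cel = "01/23/04 14:30:57",
                        text.cel   = "08/29/03 15:12:30"),
          # Placeholder values, purely illustrative.
          other = data.frame(scannerModel = c("modelA", "modelB"),
                             row.names = c("binary.cel", "text.cel")))
```

The validity function is what keeps the dedicated slot and the catch-all
table aligned sample by sample.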

As you rightly point out, we need to ensure we support a rich suite of
functionality like "[", subset, etc., but this can all be handled
through methods for the eSet class.
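
As a minimal illustration of that point (a toy class, not the actual eSet
code), a "[" method can carry a dedicated slot along with the samples.
ToySet, its slots, the scanTimes helper, and the third scan date below are
assumptions made for this sketch:

```r
library(methods)

# Hypothetical toy class, NOT Biobase's eSet: a matrix with one column
# per sample, plus per-sample scan dates in a dedicated slot.
setClass("ToySet",
         representation(exprs = "matrix", scanDates = "character"))

# Subsetting on columns (samples) subsets scanDates in the same way.
setMethod("[", "ToySet", function(x, i, j, ..., drop = FALSE) {
    if (missing(i)) i <- seq_len(nrow(x@exprs))
    if (missing(j)) j <- seq_len(ncol(x@exprs))
    new("ToySet",
        exprs = x@exprs[i, j, drop = FALSE],
        scanDates = x@scanDates[j])
})

# Hypothetical helper: parse the "mm/dd/yy HH:MM:SS" strings into
# date-times so they can be compared.
scanTimes <- function(x)
    as.POSIXct(x@scanDates, format = "%m/%d/%y %H:%M:%S", tz = "UTC")

toy <- new("ToySet",
           exprs = matrix(1:6, nrow = 2,
                          dimnames = list(NULL, c("a", "b", "c"))),
           scanDates = c(a = "01/23/04 14:30:57",
                         b = "08/29/03 15:12:30",
                         c = "08/30/03 09:00:00"))

sub <- toy[, 2:3]                          # samples b and c, with their dates
cutoff <- as.POSIXct("2004-01-01", tz = "UTC")
recent <- toy[, scanTimes(toy) > cutoff]   # keeps sample a only
```

A subset method could be layered on top in the same spirit.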

Keep in mind that this recent change is just a first step, not a final  
design, and with your help and input from the rest of the BioC  
developer community, we can ensure we end up with a sufficiently  
useful microarray data infrastructure.

Cheers,
Patrick


Quoting Laurent Gautier <laurent at cbs.dtu.dk>:

> Patrick,
>
> There are indeed always several ways to address a need, and my comment
> mostly points out that creating yet another slot is not
> necessary, since one can currently store such data in phenoData (in
> a column named... say "scan_date").
>
> I would in fact call it overbuilding to add a new
> (and exclusive) slot when improving the existing infrastructure could
> perfectly answer the needs. So today it's "scanDates", and next it could
> be "scannerModel", or "scanningSoftwareVersion".
>
> I have been a little unclear (even to myself) in my comment about using
> "[", so here are more details. *If* the extract operator were made to
> evaluate expressions the way the function subset() does, or in fact if
> a subset method were implemented for eSet objects, storing all the
> information in phenoData would make such things nice:
>
> # silly example: only get the control data scanned in the future:
> eset[, scan_date > date() & treatment == "control"]
> # same with subset:
> subset(eset, , scan_date > date() & treatment == "control")
>
> # a little longer to write
> eset[, scanDates(eset) > date() & pData(eset)$treatment == "control"]
>
>
> If for some reason a distinction between phenoData and
> like-phenoData-but-can't-be-the-same is needed, please do consider
> creating an AnnotatedDataFrame that contains all of them.
>
>
>
> L.
>
>
>
>
>
> Patrick Aboyoun wrote:
>> Laurent,
>> We had an immediate need for scan date information and, rather
>> than overbuild a system for managing metadata that we may or may
>> not need, we opted to start simply and then build up as
>> appropriate. There have been some internal discussions about
>> managing other metadata along with scan dates, but nothing else has
>> bubbled to the top yet. Your thoughts and design can help speed up
>> this process. The class versioning system in Biobase supports
>> iterative development, and we can make further changes once we lock
>> a design in place. One editorial comment I have is that lots of
>> designs are possible for a given need; for example, the current
>> class properly subsets the scanDates information using "[" despite
>> it not being stored in the phenoData (AnnotatedDataFrame) slot.
>>
>>
>> Cheers,
>> Patrick
>>
>>
>> Quoting Laurent Gautier <laurent at cbs.dtu.dk>:
>>
>>> Hi Patrick,
>>>
>>> Storing the scan dates is indeed useful, and it is nice to
>>> have them offered at the parsing stage.
>>> However, my first comment would be: does it justify a new slot in eSet?
>>>
>>> I have been storing scan dates for quite some time now, but opted for
>>> having them in the phenoData as it made more sense to me, both from an
>>> implementation standpoint and from a practical standpoint (as standard
>>> extraction of an eset-subset on columns with the "[" operator works).
>>>
>>> If having something specific for scan dates is really wanted,
>>> would it make sense to do that by extending AnnotatedDataFrame?
>>>
>>> In my opinion, the stage at which the data are extracted (in this
>>> case, when parsing the files coming out of the image analysis) should
>>> not dictate where the data are stored.
>>> In fact, it might make for a nice(r) workflow if the function
>>> reading raw array data could return an eSet-inheriting instance and a
>>> phenoData with information such as dates and file names. I am working
>>> on a workflow that in fact gets much more data from the header (I
>>> suppose that I'd contribute it when I have enough time to wrap it up).
>>>
>>>
>>> Just a few thoughts,
>>>
>>>
>>>
>>> L.
>>>
>>>
>>>
>>>
>>>
>>> Patrick Aboyoun wrote:
>>>> Dear Bioconductor developers,
>>>> The Biocore group has just committed a change to the BioC 2.5
>>>> code line (Biobase version 2.5.3) to support the use of
>>>> microarray scan dates in statistical analyses by adding a
>>>> scanDates slot to Biobase's eSet class. This information can be
>>>> retrieved and set using the new scanDates and scanDates<-
>>>> functions, respectively. The scanDates slot is designed to hold a
>>>> character vector of length = # of samples, with one character
>>>> element for each sample. (See help(scanDates) for more
>>>> information.)
>>>>
>>>> In this first round of check-ins we have added affy support for
>>>> this new slot to functions like ReadAffy, and we will be working
>>>> towards adding this information for other microarray platforms as
>>>> well.
>>>>
>>>> This change involved bumping the eSet version number from 1.1.0
>>>> to 1.2.0 in the Biobase class definition. In order to minimize
>>>> the impact of this change, the Biobase methods support both the
>>>> current eSet version 1.2.0 and old 1.1.0 serialized
>>>> objects, so updateObject will not need to be run on
>>>> eSet-derived objects prior to use in other functions. We have
>>>> also tested and version bumped (and patched where needed) the
>>>> following packages that create eSet-derived classes to minimize
>>>> any package build issues: ACME, beadarray, beadarraySNP,
>>>> cellHTS2, CGHbase, codelink, crlmm, GeneRegionScan, GGBase,
>>>> maDB, oligoClasses, ontoTools, puma, rMAT, SNPchip, and spkTools.
>>>>
>>>> Below is a demonstration of the new functionality. If you
>>>> encounter any issues related to this change, please e-mail this
>>>> list so the community can monitor the change.
>>>>
>>>> - The Biocore Team
>>>>
>>>>
>>>>> suppressMessages(library(affy))
>>>>> example(ReadAffy)
>>>>
>>>> RdAffy> if(require(affydata)){
>>>> RdAffy+      celpath <- system.file("celfiles", package="affydata")
>>>> RdAffy+      fns <- list.celfiles(path=celpath,full.names=TRUE)
>>>> RdAffy+      cat("Reading files:\n",paste(fns,collapse="\n"),"\n")
>>>> RdAffy+      ##read a binary celfile
>>>> RdAffy+      abatch <- ReadAffy(filenames=fns[1])
>>>> RdAffy+      ##read a text celfile
>>>> RdAffy+      abatch <- ReadAffy(filenames=fns[2])
>>>> RdAffy+      ##read all files in that dir
>>>> RdAffy+      abatch <- ReadAffy(celfile.path=celpath)
>>>> RdAffy+ }
>>>> Loading required package: affydata
>>>> Reading files:
>>>> /Library/Frameworks/R.framework/Versions/2.10/Resources/library/affydata/celfiles/binary.cel    
>>>> /Library/Frameworks/R.framework/Versions/2.10/Resources/library/affydata/celfiles/text.cel
>>>>> scanDates(abatch)
>>>>       binary.cel            text.cel
>>>> "01/23/04 14:30:57" "08/29/03 15:12:30"
>>>>> sessionInfo()
>>>> R version 2.10.0 Under development (unstable) (2009-06-12 r48755)
>>>> i386-apple-darwin9.6.0
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> other attached packages:
>>>> [1] affydata_1.11.6 affy_1.23.2     Biobase_2.5.3
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.13.3        preprocessCore_1.7.4 tools_2.10.0
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>


