[Bioc-devel] read.AnnotatedDataFrame

Florian Hahne f.hahne at dkfz-heidelberg.de
Tue Jan 9 12:39:34 CET 2007


Hi Martin,
I stumbled across this because I previously used the phenoData class and
have now switched to AnnotatedDataFrames, both in objects of class cytoSet
in my prada package and in the flowSets of the new flowCore package.
So for me the problem is unrelated to affy. I cannot speak for all the
ExpressionSet users, but in my use cases I usually have a data frame (or
some table in a file) with all the necessary metadata for each sample.
I guess reading in such files is the most common way people get this
information into their data structures; after all, nobody wants to build
a data frame of possibly hundreds of rows interactively in R or via a widget.
I'm not sure about storing the sample names as row.names, though. I think
there used to be a mandatory column "name" for them, which I personally
liked better (in many spreadsheet programs the concept of row names is
somewhat vague). It might help to improve the documentation for
read.AnnotatedDataFrame a bit, perhaps by adding an example file so that
people can see what such a file is supposed to look like. Apart from that,
it might be hard to make this procedure easier or more robust, since use
cases and the background and expertise of users differ a lot.
Hope these thoughts help a bit,
Florian

Martin Morgan wrote:
> Thanks Florian -- oddly, Crispin Miller sent email earlier today about
> this same issue; it's fixed in R-devel. 
>
> read.AnnotatedDataFrame was introduced to accommodate modifications to
> affy; is this (affy) where the problem came from? I'm not really sure
> how people get info into ExpressionSets, and would be happy to make
> that process easier / more robust.
>
> Martin
>
> Florian Hahne <f.hahne at dkfz-heidelberg.de> writes:
>
>   
>> Hi Martin, all
>>
>> I tried to use the read.AnnotatedDataFrame method on files that I was
>> previously able to import with read.phenoData and got the following
>> error message:
>>
>> Error in data.frame(labelDescription = varLabels, row.names = names(varLabels)) :
>>         row names supplied are of the wrong length
>>
>> After taking a look at the code and changing the line
>>     varLabels <- as.list(rep("read from file", ncol(pData)))
>> to
>>     varLabels <- rep("read from file", ncol(pData))
>> the function created the AnnotatedDataFrame without complaint.
>>
>> Not sure if this is a bug or if my phenoData files should be formatted in
>> another way, but I strongly doubt that the original version can work in
>> general, because of the implicit coercion in
>> varMetadata = data.frame(labelDescription = varLabels,
>>                 row.names = names(varLabels))
>> Since varLabels is a list, data.frame() turns each of its elements into a
>> separate column. This creates a data frame with 1 row and
>> length(varLabels) columns, hence the row.names = names(varLabels)
>> argument will cause an error.
>>
>> Am I wrong here?
>> Florian
>>
>>
>>  sessionInfo()
>> R version 2.5.0 Under development (unstable) (2006-05-24 r38188)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> LC_CTYPE=de_DE.ISO-8859-1;LC_NUMERIC=C;LC_TIME=de_DE.ISO-8859-1;LC_COLLATE=de_DE.ISO-8859-1;LC_MONETARY=de_DE.ISO-8859-1;LC_MESSAGES=de_DE.ISO-8859-1;LC_PAPER=de_DE.ISO-8859-1;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.ISO-8859-1;LC_IDENTIFICATION=C
>>
>> attached base packages:
>>  [1] "splines"   "grid"      "tools"     "methods"   "stats"     "graphics"
>>  [7] "grDevices" "utils"     "datasets"  "base"
>>
>> other attached packages:
>>   genefilter     survival    rnaiUtils      flomisc        RODBC        prada
>>     "1.13.5"       "2.29"        "1.0"      "1.0.2"      "1.1-7"     "1.11.3"
>> RColorBrewer      Biobase
>>      "0.2-3"    "1.13.29"
>>
>>
>>
>> -- 
>> Florian Hahne
>> Abt. Molekulare Genomanalyse (B050)
>> Deutsches Krebsforschungszentrum (DKFZ)
>> Im Neuenheimer Feld 580
>> D-69120 Heidelberg
>> phone: 0049 6221 424764
>> fax: 0049 6221 423454
>> web: www.dkfz.de/mga
>>
>>     
>
>   
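A minimal R sketch of the coercion problem described in the quoted message. It uses a hypothetical three-column sample table, and it assumes (this is not shown in the quoted excerpt) that the labels are named after the columns of pData before data.frame() is called. The names pData, labelsList and labelsVec are illustrative only; this is not the actual Biobase source. The sketch contrasts the original as.list() line with the patched character-vector line:

```r
# Hypothetical sample table standing in for a phenoData file (3 columns)
pData <- data.frame(name = c("s1", "s2"),
                    treatment = c("ctrl", "drug"),
                    time = c(0, 6))

# Original line: the labels are stored in a *list*.
# Assumption: the labels are named after the columns of pData.
labelsList <- as.list(rep("read from file", ncol(pData)))
names(labelsList) <- names(pData)

# data.frame() turns each list element into its own column ...
wide <- data.frame(labelDescription = labelsList)
dim(wide)  # 1 row, 3 columns

# ... so three row names cannot be attached to the single row.
res <- tryCatch(
  data.frame(labelDescription = labelsList, row.names = names(labelsList)),
  error = function(e) e
)
inherits(res, "error")  # TRUE

# Patched line: a character vector gives one column with one row per label.
labelsVec <- rep("read from file", ncol(pData))
names(labelsVec) <- names(pData)
varMetadata <- data.frame(labelDescription = labelsVec,
                          row.names = names(labelsVec))
dim(varMetadata)  # 3 rows, 1 column
```

The patched version yields the layout varMetadata is meant to have: one row per variable of pData, with the variable names as row names.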




