[Bioc-devel] read.AnnotatedDataFrame

Florian Hahne f.hahne at dkfz-heidelberg.de
Tue Jan 9 16:44:35 CET 2007


Hi Seth,
internal representation is one part of the story, and I agree that row
names are the way to go here. Another point, however, is how the user
gets the information into R. At some point we need to match sample names
to the sample metadata, and IMO this should already happen at the level
of the text file. The closest thing to the row names idea is probably to
take the first column in the file as the sample identifier, but this
imposes a pretty strict layout on the files (maybe for some users the
first column is already the row numbering...).

As far as I understand the current implementation, the default is to
take the first column, and you can pass row.names=x to
read.AnnotatedDataFrame, but there is also this additional sampleNames
parameter, and I find this pretty confusing. So currently you can do
almost everything with the function, which is good in one sense, but on
the other hand it might cause mix-ups and confusion for the user. If the
mapping is already clear at the level of the text file, we can sit back
and tell people to check their files if something isn't showing up as
they expect, but currently you can do pretty stupid stuff just by
setting a wrong argument without even realizing it.

I had the impression at the Bressanone courses that for the average user
the biggest obstacle is to get all the necessary data from files
somewhere on the hard disk into R and that it is important to provide a
straightforward default way of doing that.
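
To make the point about the file layout concrete, here is a minimal
sketch in base R. It is not the Biobase implementation. The file
contents, the column names and the object names (pheno_file, pheno,
exprs_mat) are made up for illustration; row.names = 1 is the read.table
argument that takes the first column as row names. The idea is only that
the sample identifiers come from the file and that a mismatch stops with
an error instead of silently misaligning the samples.

```r
# Hypothetical sample annotation file: the first column holds the sample IDs.
pheno_text <- c(
  "sample\tsex\ttype",
  "A\tFemale\tControl",
  "B\tMale\tCase",
  "C\tMale\tControl"
)
pheno_file <- tempfile(fileext = ".txt")
writeLines(pheno_text, pheno_file)

# Base R: row.names = 1 uses the first column of the file as row names.
pheno <- read.table(pheno_file, header = TRUE, sep = "\t",
                    row.names = 1, stringsAsFactors = FALSE)

# Stand-in for the assay data: one column per sample, in a different order.
exprs_mat <- matrix(rnorm(12), nrow = 4,
                    dimnames = list(paste0("gene", 1:4), c("C", "A", "B")))

# Stop if any sample in the assay data has no annotation row.
missing_ids <- setdiff(colnames(exprs_mat), rownames(pheno))
if (length(missing_ids) > 0) {
  stop("samples without annotation: ", paste(missing_ids, collapse = ", "))
}

# Reorder the annotation rows to match the column order of the assay data.
pheno <- pheno[colnames(exprs_mat), , drop = FALSE]
```

With a check like this, a wrong or missing identifier in the file shows
up as an error naming the affected samples, and the user can be told to
go and look at the file.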
Best,
Florian


Seth Falcon wrote:
> Florian Hahne <f.hahne at dkfz-heidelberg.de> writes:
>   
>> I'm not sure about having the sample names as row.names, though. I think
>> there used to be a mandatory column "name" to store them, which I
>> personally liked better (in many spreadsheet programs the concept of row
>> names is somewhat vague...) . 
>>     
>
> Interesting.  The row names are special since they must often be aligned with
> other objects and can be used for subsetting.  
>
> I have no problem with a "name" column being recognized by an import
> tool (aside from issues of name collisions -- what if I have a
> variable named "name").  But I think performance concerns will move us
> away from having such a column in the actual representation of the
> object.
>
> What we are moving towards is a setup where the row names are stored
> in a separate slot and may eventually be an external vector that can
> be shared among other objects that need to align on that vector.
>
> + seth


-- 
Florian Hahne
Abt. Molekulare Genomanalyse (B050)
Deutsches Krebsforschungszentrum (DKFZ)
Im Neuenheimer Feld 580
D-69120 Heidelberg
phone: 0049 6221 424764
fax: 0049 6221 423454
web: www.dkfz.de/mga


