[R] Why only a "" string for heading for row.names with write.csv with a matrix?

Tony Plate tplate at acm.org
Wed Aug 10 19:42:08 CEST 2005


Here's a relatively easy way to get what I think you want.  Note that 
converting x to a data frame before cbind'ing allows the type of the 
elements of x to be preserved:

 > x <- matrix(1:6, 2,3)
 > rownames(x) <- c("ID1", "ID2")
 > colnames(x) <- c("Attr1", "Attr2", "Attr3")
 > x
     Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
 > write.table(cbind(id=row.names(x), as.data.frame(x)),
 +             row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6
 >
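
To make the point about type preservation concrete, here is a small
sketch (same x as above; the names m and d are only for illustration)
comparing cbind() on the bare matrix with cbind() on a data frame:

```r
x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))

# cbind() of a character vector and an integer matrix gives a single
# character matrix, so the numbers would be written out as quoted strings.
m <- cbind(id = rownames(x), x)
typeof(m)

# With a data frame, cbind() uses the data frame method, so each column
# keeps its own type and the Attr columns stay integer.
d <- cbind(id = rownames(x), as.data.frame(x))
sapply(d, class)
```

The class reported for the id column may differ between R versions
(factor or character), but in either case write.table() quotes it and
leaves the integer columns unquoted.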

As to why you can't get this via an argument to write.table (or 
write.csv), I suspect that part of the answer is a wish to avoid 
"creeping featuritis".  Transferring data between programs is 
notoriously infuriating.  There are more data formats than there are 
programs, but few programs use the same format as their default & 
preferred format.  So to accommodate everyone's preferred format would 
require an extremely large number of features in the data import/export 
functions.  Maintaining software that contains a large number of 
features is difficult -- it's easy for errors to creep in because there 
are so many combinations of how different features can be used on 
different functions.

The alternative to having lots of features on each function is to have a 
relatively small set of powerful functions that can be used to construct 
the behavior you want.  This type of software is thought by many to be 
easier to maintain and extend.  I think this is pretty much the preferred 
approach in R.  The above one-liner for writing the data in the form you 
want is really not much more complex than using an additional argument 
to write.table().  (And if you need to do this kind of thing frequently, 
then it's easy in R to create your own wrapper function for 'write.table'.)
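
As a rough illustration of such a wrapper, here is a minimal sketch.
The function name write_csv_with_id and its id_name argument are made
up for this example and are not part of R; the body just packages the
one-liner shown earlier:

```r
# Hypothetical helper (not part of base R): write a matrix or data frame
# as comma-separated text, with the row names in a named first column.
write_csv_with_id <- function(x, file = "", id_name = "id", ...) {
  df <- as.data.frame(x)
  # Put the row names in front as an ordinary column, then name it.
  out <- cbind(row.names(df), df)
  names(out)[1] <- id_name
  # Extra arguments in ... are passed on to write.table(); sep and
  # row.names are fixed here, so they should not be supplied again.
  write.table(out, file = file, row.names = FALSE, sep = ",", ...)
}

# Usage with the matrix from the example; file = "" writes to the console.
x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))
write_csv_with_id(x, id_name = "ID")
```

This should give the same three lines as the one-liner, with "ID" as
the heading of the first column.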

One might object to this line of explanation by noting that many 
functions already have many arguments and lots of features.  I think the 
situation is that the original author of any particular function gets to 
decide what features the function will have, and after that there is 
considerable reluctance (justifiably) to add new features, especially in 
cases where the desired functionality can be easily achieved in other 
ways with existing functions.

-- Tony Plate

Earl F. Glynn wrote:
> Consider:
> 
>>x <- matrix(1:6, 2,3)
>>rownames(x) <- c("ID1", "ID2")
>>colnames(x) <- c("Attr1", "Attr2", "Attr3")
> 
> 
>>x
> 
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
> 
> 
>>write.csv(x,file="x.csv")
> 
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
> 
> Have I missed an easy way to get the "" string to be something meaningful?
> 
> There is no information in the "" string.  This column heading for the row
> names often could be used as a database key, but the "" entry would need to be
> manually edited first.  Why not provide a way to specify the string instead
> of putting "" as the heading for the rownames?
> 
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
> 
>   Header line
>   R prefers the header line to have no entry for the row names,
>   . . .
>   Some other systems require a (possibly empty) entry for the row names,
> which is what write.table will provide if argument col.names = NA  is
> specified. Excel is one such system.
> 
> Why is an "empty" entry the only option here?
> 
> A quick solution that comes to mind seems a bit kludgy:
> 
> 
>>y <- cbind(rownames(x), x)
>>colnames(y)[1] <- "ID"
>>y
> 
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
> 
> 
>>write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
> 
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
> 
> Now the rownames have an "ID" header, which could be used as a key in a
> database if desired without editing (but all the "numbers" are now
> character strings, too).
> 
> It's also not clear why I had to use write.table above, instead of
> write.csv:
> 
>>write.csv(y, row.names=F, col.names=T, file="y.csv")
> 
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
>         col.names = NA makes no sense when row.names = FALSE
> 
> Thanks for any insight about this.
> 
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
> 



