[R] Why only a "" string for heading for row.names with write.csv with a matrix?
Tony Plate
tplate at acm.org
Wed Aug 10 19:42:08 CEST 2005
Here's a relatively easy way to get what I think you want. Note that
converting x to a data frame before cbind'ing allows the type of the
elements of x to be preserved:
> x <- matrix(1:6, 2,3)
> rownames(x) <- c("ID1", "ID2")
> colnames(x) <- c("Attr1", "Attr2", "Attr3")
> x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
> write.table(cbind(id=row.names(x), as.data.frame(x)),
row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6
>
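To make the type-preservation point concrete, here is a small sketch comparing the two ways of cbind'ing. The names `as_matrix` and `as_df` are only illustrative; everything else is base R.

```r
x <- matrix(1:6, 2, 3)
rownames(x) <- c("ID1", "ID2")
colnames(x) <- c("Attr1", "Attr2", "Attr3")

# cbind() on a matrix has to return a single-type matrix, so the
# integers are coerced to character alongside the row names.
as_matrix <- cbind(id = rownames(x), x)
typeof(as_matrix)   # "character"

# With a data frame among the arguments, cbind() uses the data frame
# method, and each column keeps its own type. The Attr columns stay
# integer; the id column is character or factor depending on R version.
as_df <- cbind(id = rownames(x), as.data.frame(x))
sapply(as_df, class)
```

Because the Attr columns remain integer in `as_df`, write.table() leaves them unquoted, whereas every cell of `as_matrix` would be written as a quoted string.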
As to why you can't get this via an argument to write.table (or
write.csv), I suspect that part of the answer is a wish to avoid
"creeping featuritis". Transferring data between programs is
notoriously infuriating. There are more data formats than there are
programs, but few programs use the same format as their default &
preferred format. So to accommodate everyone's preferred format would
require an extremely large number of features in the data import/export
functions. Maintaining software that contains a large number of
features is difficult -- it's easy for errors to creep in because there
are so many combinations of how different features can be used on
different functions.
The alternative to having lots of features on each function is to have a
relatively small set of powerful functions that can be used to construct
the behavior you want. This type of software is thought by many to be
easier to maintain and extend. I think this is pretty much the preferred
approach in R. The above one-liner for writing the data in the form you
want is really not much more complex than using an additional argument
to write.table(). (And if you need to do this kind of thing frequently,
then it's easy in R to create your own wrapper function for 'write.table'.)
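As a rough sketch of such a wrapper: the function name `write_csv_with_id` and its `id_name` argument are made up for illustration and are not part of R. It simply packages the one-liner above and passes any extra arguments on to write.table().

```r
# Hypothetical helper (not part of base R): write a matrix or data frame
# as CSV, with the row names in a named first column.
write_csv_with_id <- function(x, file = "", id_name = "id", ...) {
  df <- as.data.frame(x)
  # Turn the row names into a real column, then give it the requested name.
  out <- cbind(row.names(df), df)
  names(out)[1] <- id_name
  # row.names = FALSE because the row names are now an ordinary column.
  write.table(out, file = file, row.names = FALSE, sep = ",",
              qmethod = "double", ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))

# With file = "" (the default) the output goes to the console.
write_csv_with_id(x, id_name = "ID")
```

Called this way it should print the same lines as the one-liner, with "ID" as the first column heading. Supplying a `file` argument writes to disk instead.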
One might object to this line of explanation by noting that many
functions already have many arguments and lots of features. I think the
situation is that the original author of any particular function gets to
decide what features the function will have, and after that there is
considerable reluctance (justifiably) to add new features, especially in
cases where the desired functionality can be easily achieved in other
ways with existing functions.
-- Tony Plate
Earl F. Glynn wrote:
> Consider:
>
>>x <- matrix(1:6, 2,3)
>>rownames(x) <- c("ID1", "ID2")
>>colnames(x) <- c("Attr1", "Attr2", "Attr3")
>
>
>>x
>
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
>
>
>>write.csv(x,file="x.csv")
>
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
>
> Have I missed an easy way to get the "" string to be something meaningful?
>
> There is no information in the "" string. This column heading for the row
> names often could be used as a database key, but the "" entry would need to be
> manually edited first. Why not provide a way to specify the string instead
> of putting "" as the heading for the rownames?
>
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
>
> Header line
> R prefers the header line to have no entry for the row names,
> . . .
> Some other systems require a (possibly empty) entry for the row names,
> which is what write.table will provide if argument col.names = NA is
> specified. Excel is one such system.
>
> Why is an "empty" entry the only option here?
>
> A quick solution that comes to mind seems a bit kludgy:
>
>
>>y <- cbind(rownames(x), x)
>>colnames(y)[1] <- "ID"
>>y
>
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
>
>
>>write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
>
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
>
> Now the rownames have an "ID" header, which could be used as a key in a
> database if desired without editing (but all the "numbers" are now
> character strings, too).
>
> It's also not clear why I had to use write.table above, instead of
> write.csv:
>
>>write.csv(y, row.names=F, col.names=T, file="y.csv")
>
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
> col.names = NA makes no sense when row.names = FALSE
>
> Thanks for any insight about this.
>
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>