[R] two questions for R beginners

Karl Ove Hufthammer karl at huftis.org
Tue Mar 2 10:00:01 CET 2010


On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch <murdoch at stats.uwo.ca> 
wrote:
> Suppose X is a dataframe or a matrix.  What would you expect to get from 
> X[1]?  What about as.vector(X), or as.numeric(X)?

All this of course depends on the type of object one is speaking of. There 
are plenty of surprises available, and it's best to use the most logical 
way of extracting. E.g., to extract the top-left element of a 2D 
structure (data frame or matrix), use 'X[1,1]'.
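
A minimal sketch of the difference, using made-up 2 x 3 objects ('m' and 
'df' are illustrative names, not anything from this thread):

```r
# Hypothetical example objects for illustration only.
m  <- matrix(1:6, nrow = 2)   # a matrix: one vector with a dim attribute
df <- as.data.frame(m)        # a data frame: a list of columns V1, V2, V3

m1  <- m[1]    # single index on a matrix: first element of the vector
df1 <- df[1]   # single index on a data frame: list-style, keeps column V1

m11  <- m[1, 1]    # two indices: top-left element of the matrix
df11 <- df[1, 1]   # two indices: top-left element of the data frame
```

With two indices both objects give the same single value; with one index 
the matrix gives a number and the data frame gives a one-column data frame.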

Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' 
on a data frame, just as if it were a matrix, even though the underlying 
structure is completely different. (This doesn't work on a normal list; 
there you have to type the whole 'X[[3]][2]'.)
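
As a rough illustration (the data are invented): 'X[2,3]' means row 2, 
column 3, so on a plain list the same value is element 2 of component 3.

```r
# Hypothetical data; only the indexing pattern matters here.
df  <- data.frame(a = 1:3, b = 4:6, c = 7:9)
lst <- as.list(df)   # the same columns as a plain list, no dim semantics

from_df  <- df[2, 3]     # row 2, column 3 of the data frame
from_lst <- lst[[3]][2]  # component 3 of the list, then its element 2

# Matrix-style indexing on a plain list is an error; capture it instead
# of letting it stop the script.
list_attempt  <- tryCatch(lst[2, 3], error = function(e) e)
fails_on_list <- inherits(list_attempt, "error")
```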

The behaviour of the 'as.' functions may sometimes be surprising, at 
least for me. For example, 'as.data.frame' on a named vector gives a 
single-column data frame, instead of a single-row data frame.

(I'm not sure what the recommended way of converting a named vector to a 
single-row data frame is, but 'as.data.frame(t(X))' works, even though 
both 'X' and 't(X)' look like a row of numbers when printed.)
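
A small sketch of the two conversions, assuming a made-up named vector 
'x'. The reason 't(x)' behaves differently is that 't' turns the vector 
into a 1 x n matrix, and 'as.data.frame' keeps the matrix shape.

```r
# Hypothetical named vector for illustration.
x <- c(a = 1, b = 2, c = 3)

col_df <- as.data.frame(x)      # 3 rows, 1 column; names become row names
row_df <- as.data.frame(t(x))   # 1 row, 3 columns; names become col names

x_is_matrix  <- is.matrix(x)     # FALSE: a plain named vector
tx_is_matrix <- is.matrix(t(x))  # TRUE: a 1 x 3 matrix
```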

> The point is that a dataframe is a list, and a matrix isn't.  If users 
> don't understand that, then they'll be confused somewhere.  Making 
> matrices more list-like in one respect will just move the confusion 
> elsewhere.  The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but 
knowing which type of object I have when I get the output of a function 
in a package. If I know the object is a matrix with column names, it's 
easy enough to type 'X[,"colname"]' (or 'X["name"]' for a named vector), 
and if it's a data frame one may use the shortcut 'X$colname'.

Usually, it *is* documented what the return value of a function is, but 
just looking at the output is much faster, and *usually* gives the 
correct answer.
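
When the printed output is ambiguous, a quick programmatic check can 
settle it. The sketch below uses a hypothetical helper, 'describe', which 
is not part of R or any package; interactively, 'str(X)' gives similar 
information.

```r
# Hypothetical example objects that can look alike when printed.
x  <- c(a = 1, b = 2)               # named vector
m  <- cbind(a = 1:2, b = 3:4)       # matrix with column names
df <- data.frame(a = 1:2, b = 3:4)  # data frame

# describe() is an illustrative helper, not an existing R function.
# Data frames are tested first because they are also lists.
describe <- function(obj) {
  if (is.data.frame(obj)) {
    "data frame"
  } else if (is.matrix(obj)) {
    "matrix"
  } else if (is.atomic(obj)) {
    "atomic vector"
  } else if (is.list(obj)) {
    "list"
  } else {
    "other"
  }
}

kinds <- c(describe(x), describe(m), describe(df))

# The matching way to pull out "a" for each kind of object.
from_vector <- x["a"]
from_matrix <- m[, "a"]
from_df     <- df$a
```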

For example, 'mean' applied to a data frame gives a named vector, not a 
data frame, which is somewhat surprising (given that the columns of a 
data frame may be of different types, while the elements of a vector may 
not). (And yes, I know that it's *documented* that it returns a named 
vector.) On the other hand, perhaps it is surprising that 'mean' works 
on data frames at all. :-)
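
For per-column means that do not depend on how 'mean' treats data frames 
(as far as I know, R 3.0.0 and later no longer compute column means from 
'mean' on a data frame), one can ask for them explicitly. A sketch with 
invented data:

```r
# Hypothetical all-numeric data frame for illustration.
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

means_sapply <- sapply(df, mean)  # apply mean() to each column in turn
means_col    <- colMeans(df)      # same values, computed via a matrix

# Both results are named numeric vectors, not data frames.
result_is_df <- is.data.frame(means_sapply)
```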

-- 
Karl Ove Hufthammer
