[R] proper way to process dataframe by rows
Duncan Murdoch
murdoch at stats.uwo.ca
Mon Nov 29 04:38:23 CET 2004
On Sun, 28 Nov 2004 21:25:24 -0500, Jack Tanner <ihok at hotmail.com>
wrote:
>This is a best practices / style question.
>
>The way I use RODBC is something like this:
>
> > foo <- sqlQuery(db, "select * from foo")
> > apply(foo, 1, function(row) {...})
>
>That is, I use apply to iterate over each result -- row -- in the
>RODBC-produced dataframe. Is this how one generally wants to do this?
>
>My concern is that when apply iterates over the rows, it uses
>as.matrix() to convert the dataframe to a character representation of
>itself. Thus my database's carefully planned data types (that RODBC
>carefully preserved when returning query results) get completely lost as
>I process the data. I've taken to judiciously sprinkling as.numeric()
>and friends here and there, but this is just begging for bugs.
>
>In other words, what is the smart way to process a dataframe by rows? Or
>is there, by chance, a specific technique or practice that is available
>for RODBC results but not for dataframes in general?
I would just use a for() loop if I didn't care too much about speed.
If I did, I'd avoid dealing with rows of dataframes: access using
dataframe indexing is slow. Depending on what your function does,
you're probably better off extracting the columns of the dataframe as
vectors and working with those.
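
A minimal sketch of the difference, assuming a small made-up data frame
in place of a real RODBC result (the column names id, name and price are
hypothetical, as is the "total" being computed):

```r
# Hypothetical stand-in for the data frame returned by sqlQuery().
foo <- data.frame(id = 1:3,
                  name = c("a", "b", "c"),
                  price = c(1.5, 2.25, 10),
                  stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() first. Because one
# column is character, every row reaches the function as a character
# vector, so the numeric types are lost.
apply_types <- apply(foo, 1, function(row) class(row))

# A for() loop over row indices keeps each column's own type, at the
# cost of comparatively slow data frame indexing.
loop_total <- 0
for (i in seq_len(nrow(foo))) {
  loop_total <- loop_total + foo[i, "id"] * foo[i, "price"]
}

# Extracting the columns as vectors avoids row access altogether.
vec_total <- sum(foo$id * foo$price)
```

Here apply_types comes back as all "character", while the loop and the
vectorized version give the same numeric total without any as.numeric()
calls.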
Duncan Murdoch