[R] for loop over dataframe without indices

Gabor Grothendieck ggrothendieck at myway.com
Sat Dec 20 17:02:48 CET 2003




I think I've found a problem with the by approach.  Compare:

data(iris)
by( iris, row.names(iris), function(x)x )[1:5]

to

iris[1:5,]

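It seems by has reordered the rows: the groups come back sorted by
row name as character strings ("1", "10", "100", ...) rather than in
the original row order.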
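
A minimal sketch of what appears to be happening, plus one possible
workaround. It assumes by() converts the character row names to a
factor with string-sorted levels; the names by_char, row_index and
by_ordered are illustrative only and are not from the thread.

```r
data(iris)

# Grouping on a character vector: the groups follow the sorted factor
# levels, and as strings "10" sorts before "2".
by_char <- by(iris, row.names(iris), function(x) x)
first_groups <- dimnames(by_char)[[1]][1:5]
print(first_groups)

# Possible workaround (an assumption, not part of the original proposal):
# supply a factor whose levels are in the original row order.
row_index <- factor(row.names(iris), levels = row.names(iris))
by_ordered <- by(iris, row_index, function(x) x)
print(by_ordered[[2]])   # the second row of iris
```

With the explicit levels, the i-th group should correspond to the i-th
row, so the attribute-stripping trick can still be applied afterwards.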

 
Date: Fri, 19 Dec 2003 21:31:50 -0500 (EST) 
From: Gabor Grothendieck <ggrothendieck at myway.com>
To: <tlumley at u.washington.edu> 
Cc: <R-help at stat.math.ethz.ch> 
Subject: Re: [R] for loop over dataframe without indices 

 
 


Thomas, thanks for your response. It is quite nifty.

Pursuing your solutions,
I think the objective should be to reproduce the output from 
t.data.frame defined as below (note that I posted a proposal
to change t.data.frame to r-devel before I received your reply):

# Return the rows of df as a list of one-row data frames.
t.data.frame <- function( df ) {
    ll <- NULL
    for( i in 1:nrow(df) ) ll <- append( ll, list(df[i,]) )
    ll
}

Using the first 3 rows of the iris data set as our data frame,
the following shows that your "by" solution works provided
we strip the attributes afterwards. The do.call solution
does not work as required, since it turns the data
frame into a matrix.

data(iris)
df <- iris[1:3,]

# Consider:

id <- function(x)x

# t.data.frame solution
zt <- t(df)

# by solution is good but it adds some junk attributes 
zby <- by( df, row.names(df), id )
identical(zt,zby) # FALSE

# nullifying these attributes seems to do it
zby2 <- zby
attributes(zby2) <- NULL
identical(zt,zby2) # TRUE

# do.call doesn't work right since it appears to turn the result into a matrix
str( do.call("mapply", list(id,df) ) ) # note matrix output


Here is the result of pasting the above into R 1.8.1 on Windows 2000:

> data(iris)
> df <- iris[1:3,]
> 
> # Consider:
> 
> id <- function(x)x
> 
> # t.data.frame solution
> zt <- t(df)
> 
> # by solution is good but it adds some junk attributes 
> zby <- by( df, row.names(df), id )
> identical(zt,zby)
[1] FALSE
> 
> # nullifying these attributes seems to do it
> zby2 <- zby
> attributes(zby2) <- NULL
> identical(zt,zby2)
[1] TRUE
> 
> # do.call doesn't work right since it appears to turn the result into a matrix
> str( do.call("mapply", list(id,df) ) )
num [1:3, 1:5] 5.1 4.9 4.7 3.5 3 3.2 1.4 1.4 1.3 0.2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL
> 


Based on your solution I think the proposal should be changed
to:

t.data.frame <- function(df) {
    z <- by( df, row.names(df), function(x)x )
    attributes(z) <- NULL
    z
}
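
For comparison, here is a hedged sketch of the same idea that does not
go through by() at all and so does not depend on how the row names
sort. The function name rows_as_list is hypothetical; it is not an
existing R function and is not part of the proposal above.

```r
# Hypothetical helper: split a data frame into a list of one-row data
# frames, in the original row order.
rows_as_list <- function(df) {
    lapply(seq_len(nrow(df)), function(i) df[i, , drop = FALSE])
}

data(iris)
rows <- rows_as_list(iris)
print(rows[[10]])   # the tenth row of iris, still a data frame
```

Using seq_len() rather than 1:nrow(df) means a data frame with no rows
gives an empty list instead of looping over c(1, 0).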


---

Date: Fri, 19 Dec 2003 10:03:55 -0800 (PST) 
From: Thomas Lumley <tlumley at u.washington.edu>
To: Gabor Grothendieck <ggrothendieck at myway.com> 
Cc: <R-help at stat.math.ethz.ch> 
Subject: Re: [R] for loop over dataframe without indices 



On Fri, 19 Dec 2003, Gabor Grothendieck wrote:
>
> What I now realize is that the thing that is oddly
> missing in R is that you can't do an apply over
> the rows of a dataframe (at least not without having
> it coerced to an array and the elements coerced to
> possibly different types). The documentation does
> point this out. It's not a bug but it's an omission
> that seems deserving of being addressed.
>

Since mapply() applies a function to each 'row' of a list of vectors, you
can achieve this effect with
do.call("mapply", list(FUN,data.frame))
and also as a degenerate case of by():
by(data.frame, row.names(data.frame), FUN)

These should probably be documented under apply().
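
A small sketch of one reading of the mapply() idea, in which each column
is handed to mapply() as a separate argument so that the function sees
one row at a time. The data frame and the names row_fun and row_sums are
made up for illustration, and the c(list(FUN = ...), df) form is an
assumption about the intent, not the call quoted above.

```r
# Toy data frame; the columns x and y are illustrative only.
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# The function takes one argument per column, matched by column name.
row_fun <- function(x, y) x + y

# c(list(FUN = row_fun), df) is a plain list holding FUN plus one
# element per column, so this amounts to mapply(FUN = row_fun, x = ..., y = ...).
row_sums <- do.call("mapply", c(list(FUN = row_fun), df))
print(row_sums)
```

Passing the data frame as a single list element instead makes mapply()
iterate over the columns rather than the rows.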


-thomas

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



