[R] how to ignore rows missing arguments of a function when creating a function?
Joris Meys
jorismeys at gmail.com
Wed Jun 9 11:33:11 CEST 2010
It's difficult to help you if we don't know what the data looks like.
Two more tips :
- look at ?with instead of attach(). The latter causes a lot of
trouble further on.
- use require() instead of library(). See also the help files on this.
I also wonder what you're doing with the na.rm=TRUE. I don't see how
it affects the function, as it is used nowhere. Does it really remove
NA?
This said: estfun takes the estimation function from your model
object. Your model is essentially fit on all observations that are not
NA, whereas I presume your cluster contains all observations,
including NA. Now I don't know what kind of model fm represents, but
normally there is information in the model object about the removed
observations. I illustrate this using lm :
> x <- rnorm(100)
> y <- c(rnorm(90),NA,rnorm(9))
> test <- lm(x~y)
> str(test)
List of 13
... (tons of information)
$ model :'data.frame': 99 obs. of 2 variables:
... (more tons of information)
..- attr(*, "na.action")=Class 'omit' Named int 91
.. .. ..- attr(*, "names")= chr "91"
- attr(*, "class")= chr "lm"
Now we know that we can do :
> not <-attr(test$model,"na.action")
> y[-not]
So try this
### NOT TESTED
cl <- function(dat, na.rm = TRUE, fm, cluster){
require(sandwich)
require(lmtest)
not <- attr(fm$model,"na.action")
cluster <- cluster[-not]
with( dat ,{
M <- length(unique(cluster))
N <- length(cluster)
K <- fm$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
cluster, sum)) ) );
vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
coeftest(fm, vcovCL)
} )
}
Cheers
Joris
On Wed, Jun 9, 2010 at 12:06 AM, edmund jones <edmund.j.jones at gmail.com> wrote:
> Hi,
>
> I am relatively new to R; when creating functions, I run into problems with
> missing values. I would like my functions to ignore rows with missing values
> for arguments of my function) in the analysis (as for example is the case in
> STATA). Note that I don't want my function to drop rows if there are missing
> arguments elsewhere in a row, ie for variables that are not arguments of my
> function.
>
> As an example: here is a clustering function I wrote:
>
>
> cl <- function(dat, na.rm = TRUE, fm, cluster){
>
> attach( dat , warn.conflicts = F)
>
> library(sandwich)
>
> library(lmtest)
>
> M <- length(unique(cluster))
>
> N <- length(cluster)
>
> K <- fm$rank
>
> dfc <- (M/(M-1))*((N-1)/(N-K))
>
> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
> cluster, sum)) ) );
>
> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
>
> coeftest(fm, vcovCL)
>
> }
>
>
> When I run my function, I get the message:
>
>
> Error in tapply(x, cluster, sum) : arguments must have same length
>
>
> If I specify instead attach(na.omit(dat), warn.conflicts = F) and don't
> have the "na.rm=TRUE" argument, then my function runs; but only for the rows
> where there are no missing values AT ALL; however, I don't care if there are
> missing values for variables on which I am not applying my function.
>
>
> For example, I have information on children's size; if I want regress scores
> on age and parents' education, clustering on class, I would like missing
> values in size not to interfere (ie if I have scores, age, parents'
> education, and class, but not size, I don't want to drop this observation).
>
>
> I tried to look at the code of "lm" to see how the na.action part works, but
> I couldn't figure it out... This is exactly how I would like to deal with
> missing values.
>
>
> I tried to write
>
> cl <- function(dat, fm, cluster, na.action){
>
> attach( dat , warn.conflicts = F)
>
> library(sandwich)
>
> library(lmtest)
>
> M <- length(unique(cluster))
>
> N <- length(cluster)
>
> K <- fm$rank
>
> dfc <- (M/(M-1))*((N-1)/(N-K))
>
> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
> cluster, sum)) ) );
>
> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
>
> coeftest(fm, vcovCL)
>
> }
>
> attr(cl,"na.action") <- na.exclude
>
>
> but it still didn't work...
>
>
> Any ideas of how to deal with this issue?
>
> Thank you for your answers!
>
> Edmund
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
More information about the R-help
mailing list