[R] AFTREG with ID argument
Philipp Rappold
philipp.rappold at gmail.com
Thu Feb 18 14:25:10 CET 2010
Göran, David,
in order to adapt aftreg to my needs I wrote a little function that
I would like to share with you and the community.
WHAT DOES IT FIX?
(1) Using the id-argument in combination with missing values on
covariates wasn't easily possible before, because the id vector
and the data no longer had the same length and aftreg quit with
an error. My fix makes sure that NAs are excluded from both, so
aftreg runs without error here.
(2) The id-argument was required to be specified by its "absolute
path" (e.g. id=testdata$groupvar, see below in this thread). My
adapted function takes the name of the id-variable as a string, e.g.
id="groupvar".
HOW DOES IT WORK?
Use function aftreg2 just like you would use aftreg. Mandatory
arguments are: formula, data and id, where id is a string variable.
Example:
> testdata
  start stop censor groupvar      var1
1     0    1      0        1 0.1284928
2     1    2      0        1 0.4896125
3     2    3      0        1 0.7012899
4     3    4      0        1        NA
5     0    1      0        2 0.7964361
6     1    2      0        2 0.8466039
7     2    3      1        2 0.2234271
model1 <- aftreg(Surv(start, stop, censor)~var1, data=testdata,
id=groupvar)
> ERROR.
model2 <- aftreg2(Surv(start, stop, censor)~var1, data=testdata,
id="groupvar")
> WORKS FINE.
PREREQUISITES:
(1) Make sure that missing values are only present at the end of a
lifetime. The regression will yield wrong results if you have
missing covariate data in the middle of a lifetime. For instance:
covariates known for lifetime 0-10 and 13-20, but not for 11-12.
(Göran: Please correct me if I'm wrong here!)
(2) If you have missing covariate data at the beginning of a
lifetime (e.g. missing from 0-5, but present from 6 until censoring),
this fix will yield wrong results if one _cannot_ assume that the
missing covariates were the same from 0-5 as they were at 6. (Göran:
Please correct me again if I'm wrong here with my interpretation, but
that's basically what you said before.)
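
Both prerequisites come down to the same check: a missing covariate
value must never be followed by an observed one within the same id.
A minimal sketch of such a check in base R follows. The helper name
interior_na_ids and the demo values are made up for illustration and
are not part of aftreg or aftreg2.

```r
# Hypothetical helper: returns the ids in which a missing covariate
# value is followed by an observed value later in the same lifetime.
interior_na_ids <- function(data, id, covar, time = "start") {
  # Sort the rows by id and then by interval start.
  data <- data[order(data[[id]], data[[time]]), ]
  bad <- tapply(is.na(data[[covar]]), data[[id]], function(miss) {
    # Nothing missing for this id: no problem.
    if (!any(miss)) return(FALSE)
    # Problem if any value is observed after the first NA.
    first_na <- which(miss)[1]
    any(!miss[first_na:length(miss)])
  })
  names(bad)[as.vector(bad)]
}

# Made-up demo data: id 1 is missing only at the end,
# id 2 is missing in the middle of its lifetime.
d <- data.frame(
  start    = c(0, 1, 2, 3, 0, 1, 2),
  groupvar = c(1, 1, 1, 1, 2, 2, 2),
  var1     = c(0.13, 0.49, 0.70, NA, 0.80, NA, 0.22)
)
flagged <- interior_na_ids(d, id = "groupvar", covar = "var1")
print(flagged)
```

Ids reported by this check would have to be handled by hand (dropped
or imputed) before calling aftreg2.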
LISTING:
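library(eha)  # provides aftreg()

aftreg2 <- function(formula, data, id, ...) {
  call <- match.call()
  # Names of the covariates on the right-hand side of the formula.
  non_na_cols <- colnames(attr(terms(formula), "factors"))
  # Drop rows with a missing covariate or a missing id, so that the
  # data and the id vector keep the same length.
  data <- data[complete.cases(data[non_na_cols]), ]
  data <- data[complete.cases(data[id]), ]
  cat("Original Call: ")
  print(call)
  # id is a column name (string); pass the matching column to aftreg.
  aftreg(formula = formula, data = data, id = data[, id], ...)
}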
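
To see what the filtering step in the listing does without fitting a
model, the sketch below rebuilds the example data from this post and
applies the same row selection in base R. The object names form,
covars, keep, filtered and id_vec are chosen only for this
illustration.

```r
# Example data, values copied from the testdata table in this post.
testdata <- data.frame(
  start    = c(0, 1, 2, 3, 0, 1, 2),
  stop     = c(1, 2, 3, 4, 1, 2, 3),
  censor   = c(0, 0, 0, 0, 0, 0, 1),
  groupvar = c(1, 1, 1, 1, 2, 2, 2),
  var1     = c(0.1284928, 0.4896125, 0.7012899, NA,
               0.7964361, 0.8466039, 0.2234271)
)

# terms() only parses the formula, so Surv() need not be loaded here.
form <- Surv(start, stop, censor) ~ var1

# Same lookup as in aftreg2: names of the right-hand-side terms.
covars <- colnames(attr(terms(form), "factors"))

# Keep rows with complete covariates and a non-missing id.
keep <- complete.cases(testdata[covars]) & !is.na(testdata[["groupvar"]])
filtered <- testdata[keep, ]

# The id vector comes from the filtered data, so the lengths agree.
id_vec <- filtered[, "groupvar"]
print(nrow(filtered))
print(length(id_vec))
```

Note that this lookup returns term labels, so it assumes every term
is a plain column name; a term such as var1:var2 or log(var1) would
not match a column of the data.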
Hope someone finds this interesting.
All the best
Philipp
David Winsemius wrote:
>
> On Feb 11, 2010, at 5:58 AM, Philipp Rappold wrote:
>
>> Göran, thanks!
>>
>> One more thing that I found: As soon as you have at least one NA in
>> the independent vars, the trick that you mentioned does not work
>> anymore. Example:
>>
>> > testdata
>>   start stop censor groupvar      var1
>> 1     0    1      0        1 0.1284928
>> 2     1    2      0        1 0.4896125
>> 3     2    3      0        1 0.7012899
>> 4     3    4      0        1        NA
>> 5     0    1      0        2 0.7964361
>> 6     1    2      0        2 0.8466039
>> 7     2    3      1        2 0.2234271
>>
>> > aftreg(Surv(start, stop, censor)~var1, data=testdata,
>> id=testdata$groupvar)
>> Error in order(id, Y[, 1]) : Different length of arguments (* I
>> translated this from the German Output *)
>>
>> Do you think there is a simple hack which excludes all subjects that
>> have at least one NA in their independent vars? If it was only one
>> dependent var it would probably be easy by just using subset, but I
>> have lots of different combinations of vars that I'd like to test ;)
>>
>
> I don't know if it's a "hack", but there are a set of functions that
> perform such subsetting:
>
> ?na.omit
>
> There is a parameter that would accomplish that goal inside aftreg. You
> may want to check what your defaults are for na.action.
>
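
As a small illustration of the two suggestions above, the following
base R sketch shows how to look up the session's default na.action
and what na.omit does to a data frame. The data values are made up
for the example.

```r
# Default missing-value handler used by most modelling functions.
print(getOption("na.action"))

# Made-up data with one missing covariate value.
d <- data.frame(groupvar = c(1, 1, 2), var1 = c(0.5, NA, 0.8))

# na.omit drops every row that contains an NA ...
clean <- na.omit(d)
print(clean)

# ... and records which rows were removed.
print(attr(clean, "na.action"))
```

Because the id vector in this thread is passed separately from the
data, it would need the same rows removed, for example by taking it
from the cleaned data frame.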