[R] AFTREG with ID argument

Thu Feb 18 14:25:10 CET 2010

Göran, David,

in order to adapt aftreg to my needs I wrote a little function that 
I would like to share with you and the community.

WHAT DOES IT FIX?

(1) Using the id argument in combination with missing covariate 
values was not possible before: the rows with NAs were dropped from 
the model data, but the id vector still had one entry per original 
row, so the two had different lengths and aftreg stopped with an 
error. My fix removes the incomplete rows from the data frame first 
and then takes the id vector from the filtered data, so aftreg runs 
without error here.

(2) The id argument had to be specified by its "absolute path" 
(e.g. id=testdata$groupvar, see below in this thread). My adapted 
function takes the name of the id variable as a string, e.g. 
id="groupvar".
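A base-R sketch of both points, rebuilding testdata from the listing in the example below (var1 values as printed, so rounded). It does not call aftreg itself; it only shows the row counts involved and how a column named by a string is looked up with [[ ]]:

```r
# testdata as printed in this message (var1 rounded as shown)
testdata <- data.frame(
  start    = c(0, 1, 2, 3, 0, 1, 2),
  stop     = c(1, 2, 3, 4, 1, 2, 3),
  censor   = c(0, 0, 0, 0, 0, 0, 1),
  groupvar = c(1, 1, 1, 1, 2, 2, 2),
  var1     = c(0.1284928, 0.4896125, 0.7012899, NA,
               0.7964361, 0.8466039, 0.2234271)
)

# (1) na.omit() on the model variables keeps 6 rows, but an id vector
#     taken from the unfiltered data frame still has 7 entries.
model_rows <- nrow(na.omit(testdata[c("start", "stop", "censor", "var1")]))
unfiltered_id <- testdata$groupvar

# (2) With the id column given as a string, [[ ]] looks it up by name,
#     so it can be taken from the filtered data instead.
id <- "groupvar"
filtered <- testdata[complete.cases(testdata[c("var1", id)]), ]
filtered_id <- filtered[[id]]

c(model_rows, length(unfiltered_id), length(filtered_id))  # 6 7 6
```

Note that testdata$id would not work here: $ looks for a column literally called "id", while [[id]] uses the value stored in the variable.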

HOW DOES IT WORK?

Use aftreg2 just like you would use aftreg. The mandatory 
arguments are formula, data and id, where id is a character string 
naming the grouping variable. Example:

 > testdata
   start stop censor groupvar      var1
1     0    1      0        1 0.1284928
2     1    2      0        1 0.4896125
3     2    3      0        1 0.7012899
4     3    4      0        1        NA
5     0    1      0        2 0.7964361
6     1    2      0        2 0.8466039
7     2    3      1        2 0.2234271

model1 <- aftreg(Surv(start, stop, censor)~var1, data=testdata, 
id=groupvar)
 > ERROR.

model2 <- aftreg2(Surv(start, stop, censor)~var1, data=testdata, 
id="groupvar")
 > WORKS FINE.

PREREQUISITES:

(1) Make sure that missing values only occur at the end of a 
lifetime. The regression will give wrong results if covariate data 
are missing in the middle of a lifetime, for instance if the 
covariates are known for 0-10 and 13-20 but not for 11-12. 
(Göran: Please correct me if I'm wrong here!)

(2) If covariate data are missing at the beginning of a lifetime 
(e.g. missing from 0-5 but present from 6 until censoring), this fix 
will give wrong results if one _cannot_ assume that the missing 
covariates from 0-5 were the same as at 6. (Göran: Please correct me 
again if my interpretation is wrong here, but that's basically what 
you said before.)
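To check these prerequisites before fitting, a hypothetical helper na_pattern() (not part of eha) can classify, for each individual, where the covariate NAs fall along the lifetime once the spells are ordered by start time. The rows for groupvar 3 and 4 below are invented to show the interior and leading cases; "interior" corresponds to (1) and "leading" to (2):

```r
# Hypothetical helper (not part of eha): for each individual, report where
# covariate NAs occur along the lifetime, with spells ordered by start time.
na_pattern <- function(data, id, covars, time = "start") {
  data <- data[order(data[[id]], data[[time]]), ]
  missing <- !complete.cases(data[covars])

  classify <- function(m) {
    obs <- which(!m)
    if (!any(m)) return("none")                        # no NAs at all
    if (length(obs) == 0) return("all")                # nothing observed
    if (any(m[min(obs):max(obs)])) return("interior")  # gap, see (1)
    if (min(obs) > 1) return("leading")                # see (2), even if NAs also trail
    "trailing"                                         # NAs only at the end
  }
  sapply(split(missing, data[[id]]), classify)
}

# Groups 1 and 2 follow the testdata listing; groups 3 and 4 are invented
testdata <- data.frame(
  start    = c(0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 0, 1),
  stop     = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 1, 2),
  groupvar = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  var1     = c(0.1284928, 0.4896125, 0.7012899, NA,
               0.7964361, 0.8466039, 0.2234271,
               0.40, NA, 0.60,
               NA, 0.30)
)

res <- na_pattern(testdata, id = "groupvar", covars = "var1")
res  # 1: trailing, 2: none, 3: interior, 4: leading
```

Only individuals reported as "none" or "trailing" meet prerequisite (1) without further assumptions; "leading" ones need the assumption from (2).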

LISTING:

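library(eha)  # provides aftreg()

aftreg2 <- function(formula, data, id, ...){

	call <- match.call()

	# Term labels of the right-hand side (for the example: "var1")
	non_na_cols <- attr(attr(terms(formula), "factors"), "dimnames")[2][[1]]

	# Drop rows with NAs in the covariates or in the id column, so that
	# the data and the id vector passed to aftreg have the same length
	data <- data[complete.cases(data[non_na_cols]),]
	data <- data[complete.cases(data[id]),]

	cat("Original Call: ")
	print(call)

	# id is looked up by name in the *filtered* data
	return(aftreg(formula=formula, data=data, id=data[,id], ...))
}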
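One caveat: the expression that builds non_na_cols returns term labels rather than variable names. That works for the simple formula in the example, but a formula with transformed terms or interactions yields labels such as "log(x)" or "a:b", which are not columns of data, and the response variables are never checked. A small comparison (the formula is made up for illustration) with base R's all.vars(), which could be one possible replacement:

```r
# Formula made up for illustration: a transformed term and an interaction
f <- y ~ log(x) + a:b

# What aftreg2 uses: column names of the "factors" matrix, i.e. term labels
term_labels <- attr(attr(terms(f), "factors"), "dimnames")[2][[1]]

# All variable names in the formula, response included
vars <- all.vars(f)

term_labels  # "log(x)" "a:b"
vars         # "y" "x" "a" "b"
```

For the formula in the example, all.vars(Surv(start, stop, censor) ~ var1) gives start, stop, censor and var1, so NAs in the response columns would be removed as well.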

Hope someone finds this interesting.

All the best
Philipp

David Winsemius wrote:
> 
> On Feb 11, 2010, at 5:58 AM, Philipp Rappold wrote:
> 
>> Göran, thanks!
>>
>> One more thing that I found: As soon as you have at least one NA in 
>> the independent vars, the trick that you mentioned does not work 
>> anymore. Example:
>>
>> > testdata
>>  start stop censor groupvar      var1
>> 1     0    1      0        1 0.1284928
>> 2     1    2      0        1 0.4896125
>> 3     2    3      0        1 0.7012899
>> 4     3    4      0        1        NA
>> 5     0    1      0        2 0.7964361
>> 6     1    2      0        2 0.8466039
>> 7     2    3      1        2 0.2234271
>>
>> > aftreg(Surv(start, stop, censor)~var1, data=testdata, 
>> id=testdata$groupvar)
>> Error in order(id, Y[, 1]) : argument lengths differ (* I 
>> translated this from the German output *)
>>
>> Do you think there is a simple hack which excludes all subjects that 
>> have at least one NA in their independent vars? If it were only one 
>> independent var, it would probably be easy to just use subset, but I 
>> have lots of different combinations of vars that I'd like to test ;)
>>
> 
> I don't know if it's a "hack", but there is a set of functions that 
> perform such subsetting:
> 
> ?na.omit
> 
> There is a parameter that would accomplish that goal inside aftreg. You 
> may want to check what your defaults are for na.action.
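A base-R sketch of the na.action route, rebuilding testdata from the listing: model.frame() with na.action = na.omit records the dropped rows in its "na.action" attribute, and taking the id column from the same model frame keeps it aligned with the remaining rows. A one-sided formula is used only to avoid depending on Surv(); this is an illustration of the idea, not what aftreg does internally:

```r
# testdata as printed in this thread (var1 rounded as shown)
testdata <- data.frame(
  start    = c(0, 1, 2, 3, 0, 1, 2),
  stop     = c(1, 2, 3, 4, 1, 2, 3),
  censor   = c(0, 0, 0, 0, 0, 0, 1),
  groupvar = c(1, 1, 1, 1, 2, 2, 2),
  var1     = c(0.1284928, 0.4896125, 0.7012899, NA,
               0.7964361, 0.8466039, 0.2234271)
)

# Model frame over all relevant variables, including the id column
mf <- model.frame(~ start + stop + censor + var1 + groupvar,
                  data = testdata, na.action = na.omit)

dropped <- as.integer(attr(mf, "na.action"))  # rows removed by na.omit
id <- mf$groupvar                             # id aligned with kept rows

dropped     # 4
length(id)  # 6
```

As the error in the quoted message suggests, changing na.action alone does not help when id is passed as a separate, unfiltered vector; the id has to be subset the same way as the data.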
>