[R] predict: remove columns with new levels automatically

Peter Ehlers ehlers at ucalgary.ca
Wed Nov 25 10:11:50 CET 2009


Andreas Wittmann wrote:
> Sorry for my poor description. I don't want a ready-made algorithm without doing any work of my own; I only hoped to get some advice on how to do this. I don't want to predict arbitrary data; I refer only to newdata whose variables are the same as in the model data. But if there are factors in the data, it can happen that the newdata has a level which doesn't exist in the original data.
> So I have to compare each factor in the data and in the newdata, and if the newdata has a level which is not in the original data, drop this variable and compute the model and the prediction again.
> I thought this problem was quite common and that I could use an algorithm somebody has already implemented.
> 
> best regards
> 
> Andreas
> 
If I understand correctly, you want to build a model that
includes at least one factor predictor (say xf with k levels).
Then you want to use this model to predict a response value
when xf takes a _new_ level about which the model knows
nothing. That doesn't make sense to me, so I doubt that
it's a common problem. Introducing a new level for a factor
variable is just like introducing a new variable.

  -Peter Ehlers
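
For readers who nevertheless want the mechanical step Andreas describes
(compare each factor in the training data with newdata, drop the variables
that show unseen levels, refit), here is a minimal sketch in base R. The
helper name vars_with_new_levels is hypothetical, not an existing R function,
and the sketch assumes the model is refitted with the formula x ~ . as in the
example quoted below. Whether dropping a predictor this way is statistically
sensible is a separate question.

```r
## Hypothetical helper (not part of base R): names of factor/character
## columns shared by both data frames whose values in `newdata` were
## never observed in `training`. Missing values are ignored.
vars_with_new_levels <- function(training, newdata) {
  common <- intersect(names(training), names(newdata))
  is_cat <- function(v) is.factor(v) || is.character(v)
  has_new <- vapply(common, function(nm) {
    if (!is_cat(training[[nm]])) return(FALSE)
    seen <- unique(as.character(training[[nm]]))
    new  <- unique(as.character(newdata[[nm]]))
    new  <- new[!is.na(new)]
    length(setdiff(new, seen)) > 0
  }, logical(1))
  common[has_new]
}

## Same data as in the quoted example
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = rep(c("A", "B", "C"), each = 3))
x_new <- rnorm(1)
test <- data.frame(x = x_new, y = x_new + rnorm(1), z = "D")

## Drop the offending columns, then refit and predict
bad  <- vars_with_new_levels(training, test)
keep <- setdiff(names(training), bad)
fit  <- lm(x ~ ., data = training[, keep, drop = FALSE])
pred <- predict(fit, test)
```

Here bad contains only "z", so the refitted model is equivalent to
lm(x ~ y, data=training). The check uses the values actually observed in the
training data rather than levels(), because lm drops unused factor levels
when it builds the model frame.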

> 
> 
> 
> -------- Original Message --------
>> Date: Wed, 25 Nov 2009 00:48:59 -0500
>> From: David Winsemius <dwinsemius at comcast.net>
>> To: Andreas Wittmann <andreas_wittmann at gmx.de>
>> CC: r-help at r-project.org
>> Subject: Re: [R] predict: remove columns with new levels automatically
> 
>> On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:
>>
>>> Dear R-users,
>>>
>>> in the following thread
>>>
>>> http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
>>>
>>> the problem of how to remove rows for predict that contain levels
>>> which are not in the model is discussed.
>>>
>>> Now I am trying to do this the other way round: I want to remove
>>> columns (variables) from the model which would later be problematic
>>> because of new levels at prediction time.
>>>
>>> ## example:
>>> set.seed(0)
>>> x <- rnorm(9)
>>> y <- x + rnorm(9)
>>>
>>> training <- data.frame(x=x, y=y, z=rep(c("A", "B", "C"), each=3))
>>> x_new <- rnorm(1)  # separate name, so base::t is not masked
>>> test <- data.frame(x=x_new, y=x_new+rnorm(1), z="D")
>>>
>>> lm1 <- lm(x ~ ., data=training)
>>> ## prediction does not work because the variable z has the new level "D"
>>> predict(lm1, test)
>>>
>>> ## solution: the variable z is removed from the model
>>> ## the prediction happens without using the information of variable z
>>> lm2 <- lm(x ~ y, data=training)
>>> predict(lm2, test)
>>>
>>> How can I automatically recognize this and fit the model accordingly?
>> Let me get this straight. You want us to predict in advance (or more  
>> accurately design an algorithm that can see into the future and work  
>> around) any sort of newdata you might later construct????
>>
>> --
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>



