[R] A regression problem using dummy variables

rlearner309 unixunix99 at gmail.com
Thu Jul 3 17:30:57 CEST 2008


sorry, made a stupid mistake.
I got it.
thanks a lot!

Peter Dalgaard wrote:
> 
> rlearner309 wrote:
>> I think it is zero, because you have lots of zeros there.  It is not like
>> continuous variables.
>>
>>   
> Think again. The sum of products may be zero, but that is not the 
> covariance. And don't dismiss Thomas, he is usually right.
> 
> Anyways, the coefs of dummy variables represent differences to the same
> base level, and choosing a poorly determined base level (essentially:
> one whose mean is determined by only a few observations) will cause high
> parameter correlation. It should only affect those parameters though,
> and it is not really clear what VIF means for dummy variables. One often
> chooses to relevel() to make the largest group the base level, but it
> really comes down to which group contrasts you want to look at.
> 
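A minimal sketch of the base-level effect described above. The group sizes (3, 1000, 1000), the level names, the simulated response and the helper name coef_cor are all made up for illustration; none of them come from the original data. It compares the correlation between the two dummy coefficients when the tiny group is the base level and when a large group is.

```r
set.seed(1)  # arbitrary seed, only so the simulated response is reproducible

# Hypothetical data: one level with 3 observations, two with 1000 each.
g <- factor(rep(c("small", "big1", "big2"), times = c(3, 1000, 1000)))
y <- rnorm(length(g))  # pure noise; the response is not the point here

# Hypothetical helper: correlation matrix of the estimated coefficients.
coef_cor <- function(fit) cov2cor(vcov(fit))

# Base level = the tiny group: both dummy coefficients are differences
# from a mean that rests on only three observations.
fit_small <- lm(y ~ relevel(g, ref = "small"))

# Base level = one of the large groups.
fit_big <- lm(y ~ relevel(g, ref = "big1"))

# Correlation between the two dummy coefficients (rows/columns 2 and 3).
cor_small <- coef_cor(fit_small)[2, 3]
cor_big   <- coef_cor(fit_big)[2, 3]

print(c(base_small = cor_small, base_big = cor_big))

# Standard errors of all coefficients under each coding.
print(sqrt(diag(vcov(fit_small))))
print(sqrt(diag(vcov(fit_big))))
```

With the tiny group as base level the two dummy coefficients come out almost perfectly correlated and both have large standard errors; with a large group as base level only the coefficient for the tiny group stays poorly determined. The fitted group means are the same either way, so the choice affects which contrasts are reported, not the fit itself.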
> 
>>
>> Thomas Lumley wrote:
>>   
>>> On Wed, 2 Jul 2008, rlearner309 wrote:
>>>
>>>     
>>>> I think the covariance between dummy variables or between dummy
>>>> variables and intercept should always be zero.  meaning: no
>>>> singularity problem??
>>>>
>>>>       
>>> No.  You can easily check that this is not true using the cov()
>>> function.  Indicator variables for mutually exclusive groups are
>>> negatively correlated.
>>>
>>>      -thomas
>>>
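A minimal sketch of the cov() check suggested above, assuming hypothetical group sizes (1000, 1000 and 3) and level names that are not taken from the original data. It builds indicator columns for two mutually exclusive groups and compares their sum of products with their covariance.

```r
# Hypothetical factor: two large levels and one with only three rows.
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))

# Indicator (dummy) columns for levels "b" and "c".
d_b <- as.numeric(g == "b")
d_c <- as.numeric(g == "c")

# No row belongs to both groups, so the sum of products is zero ...
sum_of_products <- sum(d_b * d_c)

# ... but the covariance also subtracts the product of the two means,
# which is positive, so the covariance (and correlation) is negative.
cov_bc <- cov(d_b, d_c)
cor_bc <- cor(d_b, d_c)

print(c(sum_of_products = sum_of_products, cov = cov_bc, cor = cor_bc))
```

The covariance is small in magnitude here because one group is tiny, but it is not zero.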
>>>
>>>
>>>     
>>>> rlearner309 wrote:
>>>>       
>>>>> This is actually more like a Statistics problem:
>>>>> I have a dataset with two dummy variables controlling three levels.
>>>>> The problem is, one level does not have many observations compared
>>>>> with the other two levels (a couple of data points compared with
>>>>> 1000+ points on the other levels).  When I run the regression, the
>>>>> result is bad.  I have unbalanced SE and VIF.  Does this kind of
>>>>> problem also belong to the "near singularity" problem?  Does it make
>>>>> any difference if I code the level that lacks data (0,0) instead of
>>>>> (0,1)?
>>>>>
>>>>> thanks a lot!
>>>>>
>>>>>         
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>       
>>> Thomas Lumley			Assoc. Professor, Biostatistics
>>> tlumley at u.washington.edu	University of Washington, Seattle
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> -- 
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18260470.html
Sent from the R help mailing list archive at Nabble.com.


