[R] A regression problem using dummy variables
rlearner309
unixunix99 at gmail.com
Thu Jul 3 17:30:57 CEST 2008
Sorry, I made a stupid mistake.
I've got it now.
Thanks a lot!
Peter Dalgaard wrote:
>
> rlearner309 wrote:
>> I think it is zero, because you have lots of zeros there. It is not like
>> continuous variables.
>>
>>
> Think again. The sum of products may be zero, but that is not the
> covariance. And don't dismiss Thomas, he is usually right.
>
> Anyway, the coefficients of dummy variables represent differences from
> the same base level, and choosing a poorly determined base level
> (essentially: one whose mean is determined by only a few observations)
> will cause high parameter correlation. It should only affect those
> parameters though, and it is not really clear what VIF means for dummy
> variables. One often chooses to relevel() to make the largest group the
> base level, but it really comes down to which group contrasts you want
> to look at.
>
>
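
The base-level effect described above can be sketched on simulated data. This is only an illustration under assumed values: the group labels, group sizes (1000, 1000, 3), and group means are hypothetical, not taken from the original dataset. It fits the same model twice, once with the sparse group as the base level and once with a large group as the base level, and compares the standard errors and the correlation between the two dummy coefficients.

```r
# Hypothetical data: two large groups and one group with only 3 observations.
set.seed(1)
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))
y <- rnorm(length(g), mean = c(a = 0, b = 1, c = 2)[as.character(g)])

# Sparse group "c" as base level: both dummy coefficients are differences
# from a mean that is estimated from only 3 observations.
g_sparse <- relevel(g, ref = "c")
fit_sparse <- lm(y ~ g_sparse)

# Large group "a" as base level.
g_large <- relevel(g, ref = "a")
fit_large <- lm(y ~ g_large)

# Standard errors under each coding.
se_sparse <- coef(summary(fit_sparse))[, "Std. Error"]
se_large <- coef(summary(fit_large))[, "Std. Error"]
print(se_sparse)
print(se_large)

# Correlation between the two dummy coefficient estimates (rows/columns 2 and 3
# of the coefficient covariance matrix; row/column 1 is the intercept).
cor_sparse <- cov2cor(vcov(fit_sparse))[2, 3]
cor_large <- cov2cor(vcov(fit_large))[2, 3]
print(c(sparse_base = cor_sparse, large_base = cor_large))

# The fitted values are the same either way; only the parameterization differs.
same_fit <- isTRUE(all.equal(fitted(fit_sparse), fitted(fit_large)))
print(same_fit)
```

With the sparse group as the base, the two dummy coefficients share the same noisy reference mean, so their estimates should come out almost perfectly correlated; with a large group as the base, the correlation should be small. The model itself is unchanged, so the choice still depends on which contrasts are of interest.
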
>>
>> Thomas Lumley wrote:
>>
>>> On Wed, 2 Jul 2008, rlearner309 wrote:
>>>
>>>
>>>> I think the covariance between dummy variables, or between dummy
>>>> variables and the intercept, should always be zero, meaning there is
>>>> no singularity problem?
>>>>
>>>>
>>> No. You can easily check that this is not true using the cov() function.
>>> Indicator variables for mutually exclusive groups are negatively
>>> correlated.
>>>
>>> -thomas
>>>
>>>
>>>
>>>
>>>> rlearner309 wrote:
>>>>
>>>>> This is actually more of a statistics problem:
>>>>> I have a dataset with two dummy variables coding three levels. The
>>>>> problem is that one level has very few observations compared with the
>>>>> other two (a couple of data points compared with 1000+ points on the
>>>>> other levels). When I run the regression, the result is bad: the SEs
>>>>> and VIFs are unbalanced. Does this kind of problem also count as a
>>>>> "near singularity" problem? Does it make any difference if I code the
>>>>> level that lacks data as (0,0) instead of (0,1)?
>>>>>
>>>>> thanks a lot!
>>>>>
>>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>> Thomas Lumley Assoc. Professor, Biostatistics
>>> tlumley at u.washington.edu University of Washington, Seattle
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
>
>
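
The check with cov() suggested in the quoted thread can be sketched as follows. The group sizes here (1000, 1000, 3) are assumed for illustration only, and the dummy columns are built with model.matrix() using R's default treatment contrasts. The sketch shows that the sum of products of two mutually exclusive indicators is zero while their covariance is not.

```r
# Hypothetical three-level factor with one sparse level.
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))

# Design matrix with default treatment contrasts:
# columns are (Intercept), gb, gc.
X <- model.matrix(~ g)
d_b <- X[, "gb"]
d_c <- X[, "gc"]

# No observation belongs to both groups, so the sum of products is zero.
sum_prod <- sum(d_b * d_c)

# The covariance subtracts the product of the means, so it is negative.
cov_bc <- cov(d_b, d_c)
cor_bc <- cor(d_b, d_c)

print(sum_prod)
print(cov_bc)
print(cor_bc)
```

The sample covariance is (sum of products - n * mean(d_b) * mean(d_c)) / (n - 1); because both means are positive and the sum of products is zero, the result is necessarily below zero for mutually exclusive groups.
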
--
View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18260470.html
Sent from the R help mailing list archive at Nabble.com.