[R] Regression Testing

Mojo mojo at sispyrc.com
Fri Jan 21 16:00:32 CET 2011


On 1/21/2011 9:13 AM, Achim Zeileis wrote:
> On Fri, 21 Jan 2011, Mojo wrote:
>
>> On 1/20/2011 4:42 PM, Achim Zeileis wrote:
>>> On Thu, 20 Jan 2011, Mojo wrote:
>>>
>>>> I'm new to R and somewhat new to the world of stats.  I got 
>>>> frustrated with Excel and found R.  Enough of that already.
>>>>
>>>> I'm trying to test and correct for heteroskedasticity.
>>>>
>>>> I have data in a csv file that I load and store in a dataframe.
>>>>
>>>>> ds <- read.csv("book2.csv")
>>>>> df <- data.frame(ds)
>>>>
>>>> I then perform an OLS regression:
>>>>
>>>>> lmfit <- lm(df$y~df$x)
>>>
>>> Just btw: lm(y ~ x, data = df) is somewhat easier to read and also 
>>> easier to write when the formula involves more regressors.
>>>
>>>> To test for heteroskedasticity, I run the Breusch-Pagan test:
>>>>
>>>>> bptest(lmfit)
>>>>
>>>>        studentized Breusch-Pagan test
>>>>
>>>> data:  lmfit
>>>> BP = 11.6768, df = 1, p-value = 0.0006329
>>>>
>>>> From the above, if I'm interpreting this correctly, there is 
>>>> heteroskedasticity present.  To correct for this, I need to 
>>>> calculate robust standard errors.
>>>
>>> That is one option. Another one would be using WLS instead of OLS - 
>>> or maybe FGLS. As the model just has one regressor, this might be 
>>> possible and result in a more efficient estimate than OLS.
>>
>> I thought that WLS (which I'm guessing is a weighted regression) is 
>> really only useful when you know or at least have an idea of what is 
>> causing the heteroskedasticity?
>
> Yes. But with only a single variable that shouldn't be too hard to do. 
> Also in the Breusch-Pagan test you specify a hypothesized functional 
> form for the variance.
>
>> I'm not familiar with FGLS.
>
> There is a worked example in
>
>   demo("Ch-LinearRegression", package = "AER")
>
> The corresponding book has some more details.
>
> hth,
> Z
>
>> I plan on adding additional independent variables as I get more 
>> comfortable with everything.
>>
>>>
>>>> From my reading on this list, it seems like I need to use vcovHC.
>>>
>>> That's another option, yes.
>>>
>>>>> vcovHC(lmfit)
>>>>              (Intercept)         df$x
>>>> (Intercept)  1.057460e-03 -4.961118e-05
>>>> df$x       -4.961118e-05  2.378465e-06
>>>>
>>>> I'm having a little bit of a hard time following the help pages.
>>>
>>> Yes, the manual page is somewhat technical but the first thing the 
>>> "Details" section does is: It points you to some references that 
>>> should be easier to read. I recommend starting with
>>>
>>>      Zeileis A (2004), Econometric Computing with HC and HAC Covariance
>>>      Matrix Estimators. _Journal of Statistical Software_, *11*(10),
>>>      1-17. URL http://www.jstatsoft.org/v11/i10/.
>>
>> I will look into that.
>>
>> Thanks,
>> Mojo
>>
>>

If I were to use vcovHAC instead of vcovHC, would that correct for serial 
correlation as well as heteroskedasticity?
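
To make the comparison concrete, here is a minimal sketch that puts the 
classical, HC and HAC standard errors side by side. It assumes the sandwich 
and lmtest packages are installed, and it uses simulated data as a 
hypothetical stand-in for book2.csv, so the numbers will not match the 
output quoted above. vcovHAC() from sandwich is documented as a 
heteroskedasticity and autocorrelation consistent estimator; whether it is 
appropriate depends on the observations having a meaningful order (e.g. time).

```r
# Assumes the sandwich and lmtest packages are installed.
library(sandwich)
library(lmtest)

# Hypothetical stand-in for book2.csv (NOT the original data):
# the error spread grows with x, so the errors are heteroskedastic.
set.seed(1)
df <- data.frame(x = 1:100)
df$y <- 2 + 0.5 * df$x + rnorm(100, sd = 0.1 * df$x)

lmfit <- lm(y ~ x, data = df)

# Coefficient tests using the HC and the HAC covariance estimates.
# The coefficient estimates are the same; only the standard errors change.
coeftest(lmfit, vcov = vcovHC)   # heteroskedasticity-consistent (HC3 by default)
coeftest(lmfit, vcov = vcovHAC)  # heteroskedasticity- and autocorrelation-consistent

# Standard errors from the three covariance estimates, side by side.
se <- cbind(
  OLS = sqrt(diag(vcov(lmfit))),
  HC  = sqrt(diag(vcovHC(lmfit))),
  HAC = sqrt(diag(vcovHAC(lmfit)))
)
se
```

With the real data, only the read.csv() step and the model formula would 
replace the simulated part.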

Thanks,
Mojo


