[R] Regression Testing

Achim Zeileis Achim.Zeileis at uibk.ac.at
Fri Jan 21 16:20:15 CET 2011


On Fri, 21 Jan 2011, Mojo wrote:

> On 1/21/2011 9:13 AM, Achim Zeileis wrote:
>> On Fri, 21 Jan 2011, Mojo wrote:
>> 
>>> On 1/20/2011 4:42 PM, Achim Zeileis wrote:
>>>> On Thu, 20 Jan 2011, Mojo wrote:
>>>> 
>>>>> I'm new to R and somewhat new to the world of stats.  I got frustrated 
>>>>> with Excel and found R.  Enough of that already.
>>>>> 
>>>>> I'm trying to test and correct for heteroskedasticity.
>>>>> 
>>>>> I have data in a csv file that I load and store in a dataframe.
>>>>> 
>>>>>> ds <- read.csv("book2.csv")
>>>>>> df <- data.frame(ds)
>>>>> 
>>>>> I then perform an OLS regression:
>>>>> 
>>>>>> lmfit <- lm(df$y~df$x)
>>>> 
>>>> Just btw: lm(y ~ x, data = df) is somewhat easier to read and also easier 
>>>> to write when the formula involves more regressors.
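To make the advice concrete, here is a minimal sketch of the tidier call. The data are stand-ins generated on the fly, since the poster's book2.csv is not available; the variable names x and y come from the thread.

```r
# Stand-in for the poster's data; read.csv() already returns a data frame,
# so the extra data.frame(ds) step in the original post is not needed.
set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 + 0.5 * df$x + rnorm(50)

# Equivalent to lm(df$y ~ df$x), but easier to read and to extend
# with further regressors (e.g. y ~ x + z).
lmfit <- lm(y ~ x, data = df)
coef(lmfit)
```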
>>>> 
>>>>> To test for Heteroskedasticity, I run the BPtest:
>>>>> 
>>>>>> bptest(lmfit)
>>>>>
>>>>>        studentized Breusch-Pagan test
>>>>> 
>>>>> data:  lmfit
>>>>> BP = 11.6768, df = 1, p-value = 0.0006329
>>>>> 
>>>>> From the above, if I'm interpreting this correctly, there is 
>>>>> heteroskedasticity present.  To correct for this, I need to calculate 
>>>>> robust error terms.
>>>> 
>>>> That is one option. Another one would be using WLS instead of OLS - or 
>>>> maybe FGLS. As the model just has one regressor, this might be possible 
>>>> and result in a more efficient estimate than OLS.
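As an illustration of the WLS option: a sketch with simulated data where the error standard deviation grows with x. The weight choice 1/x^2 is an assumption made here for the example (it matches the simulated variance structure); with real data the right weights depend on the actual form of the heteroskedasticity.

```r
set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 + 0.5 * df$x + rnorm(50, sd = 0.1 * df$x)  # sd proportional to x

# WLS: weights inversely proportional to the assumed error variance.
# Here Var(e) is taken to be proportional to x^2, hence weights 1/x^2.
wlsfit <- lm(y ~ x, data = df, weights = 1 / x^2)
summary(wlsfit)
```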
>>> 
>>> I thought that WLS (which I'm guessing is a weighted regression) is really 
>>> only useful when you know or at least have an idea of what is causing the 
>>> heteroskedasticity?
>> 
>> Yes. But with only a single variable that shouldn't be too hard to do. Also 
>> in the Breusch-Pagan test you specify a hypothesized functional form for 
>> the variance.
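For reference, bptest() in the lmtest package takes the hypothesized variance formula as its second argument (varformula); when omitted, it defaults to the regressors of the fitted model. A sketch with stand-in data:

```r
library(lmtest)  # provides bptest()

set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 + 0.5 * df$x + rnorm(50, sd = 0.1 * df$x)
lmfit <- lm(y ~ x, data = df)

# Default: variance hypothesized to depend on the model's regressors
bptest(lmfit)
# Explicit variance formula, here variance as a function of x
bptest(lmfit, ~ x, data = df)
```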
>> 
>>> I'm not familiar with FGLS.
>> 
>> There is a worked example in
>>
>>   demo("Ch-LinearRegression", package = "AER")
>> 
>> The corresponding book has some more details.
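For orientation before running the demo: a common two-step FGLS recipe is to model the log squared OLS residuals as a function of the regressors and then re-fit by WLS with the inverse estimated variances as weights. The sketch below uses stand-in data and is not necessarily the exact approach taken in the demo.

```r
set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 + 0.5 * df$x + rnorm(50, sd = 0.1 * df$x)

# Step 1: OLS, then an auxiliary regression estimating the variance function
olsfit <- lm(y ~ x, data = df)
aux <- lm(log(residuals(olsfit)^2) ~ x, data = df)

# Step 2: re-fit by WLS, weighting by the inverse estimated variances
fglsfit <- lm(y ~ x, data = df, weights = 1 / exp(fitted(aux)))
coef(fglsfit)
```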
>> 
>> hth,
>> Z
>> 
>>> I plan on adding additional independent variables as I get more 
>>> comfortable with everything.
>>> 
>>>> 
>>>>> From my reading on this list, it seems like I need to use vcovHC.
>>>> 
>>>> That's another option, yes.
>>>> 
>>>>>> vcovHC(lmfit)
>>>>>              (Intercept)         df$x
>>>>> (Intercept)  1.057460e-03 -4.961118e-05
>>>>> df$x       -4.961118e-05  2.378465e-06
>>>>> 
>>>>> I'm having a little bit of a hard time following the help pages.
>>>> 
>>>> Yes, the manual page is somewhat technical, but the "Details" section 
>>>> starts by pointing you to some references that should be easier to 
>>>> read. I recommend starting with
>>>>
>>>>      Zeileis A (2004), Econometric Computing with HC and HAC Covariance
>>>>      Matrix Estimators. _Journal of Statistical Software_, *11*(10),
>>>>      1-17. URL <URL: http://www.jstatsoft.org/v11/i10/>.
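In practice, the HC covariance matrix from sandwich is usually not inspected directly but passed to coeftest() from lmtest to obtain coefficient tests with robust standard errors. A sketch with stand-in data:

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 + 0.5 * df$x + rnorm(50, sd = 0.1 * df$x)
lmfit <- lm(y ~ x, data = df)

# Coefficient tests using a heteroskedasticity-consistent covariance
# (vcovHC defaults to the HC3 estimator).
coeftest(lmfit, vcov = vcovHC)
```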
>>> 
>>> I will look into that.
>>> 
>>> Thanks,
>>> Mojo
>>> 
>>> 
>
> If I were to use vcovHAC instead of vcovHC, does that correct for serial 
> correlation as well as heteroskedasticity?

Yes, as the name (HAC = Heteroskedasticity and Autocorrelation Consistent) 
conveys. But for details please read the papers that accompany the 
software package and the original references cited therein.
Z
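Usage is analogous to the HC case: pass vcovHAC to coeftest(). The sketch below simulates AR(1) errors so that there is actually some autocorrelation to account for; the data and AR coefficient are stand-ins.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHAC()

set.seed(1)
df <- data.frame(x = 1:50)
# AR(1) errors to induce serial correlation (illustrative choice of ar = 0.5)
df$y <- 2 + 0.5 * df$x + as.numeric(arima.sim(list(ar = 0.5), n = 50))
lmfit <- lm(y ~ x, data = df)

# HAC covariance in place of the HC covariance
coeftest(lmfit, vcov = vcovHAC)
```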

> Thanks,
> Mojo
>



More information about the R-help mailing list