[R] Regression Testing
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Fri Jan 21 16:20:15 CET 2011
On Fri, 21 Jan 2011, Mojo wrote:
> On 1/21/2011 9:13 AM, Achim Zeileis wrote:
>> On Fri, 21 Jan 2011, Mojo wrote:
>>
>>> On 1/20/2011 4:42 PM, Achim Zeileis wrote:
>>>> On Thu, 20 Jan 2011, Mojo wrote:
>>>>
>>>>> I'm new to R and somewhat new to the world of stats. I got frustrated
>>>>> with Excel and found R. Enough of that already.
>>>>>
>>>>> I'm trying to test and correct for heteroskedasticity.
>>>>>
>>>>> I have data in a csv file that I load and store in a dataframe.
>>>>>
>>>>>> ds <- read.csv("book2.csv")
>>>>>> df <- data.frame(ds)
>>>>>
>>>>> I then perform an OLS regression:
>>>>>
>>>>>> lmfit <- lm(df$y~df$x)
>>>>
>>>> Just btw: lm(y ~ x, data = df) is somewhat easier to read and also easier
>>>> to write when the formula involves more regressors.
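A minimal sketch of the two styles, assuming a small simulated data frame with hypothetical columns y, x and x2 standing in for book2.csv. Both fits give the same estimates, but with data = df the coefficient is labelled x rather than df$x, and adding a regressor only changes the formula.

```r
set.seed(1)
# Hypothetical stand-in for read.csv("book2.csv")
df <- data.frame(x = runif(50, 1, 10), x2 = rnorm(50))
df$y <- 2 + 0.5 * df$x + rnorm(50)

fit_dollar <- lm(df$y ~ df$x)       # coefficient labelled "df$x"
fit_data   <- lm(y ~ x, data = df)  # coefficient labelled "x"

# Same estimates, only the labels differ
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))

# A second regressor is just another term in the formula
fit_two <- lm(y ~ x + x2, data = df)
names(coef(fit_two))
```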
>>>>
>>>>> To test for heteroskedasticity, I run the Breusch-Pagan test (bptest):
>>>>>
>>>>>> bptest(lmfit)
>>>>>
>>>>> studentized Breusch-Pagan test
>>>>>
>>>>> data: lmfit
>>>>> BP = 11.6768, df = 1, p-value = 0.0006329
>>>>>
>>>>> From the above, if I'm interpreting this correctly, heteroskedasticity is
>>>>> present. To correct for this, I need to calculate robust standard
>>>>> errors.
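The reading above is the usual one: a small p-value rejects the null hypothesis of homoskedasticity. A base-R sketch of what the studentized statistic is, assuming simulated data (all variables below are hypothetical). It regresses the squared OLS residuals on the regressor and compares n * R^2 of that auxiliary regression with a chi-squared distribution whose degrees of freedom equal the number of auxiliary regressors. This should correspond to the default of bptest() in lmtest, where df = 1 in the output reflects the single regressor.

```r
set.seed(42)
n <- 200
x <- runif(n, 1, 10)
# Hypothetical heteroskedastic data: error sd grows with x
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)

# Auxiliary regression of squared OLS residuals on the regressor
e2 <- residuals(fit)^2
aux <- lm(e2 ~ x)

# Studentized (Koenker) Breusch-Pagan statistic: n * R^2 of the auxiliary fit
bp <- n * summary(aux)$r.squared
df_bp <- 1  # number of auxiliary regressors, intercept excluded
p_value <- pchisq(bp, df = df_bp, lower.tail = FALSE)

c(BP = bp, p.value = p_value)
```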
>>>>
>>>> That is one option. Another one would be using WLS instead of OLS - or
>>>> maybe FGLS. As the model just has one regressor, this might be possible
>>>> and result in a more efficient estimate than OLS.
>>>
>>> I thought that WLS (which I'm guessing is weighted regression) is really
>>> only useful when you know, or at least have an idea of, what is causing
>>> the heteroskedasticity?
>>
>> Yes. But with only a single variable that shouldn't be too hard to do. Also
>> in the Breusch-Pagan test you specify a hypothesized functional form for
>> the variance.
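A minimal WLS sketch, assuming (for illustration only) simulated data whose error variance is proportional to x^2. lm() treats weights as inverse variances, so that hypothesized form translates directly into weights = 1 / x^2. With real data the variance form is an assumption you would need to justify, for example from residual plots or from the auxiliary regression behind the Breusch-Pagan test.

```r
set.seed(42)
n <- 200
x <- runif(n, 1, 10)
# Hypothetical data whose error variance is proportional to x^2
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)

ols <- lm(y ~ x)
# lm() weights are inverse variances: variance proportional to x^2 -> 1 / x^2
wls <- lm(y ~ x, weights = 1 / x^2)

# Reported standard errors of the slope under each fit
se_ols <- coef(summary(ols))["x", "Std. Error"]
se_wls <- coef(summary(wls))["x", "Std. Error"]
c(OLS = se_ols, WLS = se_wls)
```

Note that the OLS standard error printed here is the conventional one, which is itself not reliable under heteroskedasticity; the comparison is only indicative.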
>>
>>> I'm not familiar with FGLS.
>>
>> There is a worked example in
>>
>> demo("Ch-LinearRegression", package = "AER")
>>
>> The corresponding book has some more details.
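When the variance form is not known, a common two-step FGLS recipe is to estimate it from the OLS residuals and then run WLS with the estimated inverse variances as weights. A minimal sketch, assuming simulated data and a log-linear variance model in log(x) chosen here for illustration; it is not necessarily the specification used in the AER demo.

```r
set.seed(42)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)  # hypothetical heteroskedastic data

# Step 1: OLS, then model log squared residuals
# Assumed variance function: Var(e | x) = exp(g0 + g1 * log(x))
ols <- lm(y ~ x)
aux <- lm(log(residuals(ols)^2) ~ log(x))

# Step 2: WLS with estimated inverse variances as weights.
# A constant bias in the fitted log-variance only rescales all weights,
# which leaves the WLS coefficient estimates unchanged.
w_hat <- 1 / exp(fitted(aux))
fgls <- lm(y ~ x, weights = w_hat)

# Optionally iterate, re-estimating the variance function from new residuals
for (i in 1:5) {
  aux <- lm(log(residuals(fgls)^2) ~ log(x))
  fgls <- lm(y ~ x, weights = 1 / exp(fitted(aux)))
}
coef(fgls)
```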
>>
>> hth,
>> Z
>>
>>> I plan on adding additional independent variables as I get more
>>> comfortable with everything.
>>>
>>>>
>>>>> From my reading of this list, it seems like I need to use vcovHC.
>>>>
>>>> That's another option, yes.
>>>>
>>>>>> vcovHC(lmfit)
>>>>>               (Intercept)          df$x
>>>>> (Intercept)  1.057460e-03 -4.961118e-05
>>>>> df$x        -4.961118e-05  2.378465e-06
>>>>>
>>>>> I'm having a little bit of a hard time following the help pages.
>>>>
>>>> Yes, the manual page is somewhat technical but the first thing the
>>>> "Details" section does is: It points you to some references that should
>>>> be easier to read. I recommend starting with
>>>>
>>>> Zeileis A (2004), Econometric Computing with HC and HAC Covariance
>>>> Matrix Estimators. _Journal of Statistical Software_, *11*(10),
>>>> 1-17. URL http://www.jstatsoft.org/v11/i10/.
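For orientation, a base-R sketch of the arithmetic behind vcovHC() for an lm fit, assuming simulated data. The estimator is the "sandwich" (X'X)^-1 X' Omega X (X'X)^-1, where Omega is diagonal with the squared residuals (HC0) or the leverage-adjusted e_i^2 / (1 - h_i)^2 (HC3, which should be vcovHC()'s default type). Robust standard errors are the square roots of its diagonal. In practice the packaged route, for example coeftest(lmfit, vcov. = vcovHC) from lmtest, is preferable; this only shows what the matrix printed above contains.

```r
set.seed(42)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)  # hypothetical data
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
bread <- solve(crossprod(X))  # (X'X)^-1

# Meat: X' diag(omega) X; X * v scales row i of X by v[i]
meat_hc0 <- crossprod(X * e)              # omega_i = e_i^2
meat_hc3 <- crossprod(X * (e / (1 - h)))  # omega_i = e_i^2 / (1 - h_i)^2
V_hc0 <- bread %*% meat_hc0 %*% bread
V_hc3 <- bread %*% meat_hc3 %*% bread

# Robust standard errors, t statistics and p-values
se_robust <- sqrt(diag(V_hc3))
t_robust <- coef(fit) / se_robust
p_robust <- 2 * pt(abs(t_robust), df = df.residual(fit), lower.tail = FALSE)
cbind(Estimate = coef(fit), `Robust SE` = se_robust,
      `t value` = t_robust, `Pr(>|t|)` = p_robust)
```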
>>>
>>> I will look into that.
>>>
>>> Thanks,
>>> Mojo
>>>
>>>
>
> If I were to use vcovHAC instead of vcovHC, would that correct for serial
> correlation as well as heteroskedasticity?

Yes, as the name (HAC = Heteroskedasticity and Autocorrelation Consistent)
conveys. But for details please read the papers that accompany the
software package and the original references cited therein.
Z
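To illustrate what the "AC" part adds, a minimal sketch of the classic Newey-West estimator (Bartlett kernel, fixed lag) in base R, assuming simulated AR(1) data and a hypothetical helper newey_west(). It is one member of the HAC family, not the exact computation vcovHAC() performs; the package defaults differ (for example in how the bandwidth is chosen), so numbers will not match. With lag = 0 it reduces to the HC0 estimator.

```r
set.seed(7)
n <- 300
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))  # autocorrelated regressor
u <- as.numeric(arima.sim(list(ar = 0.6), n = n))  # autocorrelated errors
y <- 1 + 2 * x + u                                 # hypothetical time series
fit <- lm(y ~ x)

# Hypothetical helper: Newey-West HAC covariance with Bartlett weights
newey_west <- function(fit, lag) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  n <- nrow(X)
  g <- X * e                # row t holds x_t * e_t
  S <- crossprod(g)         # lag-0 (heteroskedasticity-only) term
  for (l in seq_len(lag)) {
    w <- 1 - l / (lag + 1)  # Bartlett kernel weight
    # sum over t of g_t g_{t-l}'
    G <- crossprod(g[(l + 1):n, , drop = FALSE], g[1:(n - l), , drop = FALSE])
    S <- S + w * (G + t(G))
  }
  bread <- solve(crossprod(X))
  bread %*% S %*% bread
}

V_nw <- newey_west(fit, lag = 4)
sqrt(diag(V_nw))  # HAC standard errors
```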
> Thanks,
> Mojo
>