[R] Automating R for Hypothesis Testing
meredith
mmballar at mtu.edu
Wed May 16 15:34:50 CEST 2012
Rui-
Just a quick question. I understand your comment on using ANOVA, but
doesn't this only test for similarities of the mean. We are trying to see if
a the same model can fit for two or three months, therefore have the similar
slope and intercept. The ANOVA would only do one part of this correct, with
the F-Test?
Thanks!
Meredith
Rui Barradas wrote
>
> Hello,
>
> I'm glad it helped.
>
> As for your second question, I don't know, but I'm not very comfortable
> with the way you're doing things.
> Why subtract the coefficients of model 1 from model 2?
> And why the dummy? Why set model 1 to zero?
>
> Isn't it better to use anova's F? After all, it's designed for it, for the
> linear model...
> And if you really want/need the dummy, wouldn't a nested anova do it? (F
> statistic, once again.)
>
> anova(model1, model2)
>
> is simple and statistically speaking seems to me much better. (I specially
> don't like the subtraction bit.)
>
> Rui Barradas
>
> meredith wrote
>>
>> Rui-
>> Thanks this definitely helps, just one quick question. How would you
>> code the values of chi-fm and chi-fms to change based on the degrees of
>> freedom of each model H(i)?
>>
>> Meredith
>>
>>
>> Rui Barradas wrote
>>>
>>> Hello,
>>>
>>> Yes, it does help. Now we can see your data and what you're doing.
>>> What follows is a suggestion on what you could do, not full solution.
>>> (You forgot to say what X1 is, but I don't think it's important to
>>> understand the suggestion.)
>>> (If I'm wrong, say something.)
>>>
>>>
>>> milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
>>> stringsAsFactors=FALSE)
>>> # list of data.frames, one per month
>>> ls1 <- split(milwaukeephos, milwaukeephos$month)
>>>
>>> #--------- if you want to keep the models, not needed if you don't.
>>> # (yoy probably don't)
>>> modelH <- vector("list", 12)
>>> modelHa <- vector("list", 12)
>>> modelH2 <- vector("list", 12)
>>> modelH2a <- vector("list", 12)
>>> #--------- values to record, these are needed, create them beforehand.
>>> chi_fm <- numeric(12)
>>> chi_fms <- numeric(12)
>>> #
>>> seq_months <- c(1:12, 1) # wrap months around.
>>> for(i in 1:12){
>>> month_this <- seq_months[i]
>>> month_next <- seq_months[i + 1]
>>>
>>> lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
>>> lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
>>> modelH[[i]] <- lm(lload ~ lflow)
>>> # If you don't want to keep the models, use modelH only
>>> # ( without [[i]] )
>>> # and do the same with X1
>>>
>>> # rest of your code for first test goes here
>>> chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)
>>>
>>> # and the same for the second test
>>> chi_fms[i] <- ...etc...
>>> }
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>> meredith wrote
>>>>
>>>> dput: http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
>>>> milwaukeephos.csv
>>>>
>>>> # Feb-march
>>>>> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>>>>>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
>>>>> anova(modelHa_febmarch)
>>>>> coefficients(modelH_febmarch)
>>>> (Intercept) lffeb_march
>>>> -2.429890 1.172821
>>>>> coefficients(modelHa_febmarch)
>>>> (Intercept) X1feb_mar lffeb_march
>>>> -2.8957776 -0.5272793 1.3016303
>>>>> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
>>>>> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>>>>>bfm<-t(bunres_fm-bres_fm)
>>>>> fmvect<-seq(1,1,length=34)
>>>>> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
>>>>> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
>>>>> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
>>>> # Test Stat Equation for Chisq
>>>>> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
>>>>> tfmx<-t(fmxx)
>>>>> xcom_fm<-(tfmx %*% fmxx)
>>>>> xinv_fm<-ginv(xcom_fm)
>>>>> var_fm<-xinv_fm*0.307
>>>>> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
>>>>> chi_fm # chisq value for recording
>>>> if less than CV move onto to slope modification
>>>>> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
>>>>> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
>>>>> anova(modelH2a_febmarch)
>>>>> coefficients(modelH2_febmarch) # get coefficients to make beta vectors
>>>>> for test
>>>> (Intercept) X3feb_march
>>>> 5.342130 1.172821
>>>>> coefficients(modelH2a_febmarch)
>>>> (Intercept) X3feb_march X4feb_march
>>>> 5.2936263 1.0353752 0.2407557
>>>> # Test Stat
>>>>> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
>>>>> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
>>>>> bsfm<-t(bsunres_fm-bsres_fm)
>>>>> #X matrix
>>>>> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
>>>>> tfmxs<-t(fmxs)
>>>>> xcoms_fm<-(tfmxs %*% fmxs)
>>>>> xinvs_fm<-ginv(xcoms_fm)
>>>>> var_fms<-xinvs_fm*0.341
>>>>> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
>>>>> chi_fms
>>>> # Record Chisq value
>>>>
>>>> Does this help?
>>>> Here lffeb_march is the combination of Feb and March log flows
>>>> and llfeb_march is the combination of Feb and March log loads
>>>> X3: lffeb_march-mean(feb_march)
>>>> X4: X1*X3
>>>>
>>>> Thanks
>>>>
>>>> Rui Barradas wrote
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm not at all sure if I understand your problem. Does this describe
>>>>> it?
>>>>>
>>>>>
>>>>> test first model for months 1 and 2
>>>>> if test statistic less than critical value{
>>>>> test second model for months 1 and 2
>>>>> print results of the first and second tests? just one of them?
>>>>> }
>>>>> move on to months 2 and 3
>>>>> etc, until months 12 and 1
>>>>>
>>>>>
>>>>> Please post example data using dput(dataset).
>>>>> Just copy it's output and paste it in your post.
>>>>>
>>>>> And example code, what you're already doing.
>>>>> (Possibly simplified)
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>>
>>>>> meredith wrote
>>>>>>
>>>>>> R Users-
>>>>>> I have been trying to automate a manual code that I have developed
>>>>>> for calling in a .csv file, isolating certain rows and columns that
>>>>>> correspond to specified months:
>>>>>> something to the effect
>>>>>> i=name.csv
>>>>>> N=length(i$month)
>>>>>> iphos1=0
>>>>>> iphos2=0
>>>>>> isphos3=0
>>>>>> for i=1,N
>>>>>> if month=1
>>>>>> iphos1=iphos+1
>>>>>> iphos1(iphos1)=i
>>>>>>
>>>>>> an so on to call out the months into there own arrays (unless there
>>>>>> is a way I can wrap it into the next automation)
>>>>>>
>>>>>> Next: I would like to run a simple linear regression combining each
>>>>>> of the months 1 by 1:
>>>>>> for instance I want to run a regression on a combined model from
>>>>>> months 1 and 2 and a dummy model for 1 and 2, compare them using a
>>>>>> Chi-sq distribution, if Chi-sq is less than the Critical value, we
>>>>>> accept and go on to test another set of models with both 1 and 2. If
>>>>>> it rejects, then we proceed to months 2 and 3. If we move on to the
>>>>>> second set on months 1 and 2, and the critical value is accepted, I
>>>>>> want to print an accept or reject and move on to months 2 and 3,
>>>>>> until finally comparing months 12-1 at the end.
>>>>>> Is there a way to loop or automate this in R?
>>>>>>
>>>>>> Thanks
>>>>>> Meredith
>>>>>>
>>>>>
>>>>
>>>> Rui Barradas wrote
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm not at all sure if I understand your problem. Does this describe
>>>>> it?
>>>>>
>>>>>
>>>>> test first model for months 1 and 2
>>>>> if test statistic less than critical value{
>>>>> test second model for months 1 and 2
>>>>> print results of the first and second tests? just one of them?
>>>>> }
>>>>> move on to months 2 and 3
>>>>> etc, until months 12 and 1
>>>>>
>>>>>
>>>>> Please post example data using dput(dataset).
>>>>> Just copy it's output and paste it in your post.
>>>>>
>>>>> And example code, what you're already doing.
>>>>> (Possibly simplified)
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>>
>>>>> meredith wrote
>>>>>>
>>>>>> R Users-
>>>>>> I have been trying to automate a manual code that I have developed
>>>>>> for calling in a .csv file, isolating certain rows and columns that
>>>>>> correspond to specified months:
>>>>>> something to the effect
>>>>>> i=name.csv
>>>>>> N=length(i$month)
>>>>>> iphos1=0
>>>>>> iphos2=0
>>>>>> isphos3=0
>>>>>> for i=1,N
>>>>>> if month=1
>>>>>> iphos1=iphos+1
>>>>>> iphos1(iphos1)=i
>>>>>>
>>>>>> an so on to call out the months into there own arrays (unless there
>>>>>> is a way I can wrap it into the next automation)
>>>>>>
>>>>>> Next: I would like to run a simple linear regression combining each
>>>>>> of the months 1 by 1:
>>>>>> for instance I want to run a regression on a combined model from
>>>>>> months 1 and 2 and a dummy model for 1 and 2, compare them using a
>>>>>> Chi-sq distribution, if Chi-sq is less than the Critical value, we
>>>>>> accept and go on to test another set of models with both 1 and 2. If
>>>>>> it rejects, then we proceed to months 2 and 3. If we move on to the
>>>>>> second set on months 1 and 2, and the critical value is accepted, I
>>>>>> want to print an accept or reject and move on to months 2 and 3,
>>>>>> until finally comparing months 12-1 at the end.
>>>>>> Is there a way to loop or automate this in R?
>>>>>>
>>>>>> Thanks
>>>>>> Meredith
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4630243.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list