[R] Automating R for Hypothesis Testing

Wed May 16 15:34:50 CEST 2012

Rui-
  Just a quick question. I understand your comment on using ANOVA, but
doesn't this only test for similarities of the mean. We are trying to see if
a the same model can fit for two or three months, therefore have the similar
slope and intercept. The ANOVA would only do one part of this correct, with
the F-Test?

Thanks!
Meredith

Rui Barradas wrote
> 
> Hello,
> 
> I'm glad it helped.
> 
> As for your second question, I don't know, but I'm not very comfortable
> with the way you're doing things.
> Why subtract the coefficients of model 1 from model 2?
> And why the dummy? Why set model 1 to zero?
> 
> Isn't it better to use anova's F? After all, it's designed for it, for the
> linear model...
> And if you really want/need the dummy, wouldn't a nested anova do it? (F
> statistic, once again.)
> 
> anova(model1, model2)
> 
> is simple and statistically speaking seems to me much better. (I specially
> don't like the subtraction bit.)
> 
> Rui Barradas
> 
> meredith wrote
>> 
>> Rui-
>>   Thanks this definitely helps, just one quick question. How would you
>> code the values of chi-fm and chi-fms to change based on the degrees of
>> freedom of each model H(i)?
>> 
>> Meredith
>> 
>> 
>> Rui Barradas wrote
>>> 
>>> Hello,
>>> 
>>> Yes, it does help. Now we can see your data and what you're doing.
>>> What follows is a suggestion on what you could do, not full solution.
>>> (You forgot to say what X1 is, but I don't think it's important to
>>> understand the suggestion.)
>>> (If I'm wrong, say something.)
>>> 
>>> 
>>> milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
>>> stringsAsFactors=FALSE)
>>> # list of data.frames, one per month
>>> ls1 <- split(milwaukeephos, milwaukeephos$month)
>>> 
>>> #--------- if you want to keep the models, not needed if you don't.
>>> #          (yoy probably don't)
>>> modelH <- vector("list", 12)
>>> modelHa <- vector("list", 12)
>>> modelH2 <- vector("list", 12)
>>> modelH2a <- vector("list", 12)
>>> #--------- values to record, these are needed, create them beforehand.
>>> chi_fm <- numeric(12)
>>> chi_fms <- numeric(12)
>>> #
>>> seq_months <- c(1:12, 1) # wrap months around.
>>> for(i in 1:12){
>>> 	month_this <- seq_months[i]
>>> 	month_next <- seq_months[i + 1]
>>> 
>>> 	lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
>>> 	lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
>>> 	modelH[[i]] <- lm(lload ~ lflow)
>>> 	# If you don't want to keep the models, use modelH only
>>> 	# ( without [[i]] )
>>> 	# and do the same with X1
>>> 
>>> 	# rest of your code for first test goes here
>>> 	chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)
>>> 
>>> 	# and the same for the second test
>>> 	chi_fms[i] <- ...etc...
>>> }
>>> 
>>> 
>>> Hope this helps,
>>> 
>>> Rui Barradas
>>> 
>>> 
>>> meredith wrote
>>>> 
>>>> dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
>>>> milwaukeephos.csv 
>>>> 
>>>> # Feb-march
>>>>> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>>>>>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
>>>>> anova(modelHa_febmarch)
>>>>> coefficients(modelH_febmarch)
>>>> (Intercept) lffeb_march 
>>>>   -2.429890    1.172821 
>>>>> coefficients(modelHa_febmarch)
>>>> (Intercept)   X1feb_mar lffeb_march 
>>>>  -2.8957776  -0.5272793   1.3016303 
>>>>> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
>>>>> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>>>>>bfm<-t(bunres_fm-bres_fm)
>>>>> fmvect<-seq(1,1,length=34)
>>>>> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
>>>>> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
>>>>> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
>>>> # Test Stat Equation for Chisq
>>>>> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
>>>>> tfmx<-t(fmxx)
>>>>> xcom_fm<-(tfmx %*% fmxx)
>>>>> xinv_fm<-ginv(xcom_fm)
>>>>> var_fm<-xinv_fm*0.307
>>>>> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
>>>>> chi_fm # chisq value for recording
>>>> if less than CV move onto to slope modification
>>>>> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
>>>>> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
>>>>> anova(modelH2a_febmarch)
>>>>> coefficients(modelH2_febmarch) # get coefficients to make beta vectors
>>>>> for test
>>>> (Intercept) X3feb_march 
>>>>    5.342130    1.172821 
>>>>> coefficients(modelH2a_febmarch)
>>>> (Intercept) X3feb_march X4feb_march 
>>>>   5.2936263   1.0353752   0.2407557 
>>>> # Test Stat
>>>>> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
>>>>> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
>>>>> bsfm<-t(bsunres_fm-bsres_fm)
>>>>> #X matrix
>>>>> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
>>>>> tfmxs<-t(fmxs)
>>>>> xcoms_fm<-(tfmxs %*% fmxs)
>>>>> xinvs_fm<-ginv(xcoms_fm)
>>>>> var_fms<-xinvs_fm*0.341
>>>>> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
>>>>> chi_fms
>>>> # Record Chisq value
>>>> 
>>>> Does this help?
>>>> Here lffeb_march is the combination of Feb and March log flows
>>>> and llfeb_march is the combination of Feb and March log loads
>>>> X3: lffeb_march-mean(feb_march)
>>>> X4: X1*X3
>>>> 
>>>> Thanks
>>>> 
>>>> Rui Barradas wrote
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I'm not at all sure if I understand your problem. Does this describe
>>>>> it?
>>>>> 
>>>>> 
>>>>> test first model for months 1 and 2
>>>>> if test statistic less than critical value{
>>>>> 	test second model for months 1 and 2
>>>>> 	print results of the first and second tests? just one of them?
>>>>> }
>>>>> move on to months 2 and 3
>>>>> etc, until months 12 and 1
>>>>> 
>>>>> 
>>>>> Please post example data using dput(dataset).
>>>>> Just copy it's output and paste it in your post.
>>>>> 
>>>>> And example code, what you're already doing.
>>>>> (Possibly simplified)
>>>>> 
>>>>> Rui Barradas
>>>>> 
>>>>> 
>>>>> meredith wrote
>>>>>> 
>>>>>> R Users-
>>>>>>   I have been trying to automate a manual code that I have developed
>>>>>> for calling in a .csv file, isolating certain rows and columns that
>>>>>> correspond to specified months:
>>>>>> something to the effect
>>>>>> i=name.csv
>>>>>> N=length(i$month)
>>>>>> iphos1=0
>>>>>> iphos2=0
>>>>>> isphos3=0
>>>>>> for i=1,N
>>>>>>  if month=1
>>>>>>     iphos1=iphos+1
>>>>>>     iphos1(iphos1)=i
>>>>>> 
>>>>>> an so on to call out the months into there own arrays (unless there
>>>>>> is a way I can wrap it into the next automation)
>>>>>> 
>>>>>> Next: I would like to run a simple linear regression combining each
>>>>>> of the months 1 by 1:
>>>>>> for instance I want to run a regression on a combined model from
>>>>>> months 1 and 2 and a dummy model for 1 and 2, compare them using a
>>>>>> Chi-sq distribution, if Chi-sq is less than the Critical value, we
>>>>>> accept and go on to test another set of models with both 1 and 2. If
>>>>>> it rejects, then we proceed to months 2 and 3.  If we move on to the
>>>>>> second set on months 1 and 2, and the critical value is accepted, I
>>>>>> want to print an accept or reject and move on to months 2 and 3,
>>>>>> until finally comparing months 12-1 at the end.
>>>>>> Is there a way to loop or automate this in R?
>>>>>> 
>>>>>> Thanks
>>>>>> Meredith
>>>>>> 
>>>>> 
>>>> 
>>>> Rui Barradas wrote
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I'm not at all sure if I understand your problem. Does this describe
>>>>> it?
>>>>> 
>>>>> 
>>>>> test first model for months 1 and 2
>>>>> if test statistic less than critical value{
>>>>> 	test second model for months 1 and 2
>>>>> 	print results of the first and second tests? just one of them?
>>>>> }
>>>>> move on to months 2 and 3
>>>>> etc, until months 12 and 1
>>>>> 
>>>>> 
>>>>> Please post example data using dput(dataset).
>>>>> Just copy it's output and paste it in your post.
>>>>> 
>>>>> And example code, what you're already doing.
>>>>> (Possibly simplified)
>>>>> 
>>>>> Rui Barradas
>>>>> 
>>>>> 
>>>>> meredith wrote
>>>>>> 
>>>>>> R Users-
>>>>>>   I have been trying to automate a manual code that I have developed
>>>>>> for calling in a .csv file, isolating certain rows and columns that
>>>>>> correspond to specified months:
>>>>>> something to the effect
>>>>>> i=name.csv
>>>>>> N=length(i$month)
>>>>>> iphos1=0
>>>>>> iphos2=0
>>>>>> isphos3=0
>>>>>> for i=1,N
>>>>>>  if month=1
>>>>>>     iphos1=iphos+1
>>>>>>     iphos1(iphos1)=i
>>>>>> 
>>>>>> an so on to call out the months into there own arrays (unless there
>>>>>> is a way I can wrap it into the next automation)
>>>>>> 
>>>>>> Next: I would like to run a simple linear regression combining each
>>>>>> of the months 1 by 1:
>>>>>> for instance I want to run a regression on a combined model from
>>>>>> months 1 and 2 and a dummy model for 1 and 2, compare them using a
>>>>>> Chi-sq distribution, if Chi-sq is less than the Critical value, we
>>>>>> accept and go on to test another set of models with both 1 and 2. If
>>>>>> it rejects, then we proceed to months 2 and 3.  If we move on to the
>>>>>> second set on months 1 and 2, and the critical value is accepted, I
>>>>>> want to print an accept or reject and move on to months 2 and 3,
>>>>>> until finally comparing months 12-1 at the end.
>>>>>> Is there a way to loop or automate this in R?
>>>>>> 
>>>>>> Thanks
>>>>>> Meredith
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4630243.html
Sent from the R help mailing list archive at Nabble.com.