[R] HELP! Excel and R give me totally different regression results using the exact same data

frauke fhoss at andrew.cmu.edu
Wed Nov 7 20:47:26 CET 2012


Okay. Sorry for being vague in my earlier message. I had missed a few lines
from your message because they were well hidden in my own email. I am still
very much learning this, so it will take some time. Sorry.

There seem to be two issues: (1) my preparing the data incorrectly and (2)
the data not being suitable for regression. Right?

Ad1. Point about the header taken. As to using characters in a matrix: I
extract the data from National Weather Service data files, pulling out the
observations together with dates and location names. Each row consists of a
date, a location and the observations. I chose to store them in matrices
because I can combine them into arrays. A matrix can only hold one type of
data, so I chose to leave everything as character. When I proceed to the
regression analysis, I transform the observations into numbers using
as.numeric(). Do you have a different suggestion? Will R give me different
results if I store characters in a matrix?
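A minimal R sketch of the points raised in Ad1, using made-up rows (the values and the "M" missing-value code are assumptions, not taken from the NWS files). Storing numbers as character and converting them with as.numeric() gives the same numbers as long as every field is a clean number; anything else becomes NA with only a warning. A data frame, unlike a matrix, can hold a Date column, a character column and numeric columns side by side. The last lines show how a header line read without header = TRUE turns a numeric column into text, which is one common way for R and Excel to end up fitting different data.

```r
# Made-up rows in the layout described above: date, location, two observations.
# "M" is a hypothetical missing-value code, not necessarily what the NWS uses.
m <- matrix(c("2012/11/01", "PIT", " 12.5", "3",
              "2012/11/02", "PIT", "M",     "4"),
            nrow = 2, byrow = TRUE)

# as.numeric() ignores surrounding blanks; non-numeric text becomes NA
# (R only warns, so check for unexpected NAs after converting).
obs1 <- suppressWarnings(as.numeric(m[, 3]))
print(obs1)   # 12.5 NA

# A data frame keeps one type per column, so dates, locations and numbers
# can sit side by side without forcing everything to character.
df <- data.frame(
  date     = as.Date(m[, 1], format = "%Y/%m/%d"),
  location = m[, 2],
  obs1     = obs1,
  obs2     = as.numeric(m[, 4]),
  stringsAsFactors = FALSE
)
str(df)

# A header line read with header = FALSE becomes an ordinary data row,
# so the whole column is read as text instead of numbers.
txt  <- "date location obs1\n2012/11/01 PIT 12.5\n2012/11/02 PIT 13.0"
bad  <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)
good <- read.table(text = txt, header = TRUE,  stringsAsFactors = FALSE)
str(bad$V3)    # character vector: "obs1", "12.5", "13.0"
str(good$obs1) # numeric vector: 12.5, 13
```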
Even though such excerpts from a long script aren't very informative, to be
complete:
collection <- matrix(NA, nrow = 1, ncol = 25)  # collection will be a row of the output matrix later on
# extract dates
collection[1] <- paste0(year, "/", substring(.file, 125, 126), "/", substring(.file, 127, 128))
# extract observations (i indexes the fields)
collection[start.write + i] <- substring(input, fields[[i]][1], fields[[i]][2])
Ad2. You mention heteroscedasticity and non-normality of the residuals. To
keep it short, I had provided just a subset of my data (100 of 4000 matrix
rows), but the same is true for the whole dataset. I attached the whole
thing this time: test_complete.txt
<http://r.789695.n4.nabble.com/file/n4648759/test_complete.txt>. How do I
deal with this? I admit I am pretty clueless here. Can I do a meaningful
regression at all? (I didn't expect test[,3] to be a good predictor, but had
hopes for test[,2].)
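A base-R sketch of how the heteroscedasticity could be checked, on simulated data standing in for test_complete.txt. It assumes test[,1] is the response and test[,2], test[,3] are the predictors, which the message implies but does not state. The statistic is Koenker's studentized Breusch-Pagan test, computed by hand as n times the R-squared of a regression of the squared residuals on the predictors; lmtest::bptest() with its default settings should give the same value.

```r
# Simulated stand-in for the real data: the noise grows with the second column.
set.seed(1)
n  <- 400
x2 <- runif(n, 0, 10)
x3 <- runif(n, 0, 10)
y  <- 2 + 0.5 * x2 + rnorm(n, sd = 0.2 + 0.3 * x2)
test <- cbind(y, x2, x3)

fit <- lm(test[, 1] ~ test[, 2] + test[, 3])
e   <- residuals(fit)

# Koenker's studentized Breusch-Pagan test: regress e^2 on the predictors;
# n * R^2 is approximately chi-squared with 2 df if the variance is constant.
aux <- lm(e^2 ~ test[, 2] + test[, 3])
bp  <- n * summary(aux)$r.squared
p   <- pchisq(bp, df = 2, lower.tail = FALSE)
cat("BP =", round(bp, 2), " p =", signif(p, 3), "\n")

# Visual checks: a funnel shape in the first plot suggests heteroscedasticity,
# points bending away from the line in the second suggest non-normality.
par(mfrow = c(1, 2))
plot(fitted(fit), e, xlab = "fitted", ylab = "residual"); abline(h = 0, lty = 2)
qqnorm(e); qqline(e)
```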

The residuals are definitely not normally distributed. They do not seem to be
related to either of the two predictors. What is the conclusion from that?
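On the question of what to conclude: with about 4000 rows, the least-squares coefficient estimates are approximately normal under mild conditions even when the residuals are not, so non-normality is usually the smaller concern. Heteroscedasticity, however, makes the usual standard errors (and therefore the p-values) unreliable. One common remedy, alongside transforming the response or modelling the variance, is heteroscedasticity-consistent ("sandwich") standard errors. The sketch below computes the HC0 version by hand on simulated skewed, heteroscedastic data (not the attached file); sandwich::vcovHC(fit, type = "HC0") should give the same matrix.

```r
# Simulated stand-in data: skewed errors whose spread grows with x2.
set.seed(2)
n  <- 4000
x2 <- runif(n, 0, 10)
x3 <- runif(n, 0, 10)
s  <- 0.2 + 0.2 * x2
y  <- 1 + 0.4 * x2 + (rexp(n) - 1) * s   # mean-zero, right-skewed noise

fit <- lm(y ~ x2 + x3)
X   <- model.matrix(fit)
e   <- residuals(fit)

# HC0 ("White") covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
bread  <- solve(crossprod(X))
meat   <- crossprod(X * e)   # X * e scales row i of X by e[i]
vc_hc0 <- bread %*% meat %*% bread

# Compare classical and robust standard errors side by side.
print(cbind(estimate   = coef(fit),
            se_classic = sqrt(diag(vcov(fit))),
            se_hc0     = sqrt(diag(vc_hc0))))
```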

Thanks for your patience!
Frauke






--
View this message in context: http://r.789695.n4.nabble.com/HELP-Excel-and-R-give-me-totally-different-regression-results-using-the-exact-same-data-tp4648648p4648759.html
Sent from the R help mailing list archive at Nabble.com.