[R] Request for functions to calculate correlated factors influencing an outcome.
Lalitha Viswanathan
lalitha.viswanathan79 at gmail.com
Sun May 3 19:46:49 CEST 2015
Hi
I am sorry, I saved the file after removing the dot from "Disp." (I had
been getting a read.delim error about !header, etc.). The dot was not the
culprit, but I continued to leave it out.
Let me paste the full code here.
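# Read the tab-separated file; read.table() already returns a data frame
x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt", header=TRUE, sep="\t")
# Print the rows belonging to each country
for (i in unique(x$Country)) {
  print(i)
  print(subset(x, Country == i))
}
# Keep only the numeric columns of interest
newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
cor(newx, method="pearson")
# Rank-based correlation tests, stored under separate names so none is overwritten
cor.weight.price <- cor.test(newx$Weight, newx$Price, method="spearman")
cor.weight.hp <- cor.test(newx$Weight, newx$HP, method="spearman")
cor.disp.hp <- cor.test(newx$Disp, newx$HP, method="spearman")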
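A side note on the cor() call: with its default use="everything", any column that contains missing values yields NA correlations, which is one plausible reason for the NA row and column for Reliability in the results quoted below. The sketch that follows uses a small made-up data frame (the name toy and all of its values are illustrative assumptions, not the real data) to show how the use argument changes this.

```r
# Made-up illustrative data; NOT the real fuelEfficiency.txt values
toy <- data.frame(
  Mileage     = c(33, 33, 37, 32, 26, 23, 21, 20),
  Weight      = c(2560, 2345, 1845, 2260, 2885, 3310, 3295, 3450),
  Reliability = c(2, 4, NA, 5, 3, NA, 1, 3)
)

# Default (use = "everything"): pairs involving Reliability come back as NA
cm_default <- cor(toy, method = "pearson")
print(cm_default)

# Pairwise deletion: each correlation uses the rows where both columns are present
cm_pairwise <- cor(toy, method = "pearson", use = "pairwise.complete.obs")
print(round(cm_pairwise, 3))
```

Whether pairwise deletion is appropriate depends on why the values are missing, so this is only a starting point.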
Setting exact=NULL still does not remove the warning:
my.cor <-cor.test(newx$Disp, newx$HP, method="kendall", exact=NULL)
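For reference, exact=NULL is already the default for cor.test(), so passing it explicitly changes nothing. A minimal sketch on made-up vectors with ties (disp and hp here are illustrative assumptions, not the real columns) suggests that exact=FALSE, which requests the asymptotic p-value instead of an exact one, avoids the warning.

```r
# Made-up vectors containing tied values; NOT the real Disp/HP columns
disp <- c(97, 97, 114, 114, 122, 133, 151, 151, 181, 181)
hp   <- c(63, 74, 81, 81, 92, 95, 110, 110, 141, 141)

# exact = FALSE asks for the normal approximation, so no exact p-value
# is attempted and the "ties" warning is not raised
res <- cor.test(disp, hp, method = "kendall", exact = FALSE)
print(res$estimate)  # Kendall's tau
print(res$p.value)   # asymptotic p-value

# Confirm that the call really is warning-free
warned <- tryCatch({
  cor.test(disp, hp, method = "kendall", exact = FALSE)
  FALSE
}, warning = function(w) TRUE)
print(warned)
```

The same argument applies to method="spearman".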
I tried to find the correlation coefficient for various combinations of
variables, but am unable to interpret the results. (The results are pasted
below, in my earlier post.)
I followed it up with a normality test:
shapiro.test(newx$Disp)
shapiro.test(newx$HP)
Then I decided to run kruskal.test(newx),
with the result:
Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16
My question: I am trying to find the factors influencing efficiency (in
this case, mileage).
What range of functions / examples should I be looking at to find a factor
or combination of factors influencing efficiency?
Any pointers will be helpful.
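To make the question concrete, one common family of functions for this kind of problem is linear regression via lm(). The sketch below is only an illustration under stated assumptions: the data frame toy and its values are made up, and the choice of Weight and HP as predictors of Mileage is hypothetical rather than a conclusion from the real data.

```r
# Made-up illustrative data; NOT the real fuelEfficiency.txt values
toy <- data.frame(
  Mileage = c(33, 33, 37, 32, 26, 23, 21, 20),
  Weight  = c(2560, 2345, 1845, 2260, 2885, 3310, 3295, 3450),
  HP      = c(91, 81, 63, 90, 110, 130, 145, 160)
)

# Multiple linear regression: Mileage as the response, Weight and HP as predictors
fit <- lm(Mileage ~ Weight + HP, data = toy)

# Coefficient estimates, standard errors, t-tests and R-squared
print(summary(fit))

# Keep the pieces that are usually inspected first
coefs <- coef(fit)
r2 <- summary(fit)$r.squared
```

Because predictors such as Weight, Disp and HP are strongly correlated with each other, individual coefficients in a model like this can be hard to interpret.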
Thanks
Lalitha
On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan <
lalitha.viswanathan79 at gmail.com> wrote:
> Hi
> I have a dataset of the type attached.
> Here's my code thus far.
> dataset <-data.frame(read.delim("data", sep="\t", header=TRUE));
> newData<-subset(dataset, select = c(Price, Reliability, Mileage, Weight, Disp, HP));
> cor(newData, method="pearson");
> Results are
>                   Price Reliability     Mileage      Weight        Disp          HP
> Price         1.0000000          NA  -0.6537541   0.7017999   0.4856769   0.6536433
> Reliability          NA           1          NA          NA          NA          NA
> Mileage      -0.6537541          NA   1.0000000  -0.8478541  -0.6931928  -0.6667146
> Weight        0.7017999          NA  -0.8478541   1.0000000   0.8032804   0.7629322
> Disp          0.4856769          NA  -0.6931928   0.8032804   1.0000000   0.8181881
> HP            0.6536433          NA  -0.6667146   0.7629322   0.8181881   1.0000000
>
> It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, HP and
> Price are strongly correlated.
> To find the statistical significance,
> I am trying sample.correln<-cor.test(newData$Disp, newData$HP,
> method="kendall", exact=NULL)
> Kendall's rank correlation tau
>
> data: newx$Disp and newx$HP
> z = 7.2192, p-value = 5.229e-13
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
> tau
> 0.6563871
>
> If I try the same with
> sample.correln<-cor.test(newData$Disp, newData$HP, method="spearman",
> exact=NULL)
> I get Warning message:
> In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL)
> :
> Cannot compute exact p-value with ties
> > sample.correln
>
> Spearman's rank correlation rho
>
> data: newx$Disp and newx$HP
> S = 5716.8, p-value < 2.2e-16
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
> rho
> 0.8411566
>
> I am not sure how to interpret these values.
> Basically, I am trying to figure out which combination of factors
> influences efficiency.
>
> Thanks
> Lalitha
>