[R] Chi square test on data frame
Petr PIKAL
petr.pikal at precheza.cz
Thu Aug 18 10:16:22 CEST 2011
Hi
r-help-bounces at r-project.org wrote on 17.08.2011 21:07:43:
>
> Dear Michael,
>
> Thanks a lot for your reply and for your help. I was struggling so much, but
> your suggestion showed me a path to the solution of my problem. I have
> tried your code on my data frame stepwise and it looks fine to me. But
> when I tried the chi-square test:
>
> res=chisq.test(y1[id],p=y2[id],rescale.p=T)
>
> Chi-squared test for given probabilities
>
> data: y1[id]
> X-squared = NaN, df = 19997, p-value = NA
>
> Warning message:
> In chisq.test(y1[id], p = y2[id], rescale.p = T) :
> Chi-squared approximation may be incorrect
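Check what Y1[id] is: it is one long vector holding the values for all rows,
so chisq.test treats them as a single sample (hence df = 19997).

Split the Yn values into lists, one element per row:

l1<-split(Y1[id], rep(1:6, each=2))
l2<-split(Y2[id], rep(1:6, each=2))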
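
To see why that grouping vector lines up with the rows, here is a small self-contained sketch. It assumes Y, Y1, Y2 and id are built the way Michael's code (quoted below) builds them; the names top2, obs and grp are only used here for illustration, and seq_len(nrow(Y1)) stands in for 1:6 so the same lines should work for a longer data set.

```r
# Rebuild the example data (same values as Michael's dput() output).
Y <- matrix(c(0, 35, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              22, 0, 0, 0, 0, 0,   10, 0, 48, 0, 0, 48,
              0, 22, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              0, 0, 0, 0, 0, 0,    0, 0, 48, 0, 0, 48),
            nrow = 6,
            dimnames = list(as.character(1:6),
                            c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
Y1 <- Y[, 1:4]
Y2 <- Y[, 5:8]

# Column positions of the two largest values in each row (a 2 x 6 matrix).
top2 <- apply(Y1, 1, order, decreasing = TRUE)[1:2, ]

# Two-column (row, column) index matrix: two consecutive entries per row of Y1.
id <- cbind(as.vector(col(top2)), as.vector(top2))

# Y1[id] is a plain vector: row 1's two values, then row 2's, and so on.
obs <- Y1[id]

# One group label per row, repeated twice, so split() yields one pair per row.
grp <- rep(seq_len(nrow(Y1)), each = 2)
l1 <- split(obs, grp)
l2 <- split(Y2[id], grp)

l1[["1"]]   # observed pair for row 1
l2[["1"]]   # expected pair for row 1
```

Each element of l1 and l2 then holds the pair of values belonging to one row of the data.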
Then run mapply over those lists. The result is rather silly, though, as
Michael pointed out.

mapply(chisq.test, l1, l2, SIMPLIFY=FALSE)

or, to get only the p-values:

lapply(mapply(chisq.test, l1, l2, SIMPLIFY=FALSE), "[", 3)
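
One thing to keep in mind with the mapply call above: the second positional argument of chisq.test is y, not p, so two plain vectors are coerced to factors and cross-tabulated rather than compared as observed versus expected counts. If a goodness-of-fit test per row is what is wanted, p has to be passed by name. A minimal sketch, assuming l1 and l2 are lists of observed and expected pairs like the ones built above (the helper name row_p is made up for illustration). The statistical caveats about zero expected counts still apply.

```r
# Observed and expected pairs for the first two rows of the example data.
l1 <- list("1" = c(84, 22), "2" = c(84, 35))
l2 <- list("1" = c(84, 0),  "2" = c(84, 22))

# Hypothetical helper: goodness-of-fit p-value for a single row.
# p is passed by name; rescale.p = TRUE rescales it to sum to 1.
# The "approximation may be incorrect" warning is suppressed on purpose.
row_p <- function(obs, expd) {
  suppressWarnings(chisq.test(obs, p = expd, rescale.p = TRUE)$p.value)
}

# One p-value per row, returned as a named numeric vector.
pvals <- mapply(row_p, l1, l2)
pvals
```

Rows whose expected pair contains a zero give a degenerate result (a p-value of 0 or NaN), which is one more reason to treat the output with caution.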
Regards
Petr
>
> It is not giving a p-value. Then I checked the observed and expected values;
> it is taking all numbers into consideration. But as I mentioned earlier, I
> want a p-value for each row, and therefore the degrees of freedom will be 1.
> Example:
>
> I have a data frame with 8 columns:
>   V1 V2 V3 V4 W1 W2 W3 W4
> 1  0 84 22 10  0 84  0  0
> 2 35 84  0  0 22 84  0  0
> 3  0  0  0 48  0  0  0 48
> 4  0 48  0  0  0 48  0  0
> 5  0 84  0  0  0 84  0  0
> 6  0  0  0 48  0  0  0 48
>
> Example for the first row:
>
> The two largest values are 84 (in V2) and 22 (in V3), so these are
> taken as the observed values. Since the largest values are in V2 and
> V3, we have to pick the expected values from W2 and W3, which are 84 and 0.
> I know that for a chi-square test values should not be 0, but we will ignore
> the warning.
>
> Now it should generate a p-value for the next row, taking 35 and 84 (V1 and
> V2) as observed and 22 and 84 (W1 and W2) as expected. So here it will do a
> chi-square test for all 6 rows and will generate 6 p-values. My data frame
> has a lot of rows (approx. 9999).
>
> Can you please help me with this?
>
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: R. Michael Weylandt [michael.weylandt at gmail.com]
> Sent: Wednesday, August 17, 2011 7:11 PM
> To: Bansal, Vikas
> Cc: r-help at r-project.org
> Subject: Re: [R] Chi square test on data frame
>
> I think everything below is right, but it's all a little helter-skelter, so
> take it with a grain of salt:
>
> First things first: when posting to the list, provide your data with dput().
>
> Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
> 0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
> 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
> ), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
> "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
>
> Now,
>
> Y1 = Y[,1:4]
> Y2 = Y[,-(1:4)]
>
> id = apply(Y1,1,order,decreasing=TRUE)[1:2,]
> # This has the columns you want in each row, but it's not directly
> # appropriate for subsetting.
> # Specifically, the problem is that the row information is implicit in
> # where the col index is in id.
> # We directly extract and force into a 2-col matrix that gives rows and
> # columns for each data point.
> id = cbind(as.vector(col(id)),as.vector(id))
>
> Now you can take
>
> Y1[id] as the observed values and Y2[id] as the expected.
>
> But, to be honest, it sounds like you have more problems in using a chi-sq
> test than anything else. Beyond all the zeros, you should note that you
> always have #obs >= #expected because Y1 >= Y2. I'll leave that up to you,
> though.
>
> Hope this helps and please make sure you can take my code apart piece by
> piece to understand it: there's some odd data manipulation that takes
> advantage of R's way of coercing matrices to vectors and if your actual
> data isn't like the provided example, you may have to modify.
>
> Michael Weylandt
>
> On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas <vikas.bansal at kcl.ac.uk>
> wrote:
> Is there anyone who can help me with a chi-square test on a data frame? I
> have been struggling for the last 2 days. I will be very thankful to you.
>
> Dear all,
>
> I have been working on this problem for many hours but have not found
> any solution.
> I have a data frame with 8 columns:
>   V1 V2 V3 V4 W1 W2 W3 W4
> 1  0 84 22 10  0 84  0  0
> 2 35 84  0  0 22 84  0  0
> 3  0  0  0 48  0  0  0 48
> 4  0 48  0  0  0 48  0  0
> 5  0 84  0  0  0 84  0  0
> 6  0  0  0 48  0  0  0 48
>
> From the first four columns, for each row I have to take the two largest
> values, and these two values will be considered the observed values. From the
> last four columns we get the expected values. So I have to perform a
> chi-square test for each row to get p-values.
>
> Example for the first row:
>
> The two largest values are 84 (in V2) and 22 (in V3), so these are
> taken as the observed values. Since the largest values are in V2 and
> V3, we have to pick the expected values from W2 and W3, which are 84 and 0.
> I know that for a chi-square test values should not be 0, but we will ignore
> the warning.
> Now that we have the observed as well as the expected values, we have to
> perform a chi-square test to get the p-value for each row in a new column.
>
>
> So far I have been trying to return the indices of the two largest values with
> sort.int(df,index.return=TRUE)$ix[c(4,3)]
> but it does not accept a data frame.
>
> Can you please give me some idea how to do this? It is very tricky, and
> after studying a lot I am still not able to do it. Please help.
>
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.