[R] Chi square test on data frame

Thu Aug 18 10:16:22 CEST 2011

Hi

r-help-bounces at r-project.org wrote on 17.08.2011 21:07:43:

> 
> Dear Michael,
> 
> Thanks a lot for your reply and for your help. I was struggling so much,
> but your suggestion showed me a path to the solution of my problem. I have
> tried your code on my data frame stepwise and it looks fine to me. But
> when I tried the chi-square test:
> 
> res=chisq.test(y1[id],p=y2[id],rescale.p=T)
> 
>         Chi-squared test for given probabilities
> 
> data:  y1[id] 
> X-squared = NaN, df = 19997, p-value = NA
> 
> Warning message:
> In chisq.test(y1[id], p = y2[id], rescale.p = T) :
>   Chi-squared approximation may be incorrect

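Check what Y1[id] is: it is one long vector holding the values for all
rows, so chisq.test treats it as a single sample.

Split Y1[id] and Y2[id] into lists with one element (two values) per row
(6 is the number of rows in the example):
l1 <- split(Y1[id], rep(1:6, each=2))
l2 <- split(Y2[id], rep(1:6, each=2))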
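To see why the pooled call returns NaN, here is a minimal sketch that rebuilds the 6-row example and computes the statistic by hand. The names top2, obs, expd, E and terms are mine, not from the thread; the formula is the usual sum((O - E)^2 / E), with the expected values rescaled to the observed total as rescale.p=TRUE does.

```r
# Rebuild the example data from the thread (V1..V4 observed, W1..W4 expected)
Y <- matrix(c(0, 35, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              22, 0, 0, 0, 0, 0,   10, 0, 48, 0, 0, 48,
              0, 22, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              0, 0, 0, 0, 0, 0,    0, 0, 48, 0, 0, 48),
            nrow = 6,
            dimnames = list(as.character(1:6),
                            c(paste0("V", 1:4), paste0("W", 1:4))))
Y1 <- Y[, 1:4]
Y2 <- Y[, 5:8]

# Michael's index matrix: (row, column) of the two largest values in each row
top2 <- apply(Y1, 1, order, decreasing = TRUE)[1:2, ]
id <- cbind(as.vector(col(top2)), as.vector(top2))

obs  <- Y1[id]   # one long vector: two values per row, rows concatenated
expd <- Y2[id]

# Expected counts rescaled to the observed total, then the per-cell terms
E <- sum(obs) * expd / sum(expd)
terms <- (obs - E)^2 / E

# The same pooled test as in the question (warnings silenced for the sketch)
pooled <- suppressWarnings(chisq.test(obs, p = expd, rescale.p = TRUE))
```

Any pair with observed 0 and expected 0 contributes 0/0 to the sum, so the statistic for the pooled vector is NaN and no p-value is available.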

Then run mapply over those lists, passing the observed values as x and
the expected values as p. But the result is rather silly, as Michael
pointed out.

res <- mapply(function(o, e) chisq.test(o, p=e, rescale.p=TRUE),
              l1, l2, SIMPLIFY=FALSE)

To get only the p-values:

lapply(res, "[", "p.value")
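If what you want in the end is one p-value per row in a plain vector, something along these lines might do. It is only a sketch: row_pvalue is a hypothetical helper I made up for illustration, and returning NA when an expected value is 0 is my assumption about how such rows should be treated, since the statistic divides by the expected counts.

```r
# Example data from the thread as a data frame
df <- data.frame(V1 = c(0, 35, 0, 0, 0, 0),  V2 = c(84, 84, 0, 48, 84, 0),
                 V3 = c(22, 0, 0, 0, 0, 0),  V4 = c(10, 0, 48, 0, 0, 48),
                 W1 = c(0, 22, 0, 0, 0, 0),  W2 = c(84, 84, 0, 48, 84, 0),
                 W3 = rep(0, 6),             W4 = c(0, 0, 48, 0, 0, 48))

# Hypothetical helper: goodness-of-fit p-value (df = 1) for a single row.
# v = the four observed values, w = the four expected values.
row_pvalue <- function(v, w) {
  top2 <- order(v, decreasing = TRUE)[1:2]  # positions of the two largest observed
  obs  <- v[top2]
  expd <- w[top2]
  # Assumption: rows with a zero expected value (or no observations) get NA
  if (any(expd == 0) || sum(obs) == 0) return(NA_real_)
  suppressWarnings(chisq.test(obs, p = expd, rescale.p = TRUE)$p.value)
}

# One p-value per row; columns 1:4 are observed, 5:8 are expected
pvals <- unname(apply(as.matrix(df), 1,
                      function(r) row_pvalue(r[1:4], r[5:8])))
df$p.value <- pvals
```

In the 6-row example only the second row has two non-zero expected values, so it is the only one that gets a p-value here; the others come out as NA, which is really the point Michael made about the zeros.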

Regards
Petr

> 
> It is not giving a p-value. Then I checked the observed and expected
> values; it is taking all numbers into consideration. But as I mentioned
> earlier, I want a p-value for each row, and therefore the degrees of
> freedom will be 1. Example:
> 
> I have a data frame with 8 columns:
>     V1   V2   V3   V4   W1   W2   W3   W4
> 1    0   84   22   10    0   84    0    0
> 2   35   84    0    0   22   84    0    0
> 3    0    0    0   48    0    0    0   48
> 4    0   48    0    0    0   48    0    0
> 5    0   84    0    0    0   84    0    0
> 6    0    0    0   48    0    0    0   48
> 
> The example for the first row is:
> 
> The two largest values are 84 (in V2) and 22 (in V3), so these are
> taken as the observed values. Since the largest values are in V2 and
> V3, we have to pick the expected values from W2 and W3, which are 84
> and 0. I know that for a chi-square test the values should not be 0,
> but we will ignore the warning.
> 
> Next it should generate a p-value for the second row, taking 35 and 84
> (V1 and V2) as observed and 22 and 84 (W1 and W2) as expected. So here
> it will do a chi-square test for all 6 rows and generate 6 p-values.
> My data frame has a lot of rows (approx. 9999).
> 
> Can you please help me with this?
> 
> 
> 
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: R. Michael Weylandt [michael.weylandt at gmail.com]
> Sent: Wednesday, August 17, 2011 7:11 PM
> To: Bansal, Vikas
> Cc: r-help at r-project.org
> Subject: Re: [R] Chi square test on data frame
> 
> I think everything below is right, but it's all a little helter-skelter,
> so take it with a grain of salt:
> 
> First things first: share your data with dput() when posting to the list.
> 
> Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
> 0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
> 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
> ), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
> "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
> 
> Now,
> 
> Y1 = Y[,1:4]
> Y2 = Y[,-(1:4)]
> 
> id = apply(Y1,1,order,decreasing=TRUE)[1:2,]
> # This has the columns you want in each row, but it's not directly
> # appropriate for subsetting.
> # Specifically, the problem is that the row information is implicit in
> # where the col index is in id.
> # We directly extract and force into a 2-column matrix that gives the
> # row and column for each data point.
> id = cbind(as.vector(col(id)),as.vector(id))
> 
> Now you can take
> 
> Y1[id] as the observed values and Y2[id] as the expected.
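A small aside on the indexing trick used here, as a toy sketch (the names m and idx are made up for illustration): when a matrix is indexed by a two-column matrix, each row of the index is read as one (row, column) pair, and col() supplies the column number of every cell.

```r
m <- matrix(1:12, nrow = 3)           # 3 x 4 toy matrix, filled column by column
idx <- cbind(c(1, 2, 3), c(4, 1, 2))  # each row of idx is one (row, column) pair
picked <- m[idx]                      # elements m[1,4], m[2,1], m[3,2]: 10 2 6

# col() returns the column number of every cell; flattening a 2 x 3 matrix
# column by column therefore gives 1 1 2 2 3 3
cols <- as.vector(col(matrix(0, nrow = 2, ncol = 3)))
```

In Michael's code the columns of the 2-row id matrix correspond to the rows of Y1, which is why col(id) ends up as the row index.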
> 
> But, to be honest, it sounds like you have more problems in using a
> chi-sq test than anything else. Beyond all the zeros, you should note
> that you always have #obs >= #expected because Y1 >= Y2. I'll leave
> that up to you though.
> 
> Hope this helps, and please make sure you can take my code apart piece
> by piece to understand it: there's some odd data manipulation that takes
> advantage of R's way of coercing matrices to vectors, and if your actual
> data isn't like the provided example, you may have to modify it.
> 
> Michael Weylandt
> 
> On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas <vikas.bansal at kcl.ac.uk> wrote:
> Is there anyone who can help me with a chi-square test on a data frame?
> I have been struggling for the last 2 days. I will be very thankful to you.
> 
> Dear all,
> 
> I have been working on this problem for many hours but did not find
> any solution.
> I have a data frame with 8 columns:
>     V1   V2   V3   V4   W1   W2   W3   W4
> 1    0   84   22   10    0   84    0    0
> 2   35   84    0    0   22   84    0    0
> 3    0    0    0   48    0    0    0   48
> 4    0   48    0    0    0   48    0    0
> 5    0   84    0    0    0   84    0    0
> 6    0    0    0   48    0    0    0   48
> 
> From the first four columns, I have to take the two largest values for
> each row, and these two values will be considered the observed values.
> From the last four columns we will get the expected values. So I have to
> perform a chi-square test for each row to get the p-values.
> 
> The example for the first row is:
> 
> The two largest values are 84 (in V2) and 22 (in V3), so these are
> taken as the observed values. Since the largest values are in V2 and
> V3, we have to pick the expected values from W2 and W3, which are 84
> and 0. I know that for a chi-square test the values should not be 0,
> but we will ignore the warning.
> Now that we have the observed as well as the expected values, we have to
> perform a chi-square test to get the p-value for each row in a new column.
> 
> 
> So far I have been trying to return the indices of the two largest values with
> sort.int(df,index.return=TRUE)$ix[c(4,3)]
> but it does not accept a data frame.
> 
> Can you please give me some idea how to do this? It is very tricky, and
> after studying a lot I am still not able to do it. Please help.
> 
> 
> 
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.