[R] Question: Self selection bias and censoring in R

Thu Oct 13 21:23:44 CEST 2011

Hi All,

I am relatively new to R and am trying to solve the following problem. I have looked at the 'sampleSelection' package but am having trouble applying it to my use case. Has anyone done something similar who could share code or documentation, using either this package or any other?

I have a web page that I am subjecting to a statistical test. There are two flavors, a test and a control, and I am measuring task completion on both. My null hypothesis is p(users completing task on control page) = p(users completing task on test page). I can randomly split the users between the two pages, run the test, and perform a simple Z test (prop.test()) to compare the proportions and get my answer.
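
For concreteness, the simple comparison I have in mind looks roughly like the sketch below. The counts are made up purely for illustration, and I am assuming that prop.test() without the continuity correction is equivalent to the usual pooled two-sample Z test (its chi-squared statistic being the square of z).

```r
# Hypothetical counts, made up for illustration only (not real data)
completed <- c(control = 120, test = 150)    # users who completed the task
assigned  <- c(control = 1000, test = 1000)  # users randomly assigned to each page

# Two-sample test of equal proportions; H0: p_control == p_test.
# correct = FALSE switches off the Yates continuity correction.
res <- prop.test(x = completed, n = assigned, correct = FALSE)

print(res$estimate)   # observed completion rate in each arm
print(res$p.value)    # two-sided p-value for H0

# The same comparison written out as a pooled Z statistic
p_hat  <- completed / assigned
p_pool <- sum(completed) / sum(assigned)
z <- unname(p_hat["test"] - p_hat["control"]) /
  sqrt(p_pool * (1 - p_pool) * sum(1 / assigned))
print(z^2)            # should match res$statistic
```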

However, to reach the test/control page users have to 'opt in', so I am inducing a self-selection bias. They can also 'opt out' whenever they want, which introduces censoring. I can randomly split the traffic between test and control for users who have opted in, but I have no control over which users opt in or whether they decide to opt out mid-test.
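
To make the setup concrete, here is a rough simulated sketch of the data I would end up with. Every rate and variable name in it is made up, and it only covers users who have already opted in (I never observe the others). It shows two naive ways I could handle the opt-outs; I do not know whether either is appropriate, which is really my question.

```r
set.seed(1)       # arbitrary seed so the sketch is reproducible
n_users <- 2000   # hypothetical number of opted-in users

# Random split between control and test, done after opt-in
arm <- sample(c("control", "test"), n_users, replace = TRUE)

# Made-up opt-out rates; opt-out may depend on which page the user sees
opted_out <- rbinom(n_users, 1, ifelse(arm == "test", 0.20, 0.10)) == 1

# Task completion is unobserved (NA) once a user has opted out
completed <- ifelse(opted_out, NA, rbinom(n_users, 1, 0.15) == 1)

# (a) Drop opt-outs: compares only the users who stayed in each arm
stayed <- !opted_out
x_a <- tapply(completed[stayed], arm[stayed], sum)
n_a <- tapply(stayed, arm, sum)
res_a <- prop.test(x = as.vector(x_a), n = as.vector(n_a), correct = FALSE)

# (b) Keep everyone as randomized and count opt-outs as "not completed"
completed_b <- ifelse(opted_out, FALSE, completed)
x_b <- tapply(completed_b, arm, sum)
n_b <- tapply(rep(1, n_users), arm, sum)
res_b <- prop.test(x = as.vector(x_b), n = as.vector(n_b), correct = FALSE)

print(res_a$estimate)  # completion rates among users who stayed
print(res_b$estimate)  # completion rates among all randomized users
```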

Any pointers to examples, links, papers, or R code showing how to adapt my simple Z test for proportions to account for this, or to alternative approaches in R, would be most welcome.

Best regards,

Mary