[R] Correlation when one variable has zero variance (polychoric?)

John Fox jfox at mcmaster.ca
Thu Dec 20 12:56:32 CET 2007


Dear Jose,

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jose
> Sent: December-19-07 11:27 AM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Correlation when one variable has zero variance (polychoric?)
> 
> Dear John,
> 
> > I also ran the same analysis in 2005 (what has changed in the package
> > polycor since then, I don't know) and the results were different. I
> > think back then I contrasted them with SAS and they were the same.
> 
> John> I don't entirely follow this. Are you referring to the table above
> John> with one row, more generally to tables with zero marginals, or to
> John> tables in which there are interior zeroes?
> 
> I have plenty of those tables, but I think quite a few of them have zero
> marginals (the case I posted might be a bit extreme). I have 400
> observations, so no matter how centered the distributions are, some
> observations will be out of the center.

As I said, there's no basis for estimating polychoric correlations and all
thresholds when there are zero marginals. If there is more than one row and
column remaining with nonzero marginals, then you could simply eliminate the
rows/columns with zero marginals, but tables with only one nonzero row or
column have no information about the correlation. I'll think about doing
this -- i.e., removing zero rows and columns -- automatically and issuing a
warning.
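
A minimal sketch in R of the row/column removal described above, assuming the
data arrive as a matrix (or two-way table) of counts. The name
drop_zero_margins is hypothetical; it is not a function in polycor.

```r
# Hypothetical helper (not part of polycor): drop all-zero rows and columns
# from a two-way table of counts and warn when anything was removed.
drop_zero_margins <- function(tab) {
  tab <- as.matrix(tab)
  keep_rows <- rowSums(tab) > 0
  keep_cols <- colSums(tab) > 0
  if (!all(keep_rows) || !all(keep_cols)) {
    warning(sum(!keep_rows), " row(s) and ", sum(!keep_cols),
            " column(s) with zero marginals removed")
  }
  # drop = FALSE keeps the result two-dimensional even if one row remains.
  tab[keep_rows, keep_cols, drop = FALSE]
}

# Example: a 3 x 3 table (filled column-wise) whose middle row and
# last column are empty.
tab <- matrix(c(10, 0, 5,
                 4, 0, 8,
                 0, 0, 0), nrow = 3)
reduced <- suppressWarnings(drop_zero_margins(tab))
```

If the reduced table still has at least two rows and two columns, it can be
passed on for estimation; otherwise there is nothing to estimate.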

> 
> The results I got in 2005 cannot be reproduced now in 2007 with the same
> code; I guess this could be due to this bug you describe (maybe it was
> introduced later?). In 2007, I got many correlations as high as the one I
> described and I was wondering what the problem was. I don't have SAS
> available anymore, so I cannot run the code I wrote in SAS to compare.

No program, not even SAS, can magically estimate a correlation from a table
with one row or column. If polychor() did that in 2005, the answer it
provided was erroneous.
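
To spell out why, here is a short derivation under the usual latent
bivariate-normal model for the polychoric correlation. The symbols are
introduced here for illustration: a_i and b_j are the row and column
thresholds, n_ij the cell counts, Phi the standard normal CDF, and Phi_2 the
bivariate normal CDF with correlation rho.

```latex
% Cell probability in the general r x c case:
\pi_{ij} = \Phi_2(a_i, b_j; \rho) - \Phi_2(a_{i-1}, b_j; \rho)
         - \Phi_2(a_i, b_{j-1}; \rho) + \Phi_2(a_{i-1}, b_{j-1}; \rho)

% With a single row, a_0 = -\infty and a_1 = +\infty, so
% \Phi_2(+\infty, b; \rho) = \Phi(b) and \Phi_2(-\infty, b; \rho) = 0:
\pi_{1j} = \Phi(b_j) - \Phi(b_{j-1})

% The log-likelihood therefore contains no \rho at all:
\ell(\rho, b) = \sum_j n_{1j} \log\bigl[\Phi(b_j) - \Phi(b_{j-1})\bigr]
```

Every value of rho gives the same likelihood, so any number a program reports
for such a table is arbitrary.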

> 
> Where can I get the new code for polychor?

I plan to upload a new version of the polycor package to CRAN as soon as I
have a chance -- probably sometime this week. But you already have the code
for polychor() and can modify it yourself: Just fix the test so that it
checks for < 2 rather than < 1 row, and return NA (and issue a warning) in
this case.
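
A rough sketch of that kind of guard, written as a wrapper rather than as an
edit to the package source. The names guarded_polychor and estimator are
hypothetical, and dropping the empty rows and columns before the check is my
assumption about how the two fixes would combine, not the actual polycor code.

```r
# Hypothetical wrapper (not polycor's source): return NA with a warning
# when fewer than 2 nonzero rows or columns remain; otherwise call the
# supplied estimator on the reduced table.
guarded_polychor <- function(tab, estimator) {
  tab <- as.matrix(tab)
  # Discard rows and columns that contain no observations.
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  if (nrow(tab) < 2 || ncol(tab) < 2) {
    warning("fewer than 2 nonzero rows or columns; correlation not estimable")
    return(NA_real_)
  }
  estimator(tab)
}
```

With polycor installed, a call such as guarded_polychor(tab, polycor::polychor)
should work, since polychor() accepts a contingency table as its first
argument.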

> 
> I'm in a predicament here; the data I'm analyzing are from a flight
> simulation and are extremely expensive to get, so running more
> experiments is out of the question.
> 
> Any pointers as to how I could analyze this dataset (i.e., one where
> there might be zero marginals)?

I'm sorry, but as I said there's no magic solution here. The data, however
expensive, don't have information relevant to estimating the correlation.

Regards,
 John

> 
> Thanks
> 
> -Jose


