Re: polychoric correlations



This, the first of a two-part reply, addresses correlations. A second
will discuss factor analysis.

As Rich said, with binary items, the term "tetrachoric correlation" is
often used. Both "tetrachoric" and "polychoric" are, arguably,
obsolete terms. They refer to an estimation method, often no longer
used, not to the correlation itself. A better term might be latent
correlation or latent continuous correlation.
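To make the "latent correlation" idea concrete, here is a minimal Python sketch (not part of the original reply). It assumes the usual model behind tetrachoric correlation: each binary item is a standard normal latent variable cut at a threshold. The function names simulate_fourfold and phi_coefficient are hypothetical. The cell order (a, b, c, d) = counts of (X, Y) = (0, 0), (0, 1), (1, 0), (1, 1) is also an assumption of the sketch.

```python
import math
import random

# Hypothetical helper names; not taken from any program mentioned in this post.

def simulate_fourfold(rho, h, k, n, seed=1):
    """Dichotomize n draws from a standard bivariate normal (X, Y) with
    correlation rho: an item scores 1 when its latent value exceeds its
    threshold (h for X, k for Y). Returns (a, b, c, d), the counts of
    (X, Y) = (0, 0), (0, 1), (1, 0), (1, 1)."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    counts = [0, 0, 0, 0]
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)  # corr(x, y) = rho
        counts[2 * (x > h) + (y > k)] += 1
    return tuple(counts)

def phi_coefficient(a, b, c, d):
    """Pearson correlation of the two 0/1 items (the phi coefficient)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

table = simulate_fourfold(rho=0.6, h=0.0, k=0.0, n=100_000)
print(table, round(phi_coefficient(*table), 3))
```

With a median split (h = k = 0) and a latent correlation of 0.6, the Pearson (phi) correlation of the 0/1 scores comes out near 0.41, because in that case its expected value is (2/pi) * arcsin(rho). A tetrachoric estimate aims at the latent 0.6, not at the 0.41.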

We exclude from the discussion below items for which a row or column
total (marginal frequency) in the fourfold table is zero. Such items
have zero variance. Otherwise:

If (i) a single cell of the fourfold table has a zero frequency, (ii)
both cells on the main diagonal are zero, or (iii) both cells on the
non-main diagonal are zero, the tetrachoric correlation can still be
calculated. There are three options for doing this. As the options
are the same regardless of whether there is one zero cell or two, we
shall assume there is just one:

Option 1

A. Define rho = -1 if the zero cell is on the main diagonal.
B. Define rho = +1 if the zero cell is on the non-main diagonal.

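These are the correct values for rho, assuming zero frequencies are to
be taken literally, i.e., that the zero reflects a true absence of such
cases in the population rather than sampling chance.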
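Option 1 can be sketched in a few lines of Python. This is an illustration, not code from any of the programs discussed in this post. The function name rho_option1 is hypothetical, and so is the cell layout: a, b in the first row and c, d in the second, so a and d form the main diagonal. The reasoning behind the rule: if, say, c = 0, no case has X = 1 and Y = 0. Then X = 1 always implies Y = 1, which under the latent bivariate normal model is only possible when rho = +1.

```python
def rho_option1(a, b, c, d):
    """Option 1: take zero cell frequencies literally.

    Assumed layout (hypothetical; a and d form the main diagonal):
               Y = 0   Y = 1
        X = 0    a       b
        X = 1    c       d
    """
    if 0 in (a + b, c + d, a + c, b + d):
        # A zero row or column total means an item has zero variance;
        # such items are excluded from the discussion.
        raise ValueError("zero row or column total: item excluded")
    if 0 in (a, d):
        return -1.0  # zero on the main diagonal: perfect negative association
    if 0 in (b, c):
        return 1.0   # zero on the non-main diagonal: perfect positive association
    raise ValueError("no zero cell: estimate rho in the usual way")

print(rho_option1(40, 133, 0, 18))
```

For the example table given later in this post, rho_option1(40, 133, 0, 18) returns 1.0, matching the Option 1 value quoted there.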

Option 2

Apply the standard discontinuity correction for fourfold tables: add
.5 to the zero cell and to the cell diagonally opposite it, and
subtract .5 from the other two cells. (This leaves the row and column
totals unchanged.) Then estimate rho with the adjusted frequencies.

Option 3

Formulate the problem as a latent trait model and estimate rho that
way. (Some modern software does this automatically.) I believe this
is usually done without discontinuity correction.

Missing Values

As the first replier mentioned, it is not uncommon to use pairwise
deletion in calculating a matrix of tetrachoric correlations. This,
however, has implications for factor analysis (see my next post).
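For concreteness, pairwise deletion when building the fourfold tables might look like the following Python sketch. The function names are hypothetical. Missing responses are assumed to be coded as None, and the cell layout is (a, b, c, d) = counts of (0, 0), (0, 1), (1, 0), (1, 1). Each pair of items uses every case observed on both items, so different entries of the correlation matrix can rest on different subsets of respondents.

```python
from itertools import combinations

# Hypothetical helper names; missing values are assumed to be coded as None.

def fourfold_pairwise(x, y):
    """Fourfold counts (a, b, c, d) for two 0/1 items, using only the cases
    where both items are observed (pairwise deletion)."""
    counts = [0, 0, 0, 0]
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            continue  # drop this case for this pair of items only
        counts[2 * xi + yi] += 1
    return tuple(counts)

def pairwise_tables(items):
    """One fourfold table per pair of items; items maps name -> list of 0/1/None."""
    return {(i, j): fourfold_pairwise(items[i], items[j])
            for i, j in combinations(items, 2)}

data = {"q1": [0, 1, 1, None, 0, 1],
        "q2": [0, 1, None, 1, 1, 1],
        "q3": [1, 0, 0, 1, None, 1]}
for pair, table in pairwise_tables(data).items():
    print(pair, table, "n =", sum(table))
```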

Software

In the example you supplied, I believe the Stata result (rho = 0) is
incorrect. By my calculations, the correct values, given the original
fourfold table frequencies of (40, 133, 0, 18), are:

Option 1: rho = 1.0
Option 2: rho = 0.4989
Option 3: rho = 0.9612

Here are other programs you might consider:

1. TETCORR (James Fleming). If you search for "TETCORR Fleming" in
Google, you can find a description and the author's email address.
This program uses Option 2 above.

2. TetCorr (Dirk Enzmann) - different program, same name. This
handles up to 60 variables, though that limit can be changed. Again,
searching for "TetCorr Enzmann" will locate it. According to the
documentation, the program uses a variant of Option 1 in which a zero
frequency always results in r = -1, regardless of which diagonal the
zero cell is on; I believe this is incorrect. I'll mention this to the
author; if it is incorrect, it can be fixed with one line of code.

3. PRELIS. Unlike the preceding programs, this is a commercial
product. If you are at a university, it is likely already available
there, as it is distributed with the widely used LISREL program.

Hope this helps. For more information, you can check my web page on
Tetrachoric and Polychoric Correlation. Feel free to email me with any
questions.

--
John Uebersax, PhD

> Bellinda wrote:
>
> I am trying to do an exploratory factor analysis on data that is
> dichotomous....
> I get the message "could not calculate numerical derivatives
> missing values encountered"...
> Any advice regarding this would be greatly appreciated and I hope it
> makes sense!
