Re: sample size for kappa?



On 1 Sep 2006 14:24:28 -0700, "epichick" <rspiwak@xxxxxxxxx> wrote:

> Hi there,
>
> Can you tell me roughly how large a sample I will need in order to
> calculate Kappa with 2 raters? My outcome variable is yes, no, partial
> and not applicable.

Kappa is not a very good 'absolute' statistic beyond the 2x2 case.
That is, for larger tables the value of kappa depends on the marginal
distributions -- you can't say much about what is good or bad, except
in comparison to tables with similar margins.
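
To illustrate the point about margins, here is a minimal Python sketch. The two 3x3 tables are made up for illustration (they are not the poster's data), and `cohen_kappa` is just a hypothetical helper name. Both tables have the same 90% raw agreement, but the margins differ, and so does kappa.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_m = [sum(table[i]) / n for i in range(k)]
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance, given those margins.
    p_exp = sum(row_m[i] * col_m[i] for i in range(k))
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical tables, each with 108 of 120 cases on the diagonal (90% agreement).
balanced = [[36, 2, 2],
            [2, 36, 2],
            [2, 2, 36]]
skewed = [[100, 2, 2],
          [2, 4, 2],
          [2, 2, 4]]

print(round(cohen_kappa(balanced), 3))  # even margins
print(round(cohen_kappa(skewed), 3))    # one category dominates
```

With even margins the chance-expected agreement is low, so 90% agreement looks strong; when one category dominates, much of that 90% is already expected by chance and kappa drops.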

For multiple diagnoses, it is meaningful to say X-vs.-notX, for
each of several choices of X.
For three ordered levels, I would figure on an ordinary correlation.
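
The X-vs.-notX idea can be sketched as follows. This is only an illustration: the ratings are invented and the function names are hypothetical. Each category in turn is collapsed to "X" or "not X", and a 2x2 kappa is computed for it.

```python
def kappa_from_pairs(a, b):
    """Cohen's kappa for two equal-length lists of ratings; None if undefined."""
    n = len(a)
    cats = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    if p_exp == 1.0:  # both raters used one and the same category throughout
        return None
    return (p_obs - p_exp) / (1 - p_exp)

def one_vs_rest_kappas(a, b):
    """Kappa for each category X, after recoding the ratings as X vs. not-X."""
    cats = sorted(set(a) | set(b))
    return {c: kappa_from_pairs([x == c for x in a], [y == c for y in b])
            for c in cats}

# Invented ratings from two raters on eight cases.
rater_a = ["yes", "yes", "no", "no", "partial", "partial", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "partial", "yes", "yes", "partial"]

for category, k in one_vs_rest_kappas(rater_a, rater_b).items():
    print(category, k)
```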

"Not applicable" is a nasty choice.

Now, sample size: For reliability, you are usually assuming
that there *is* pretty good correlation. You are not concerned
with "statistical significance" but with accuracy. Different areas
have different standards that *can* be met. IQs are measured
with much more accuracy than personality items or opinions.

You are going to have to establish what range of outcome you
expect, and what range is acceptable.

I suspect you have to dummy-up some answers to see if they
*seem* to make a persuasive case for anything, if you don't
want to carry out the study to see how it works.
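
One way to "dummy-up some answers" is a small simulation. The sketch below rests on assumptions that are mine, not the poster's: the category proportions, the 80% chance that rater B simply matches rater A, and all the function names are hypothetical. It shows how widely kappa scatters at a few sample sizes.

```python
import random

CATEGORIES = ["yes", "no", "partial", "n/a"]

def kappa(a, b):
    """Cohen's kappa for two lists of ratings; None if it is not computable."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return None if p_exp == 1.0 else (p_obs - p_exp) / (1 - p_exp)

def simulate_kappa_spread(n, probs, agree, reps=500, seed=0):
    """Central 95% range of kappa over `reps` simulated samples of size n.

    Assumed model: rater A draws from `probs`; rater B copies A with
    probability `agree`, otherwise draws independently from `probs`.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        a = rng.choices(CATEGORIES, weights=probs, k=n)
        b = [x if rng.random() < agree
             else rng.choices(CATEGORIES, weights=probs)[0]
             for x in a]
        k = kappa(a, b)
        if k is not None:
            values.append(k)
    values.sort()
    return values[int(0.025 * len(values))], values[int(0.975 * len(values)) - 1]

# Hypothetical category proportions and agreement rate.
for n in (30, 100, 300):
    low, high = simulate_kappa_spread(n, probs=[0.5, 0.3, 0.15, 0.05], agree=0.8)
    print(n, round(low, 2), round(high, 2))
```

If the range at the planned N is wider than the range of outcomes you would call acceptable, the sample is too small for that purpose.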



> I am doing a study assessing an audit tool. I have 2 randomly selected
> raters (out of 4 raters) and an audit tool/questionnaire that has 250
> questions. I have been criticized for having a sample size that is too
> small (I am looking at 4 different departments).

What *is* the sample size? Reliability is measured across
the sample, and a measured score reflects the sample variation
as well as the test items and the raters. If the sample is "4"
departments, that is too small to say much about, without
knowing beforehand whether they are very similar or very
different. Note that if they are "similar", you can get a lot
of agreement on answers while the kappa is either not computable
(all answers the same) or not above chance (no *useful* variation,
the useful kind being where both raters spot the same departure
from the usual).
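
Both situations are easy to reproduce with invented ratings. In this sketch (hypothetical data and function name), the first pair of raters answers "yes" to everything, so kappa cannot be computed; the second pair agrees on 90% of cases, but each flags a different case, and kappa comes out slightly below zero.

```python
def kappa(a, b):
    """Cohen's kappa for two lists of ratings; None when it is not computable."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_exp == 1.0:
        # Every rating from both raters is the same category: 0/0.
        return None
    return (p_obs - p_exp) / (1 - p_exp)

# Case 1: no variation at all.
all_yes = ["yes"] * 20
print(kappa(all_yes, all_yes))

# Case 2: 18 of 20 agree, but the raters flag *different* cases as "no".
rater_a = ["no", "yes"] + ["yes"] * 18
rater_b = ["yes", "no"] + ["yes"] * 18
raw_agreement = sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)
print(raw_agreement, round(kappa(rater_a, rater_b), 3))
```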






> Can anyone tell me if my sample size is indeed too small, or if it is
> fine because I have 250 questions? We have selected 4 departments
> because we are interested in whether there is a difference in kappa
> between the departments, but this isn't our primary objective. We really
> want to know if there is inter-rater agreement between the 2 raters on
> the 250 questions, and which questions had poor agreement.


This really does sound like you are regarding the 250 as
your N. Is the same question asked 250 times, with (generally)
the same expected outcome? You would not be expected
to compute a kappa, or any other intraclass correlation, across
250 questions that are different. (Are you following a model
from some other study?)

For questions that are related to the same outcome, you
could ask what Cronbach's alpha is, which shows how well the
items agree in measuring a single 'dimension' that they share.
Then you could look at the item-total correlations within each
set, if the 250 questions fall into a few dozen such sets.
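
As a rough sketch of those two quantities for one set of related items: the scores below are invented, the function names are hypothetical, and the item-total correlation is computed in its "corrected" form (each item against the total of the other items), which is one common choice rather than the only one.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows are audited units, columns are items in one set."""
    k = len(rows[0])
    items = list(zip(*rows))            # one tuple of scores per item
    totals = [sum(r) for r in rows]     # total score per unit
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_rest_correlations(rows):
    """Correlation of each item with the total of the remaining items."""
    items = list(zip(*rows))
    totals = [sum(r) for r in rows]
    return [pearson(col, [t - c for t, c in zip(totals, col)]) for col in items]

# Invented scores: 5 audited units on a set of 3 related items.
scores = [[2, 3, 3],
          [1, 1, 2],
          [3, 3, 3],
          [0, 1, 1],
          [2, 2, 3]]

print(round(cronbach_alpha(scores), 3))
print([round(r, 3) for r in item_rest_correlations(scores)])
```

Items with a low correlation against the rest of their set are the ones to look at more closely.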


--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html