Re: sample size for kappa?
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Fri, 01 Sep 2006 18:18:06 -0400
On 1 Sep 2006 14:24:28 -0700, "epichick" <rspiwak@xxxxxxxxx> wrote:
> Hi there,
> Can you tell me roughly how large of a sample I will need in order to
> calculate Kappa with 2 raters? My outcome variable is yes, no, partial
> and not applicable.
Kappa is not a very good 'absolute' statistic beyond the 2x2 case.
That is, for larger tables the value of kappa depends on the marginal
distributions -- you can't say much about what is good or bad, except
in comparison to tables with similar margins.
For multiple diagnoses, it is meaningful to compute kappa for
X-vs.-notX, for each of several choices of X.
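
As a rough illustration of the X-vs.-notX idea, the sketch below collapses
a square rater-by-rater table into a 2x2 table for one category and
computes Cohen's kappa on it. The function names (cohen_kappa,
collapse_to_binary) and all the counts are made up for illustration; they
are not the poster's audit data.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts.

    Rows are rater A's categories, columns are rater B's.
    Returns None when chance agreement is 1 (kappa is undefined).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row_marg[i] * col_marg[i] for i in range(k))
    if p_exp == 1.0:
        return None
    return (p_obs - p_exp) / (1 - p_exp)


def collapse_to_binary(table, x):
    """Collapse a k-by-k table to a 2x2 table: category x versus not-x."""
    k = len(table)
    n = sum(sum(row) for row in table)
    a = table[x][x]                               # both raters chose x
    b = sum(table[x]) - a                         # only rater A chose x
    c = sum(table[i][x] for i in range(k)) - a    # only rater B chose x
    d = n - a - b - c                             # neither chose x
    return [[a, b], [c, d]]


# Hypothetical counts for 100 rated units; order: yes, no, partial, n/a.
labels = ["yes", "no", "partial", "n/a"]
table = [
    [30, 3, 4, 1],
    [2, 25, 3, 2],
    [5, 4, 12, 1],
    [1, 1, 1, 5],
]

print("overall kappa:", cohen_kappa(table))
for x, label in enumerate(labels):
    print(label, "vs. not-" + label, cohen_kappa(collapse_to_binary(table, x)))
```

Each 2x2 kappa is read against its own margins, which is why a separate
value per category is easier to interpret than one kappa for the whole
4x4 table.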
For three ordered levels, I would figure on an ordinary correlation.
"Not applicable" is a nasty choice.
Now, sample size: For reliability, you are usually assuming
that there *is* pretty good correlation. You are not concerned
with "statistical significance" but with accuracy. Different areas
have different standards that *can* be met. IQs are measured
with much more accuracy than personality items or opinions.
You are going to have to establish what range of outcome you
expect, and what range is acceptable.
I suspect you have to dummy-up some answers to see if they
*seem* to make a persuasive case for anything, if you don't
want to carry out the study to see how it works.
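
One way to "dummy-up some answers" is a small simulation: invent a
plausible mix of categories and a plausible rater accuracy, generate fake
ratings for several sample sizes, and see how widely kappa scatters. The
sketch below does that under loudly stated assumptions: the category
proportions, the 0.85 accuracy, and the error model (a rater who misses
picks one of the other categories at random) are all hypothetical, and
the function names are invented here.

```python
import random
from collections import Counter


def kappa(a, b):
    """Cohen's kappa for two equal-length lists of ratings (None if undefined)."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_exp == 1:
        return None
    return (p_obs - p_exp) / (1 - p_exp)


def simulate_kappa_range(n, probs, acc, reps=500, seed=0):
    """Central 95% range of kappa over `reps` simulated samples of size n.

    probs: assumed proportions of the true categories (hypothetical).
    acc:   assumed chance that a rater reports the true category.
    """
    rng = random.Random(seed)
    cats = list(range(len(probs)))

    def rate(true_cat):
        # Report the truth with probability acc, else a random other category.
        if rng.random() < acc:
            return true_cat
        return rng.choice([c for c in cats if c != true_cat])

    values = []
    for _ in range(reps):
        truth = rng.choices(cats, weights=probs, k=n)
        rater_a = [rate(t) for t in truth]
        rater_b = [rate(t) for t in truth]
        k = kappa(rater_a, rater_b)
        if k is not None:
            values.append(k)
    values.sort()
    lo = values[int(0.025 * len(values))]
    hi = values[int(0.975 * len(values)) - 1]
    return lo, hi


# Hypothetical mix of yes / no / partial / not applicable.
assumed_probs = [0.5, 0.3, 0.15, 0.05]
for n in (25, 50, 100, 200, 400):
    lo, hi = simulate_kappa_range(n, assumed_probs, acc=0.85)
    print(n, round(lo, 2), round(hi, 2))
```

The width of the printed range at each n is the thing to compare against
whatever range of outcome is considered acceptable in the field.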
> I am doing a study assessing an audit tool. I have 2 randomly selected
> raters (out of 4 raters) and an audit tool/questionnaire that has 250
> questions. I have been criticized for having a sample size that is too
> small (I am looking at 4 different departments).
What *is* the sample size? Reliability is measured across
the sample, and a measured score reflects the sample variation
as well as the test items and the raters. If the sample is "4"
departments, that is too small to say much about, without
knowing beforehand whether they are very similar or very
different. Note that if they are "similar", you can get a lot
of agreement on answers while the kappa is either not computable
(all answers the same) or not above chance (no *useful* variation,
that is, no cases where both raters spot the same departure from
the usual).
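
The two failure cases just described can be shown with a few invented
ratings. This is a minimal sketch; kappa_from_ratings is a name made up
here, and the 20 "units" are hypothetical.

```python
from collections import Counter


def kappa_from_ratings(a, b):
    """Cohen's kappa for two equal-length lists of ratings.

    Returns None when expected chance agreement is 1, i.e. when both
    raters give the same single answer everywhere and kappa is undefined.
    """
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_exp == 1:
        return None
    return (p_obs - p_exp) / (1 - p_exp)


# Case 1: every answer is "yes" for both raters. Agreement is perfect,
# but there is no variation, so kappa cannot be computed.
all_yes = ["yes"] * 20
print(kappa_from_ratings(all_yes, all_yes))

# Case 2: each rater flags one "no", but on different units. Raw agreement
# is 18 out of 20, yet kappa is not above chance because the raters never
# agree on an unusual case.
rater_a = ["no"] + ["yes"] * 19
rater_b = ["yes", "no"] + ["yes"] * 18
print(kappa_from_ratings(rater_a, rater_b))
```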
> Can anyone tell me if my sample size indeed is too small, or if because
> I have 250 questions that it is fine? We have selected 4 departments
> because we are interested if there is a difference in kappa between the
> departments, but this isn't our primary objective. We really want to
> know if there is inter-rater agreement between the 2 raters on the 250
> questions, and which questions had poor agreement.
This really does sound like you are regarding the 250 as
your N. Is the same question asked 250 times, with the same
(generally) expected outcome? You would not be expected
to compute a kappa, or any other intraclass correlation, across
250 questions that are different. (Are you following a model
from some other study?)
For questions that are related to the same outcome, you
could ask what the Cronbach alpha is, which shows the item
agreement in measuring a single 'dimension' that is shared
by the items. Then, if the 250 questions fall into a few dozen
such sets, you could look at the item-total correlations
within each set.
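
For one set of related questions, Cronbach's alpha and the item-total
correlations could be computed roughly as below. This is a sketch under
assumptions: the answers are coded numerically (here no=0, partial=1,
yes=2, which is a choice made for illustration), the data are invented,
and the item-total correlation shown is the "corrected" version (each
item against the sum of the other items). The function names are
hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of rows, one row of item scores per unit."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlate each item with the sum of the remaining items."""
    k = len(scores[0])
    result = []
    for j in range(k):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result


# Invented data: 6 rated units, 4 related questions, coded 0/1/2.
scores = [
    [2, 2, 1, 2],
    [1, 2, 1, 1],
    [0, 1, 0, 1],
    [2, 1, 2, 2],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
print("alpha:", cronbach_alpha(scores))
print("item-total:", item_total_correlations(scores))
```

An item with a low item-total correlation is a candidate for a question
that does not belong with the rest of its set.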
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html