Re: Analysis of repeated measurements across different methods
- From: artitj@xxxxxxxxx
- Date: 19 Apr 2006 21:05:18 -0700
Thanks for your reply Jeff; I realize that I probably asked a bunch of
newbie questions, and you answered them with great detail :). I've
supplied some additional info below.
Jeff Miller wrote:
artitj@xxxxxxxxx wrote:
I am doing experiments to determine whether two automated methods, A
and B, are significantly different than 3 human measurements (that is,
3 different people). The reason for 3 human measurements is because the
"correct" measurement is not known, as this is measured in living
things.
Exactly what do you mean when you ask whether automated A is
different than the three human measurements? I presume the three
humans didn't all give identical measurements in all cases, or you
wouldn't have bothered to use three in the first place. So, are you
simply asking whether A is different from the _average_ of the three
humans? If so, you might consider whether it really gains you anything
to keep the three humans' scores separate, or whether you might
just as well use the average of these three scores for each frame.
I originally thought about comparing against the average, but then I
was concerned that the deviations between the humans would be as large
as the difference between the automated method and the human average,
in which case, the automated method could be reasonably considered to
be as good as a human. (For example, if for one particular frame the
humans measured 5,7,9 and the automated method measured a 9, the
automated method could be good. Or it could be a fluke...).
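To make the 5, 7, 9 example concrete, here is a minimal sketch that puts the automated method's deviation from the human mean next to the spread among the humans. The function name is made up for illustration, and using the sample standard deviation as the measure of human spread is an assumption, not something settled in this thread.

```python
from statistics import mean, stdev


def compare_to_humans(human_scores, automated_score):
    """Return (human mean, human SD, automated deviation from the human mean)."""
    human_mean = mean(human_scores)
    human_sd = stdev(human_scores)  # sample SD across the observers
    deviation = automated_score - human_mean
    return human_mean, human_sd, deviation


# The single-frame example from the post: humans 5, 7, 9; automated 9.
human_mean, human_sd, deviation = compare_to_humans([5, 7, 9], 9)
print(human_mean, human_sd, deviation)
```

For this one frame the automated deviation (2) is the same size as the human standard deviation (2), which is exactly the situation where a single frame cannot distinguish "as good as a human" from a fluke.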
Next, as you probably know, the question of statistical significance
basically asks whether the deviations of the data from some hypothesis
could be explained by random error. Well, what is randomly sampled in
your example? Are you thinking of the humans as random representatives
from some larger sample? Or the videos? The frames? I'm not sure
which one(s) would be most appropriate from your question.
I believe that the videos would be the random representatives -- I'm
trying to capture the performance of the different methods over all
videos that could be thrown at it. The humans are random
representatives as well, but I'm not sure how important that is for my
analysis.
(If I had a gold standard, I would want to determine which
method had a smaller error relative to the gold standard.)
This suggests to me that you might as well use the average across
the three humans (for each frame), as the closest you can get to
the gold standard. Distinguishing among the different humans' scores
tells you something about the accuracy of your "gold standard"
measurements, which is useful, but that info seems subsidiary to me.
What would be a good way to incorporate this accuracy of the gold
standard? Would I just do a correlation between each human score and
the mean? (But since the mean is derived from the humans' score, I
guess that would be invalid...) Or maybe correlation between each pair
of humans? Or is the best I can do just reporting the standard
deviation and limits of agreement?
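As a rough sketch of two of the options mentioned above, the following computes the Pearson correlation for each pair of humans and the per-frame standard deviation across the three humans. The observer labels and the width values are invented purely for illustration, and `pearson` is a small helper written here, not a library call.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical widths: one list per human observer, one entry per frame.
humans = {
    "H1": [5, 6, 8, 7, 9],
    "H2": [7, 9, 9, 8, 11],
    "H3": [9, 12, 10, 9, 12],
}

# Correlation between each pair of observers, across frames.
pairwise_r = {
    (a, b): pearson(humans[a], humans[b]) for a, b in combinations(humans, 2)
}

# Spread among the three observers within each frame.
per_frame_sd = [stdev(frame) for frame in zip(*humans.values())]

for pair, r in pairwise_r.items():
    print(pair, round(r, 3))
print([round(sd, 3) for sd in per_frame_sd])
```

Correlating each pair of humans avoids the circularity of correlating a human with a mean that already contains that human's score.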
I have about 10 video sequences, each with a varying # of frames
ranging from 30 to 300. For each method (2 automated methods, 3 human
observers), I measure the width of the blood vessel in each frame. The
widths of the blood vessels are numeric (integer-valued) data.
My data looks something like this:
I would think of it tabulated like this instead:
Video  Frame #  WidthMethodA  WidthMethodB  WidthMethodC  ...
1      1        5             9             7             ...
1      2        6             12            9             ...
For starters, then, I think it might be useful to compare the
following two correlations (correlating across all frames):
o correlation of WidthMethodA scores with AvgHuman score
o correlation of WidthMethodB scores with AvgHuman score
Or, if there are substantial differences between videos, it might be
better to run correlations like this separately for each video,
correlating across frames.
There are substantial differences between the videos, but they are
fairly representative of the range of videos I'd see in practice. So I
should probably then just calculate the correlation across the frames
in each video, and have a separate correlation for each method and
video?
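A minimal sketch of that per-video calculation, assuming the data are laid out one row per frame as (video, frame, A, B, human 1, human 2, human 3). All numbers below are invented, and the column order is an assumption made for this example only.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical rows: (video, frame, widthA, widthB, human1, human2, human3)
rows = [
    (1, 1, 6, 9, 5, 7, 9),
    (1, 2, 7, 12, 6, 8, 10),
    (1, 3, 9, 10, 8, 9, 10),
    (1, 4, 8, 9, 7, 7, 10),
    (2, 1, 11, 12, 10, 11, 12),
    (2, 2, 12, 11, 12, 12, 15),
    (2, 3, 14, 15, 11, 13, 15),
    (2, 4, 15, 13, 14, 15, 16),
]

# Group frames by video, replacing the three human scores by their average.
by_video = defaultdict(list)
for video, frame, a, b, *humans in rows:
    by_video[video].append((a, b, mean(humans)))

# One correlation per method and video, correlating across frames.
results = {}
for video, records in by_video.items():
    a, b, avg_human = zip(*records)
    results[video] = {"A": pearson(a, avg_human), "B": pearson(b, avg_human)}

for video, r in results.items():
    print(video, {method: round(value, 3) for method, value in r.items()})
```

This yields one number per method per video, so with about 10 videos there would be 10 correlations for A and 10 for B to compare.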
I've been reading up on ANOVA, and I think maybe a two-factor ANOVA
with repeated measurements (where factor 1 is width and factor 2 is
method) would work
I think you are on the wrong track here. ANOVA is good for comparing
means, but I suspect that isn't what truly interests you. Suppose,
for example, that Method A gives exactly the same mean as the
humans, on average across all videos and frames? Does that tell
you that method A is good? Not really, I would think. It may not
actually match the gold standard of humans very well on a frame by
frame basis, but instead its errors may just average out quite well
across frames to give you the "right" overall mean.
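The point about means can be illustrated with a small made-up example in which a method has exactly the same overall mean as the human average yet is far off on individual frames. The numbers are invented, and mean absolute error is used here only as one simple frame-by-frame measure.

```python
from statistics import mean

# Hypothetical per-frame values: the method reverses the human trend.
human_avg = [5, 7, 9, 11]
method_a = [11, 9, 7, 5]

# The overall means are identical, so a comparison of means sees no difference.
overall_human_mean = mean(human_avg)
overall_method_mean = mean(method_a)

# Frame-by-frame errors tell a different story.
abs_errors = [abs(m - h) for m, h in zip(method_a, human_avg)]
mean_abs_error = mean(abs_errors)

print(overall_human_mean, overall_method_mean, mean_abs_error)
```

Both means are 8, yet the method is off by 4 on an average frame, so equal means alone say little about frame-level agreement.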
You bring up a very good point. I was drawn to ANOVA because of some
examples in a few papers I saw on doing other types of measurement, but
admittedly I didn't fully comprehend the reasons for the authors
choosing ANOVA.
It sounds to me like you should be reading about measures of
correlation and regression, or interrater reliability, rather than
ANOVA, but maybe I've got the wrong idea of what you are trying to do.
Perhaps you could clear this up for me (I think this may be a case
where a little bit of knowledge is a bad thing), but I read a paper by
Bland & Altman regarding measuring agreement in method comparison
studies, and they seem to suggest that the typically used Pearson
product-moment correlation coefficient is not valid since it tends to
have high correlations as long as the data are linearly related somehow
and not necessarily equal (if for example one method was always twice
the other, it would be highly correlated but not really correct). I'm
not entirely sure if their criticism extends to other correlation
measures as well though.
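The "always twice the other" case can be checked directly. The sketch below uses invented data where one method reads exactly double the other, then computes the Pearson correlation alongside the mean difference and 95% limits of agreement (mean difference plus or minus 1.96 standard deviations of the differences), which is how I understand the Bland & Altman approach. `pearson` is a small helper written here, not a library call.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: method_2 always reads exactly twice method_1.
method_1 = [5, 6, 7, 8, 9]
method_2 = [2 * w for w in method_1]

r = pearson(method_1, method_2)

# Agreement is judged on the paired differences, not on the correlation.
diffs = [b - a for a, b in zip(method_1, method_2)]
bias = mean(diffs)
sd_diff = stdev(diffs)
lower_limit = bias - 1.96 * sd_diff
upper_limit = bias + 1.96 * sd_diff

print(round(r, 3), bias, round(lower_limit, 3), round(upper_limit, 3))
```

The correlation is perfect even though the second method is off by 7 on average, which is the gap between "linearly related" and "in agreement".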
I started looking at interrater reliability, but a lot of the material
I read seemed to only deal with categorical data, so after a bunch of
searching I ended up reading about t-tests and then ANOVA.
Thanks again for your insight!