Re: Analysis of repeated measurements across different methods
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Thu, 20 Apr 2006 02:07:33 -0400
On 19 Apr 2006 16:18:25 -0700, artitj@xxxxxxxxx wrote:
Hi all,
I am doing experiments to determine whether two automated methods, A
and B, are significantly different from 3 human measurements (that is,
3 different people). The reason for 3 human measurements is that the
"correct" measurement is not known, as this is measured in living
things. (If I had a gold standard, I would want to determine which
method had a smaller error relative to the gold standard.)
More specifically, I am measuring the width of a blood vessel in video
sequences of microscope slides, so the width of the blood vessel varies
over time, typically getting wider, then narrower.
I have about 10 video sequences, each with a varying # of frames
ranging from 30 to 300. For each method (2 automated methods, 3 human
observers), I measure the width of the blood vessel in each frame. The
widths of the blood vessels are a continuous quantity recorded as integer values.
My data looks something like this:
Img ID   Method   Width 1   Width 2   ...   Width 300
1        A        5         9         ...   7
1        B        6         12        ...   9
1        C        3         10        ...   3
and so on.
I am not interested in how the width varies with time, only in whether the
difference between Method A and the 3 human observers is
significant, and likewise between Method B and the 3 human observers.
Off hand, I wonder at your aims. It seems likely to me that
*mean* differences between the machine ratings and 'ideal'
should be rather trivial, and something to be fixed by tuning,
if that is the only sort of difference.
Given 5 measurements on a bunch of images, of arguable
reliability, my first reaction would be to perform a 'reliability'
analysis. Cronbach's alpha-when-deleted, or other ancillary
measures, will show which of the 5 (2+3 by method) are most
reliable in echoing the others. That essentially uses Pearson r.
If the measures do not covary, *something* is not reliable.
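
To make the alpha-when-deleted idea concrete, here is a minimal sketch in Python with NumPy, for one video sequence at a time. The function names and the synthetic widths are assumptions made up for illustration (none of this is the poster's data): frames are treated as the "cases" and the 5 methods as the "items", and one column is made deliberately noisy so that dropping it raises alpha.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_frames, n_methods) array of widths."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    # Sum of the per-method variances across frames
    item_variances = scores.var(axis=0, ddof=1).sum()
    # Variance of the per-frame totals (sum over methods)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

def alpha_if_deleted(scores):
    """Alpha recomputed with each method (column) left out in turn."""
    scores = np.asarray(scores, dtype=float)
    return [cronbach_alpha(np.delete(scores, j, axis=1))
            for j in range(scores.shape[1])]

# Synthetic stand-in for ONE video sequence (hypothetical, not real data).
rng = np.random.default_rng(0)
n_frames = 200
true_width = 8 + 5 * np.sin(np.linspace(0, 4 * np.pi, n_frames))
noise_sd = [0.5, 0.5, 0.5, 0.5, 5.0]  # last column is deliberately unreliable
scores = np.column_stack(
    [true_width + rng.normal(0, sd, n_frames) for sd in noise_sd]
)

print("alpha, all 5 methods:", round(cronbach_alpha(scores), 3))
print("alpha if deleted:", [round(a, 3) for a in alpha_if_deleted(scores)])
```

The method whose removal raises alpha the most is the one that echoes the others least well. Since alpha is built from variances and covariances, a constant offset in one method does not change it.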
By the way, the reliability analysis *is* a presentation of
the data in a repeated measures ANOVA.
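
The link between the two can be checked numerically. The sketch below (Python with NumPy; the function names are hypothetical and the small table is partly made up, reusing the three example frames from the question and adding three invented ones) computes the mean squares of a frames-by-methods layout and shows that alpha equals 1 - MS_residual / MS_frames.

```python
import numpy as np

def anova_mean_squares(x):
    """Mean squares for a two-way layout without replication.

    Rows are frames (the 'subjects'); columns are methods (the repeated factor).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_frames = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_methods = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_resid = ss_total - ss_frames - ss_methods
    return {
        "frames": ss_frames / (n - 1),
        "methods": ss_methods / (k - 1),
        "residual": ss_resid / ((n - 1) * (k - 1)),
    }

def alpha_from_anova(x):
    """Alpha written in terms of the ANOVA mean squares."""
    ms = anova_mean_squares(x)
    return 1.0 - ms["residual"] / ms["frames"]

def alpha_from_variances(x):
    """The usual variance-based formula for Cronbach's alpha."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    return k / (k - 1) * (
        1.0 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1)
    )

# Columns are methods A, B, C; rows are frames.
# First three rows echo the example in the question; the rest are invented.
widths = np.array([
    [5, 6, 3],
    [9, 12, 10],
    [7, 9, 3],
    [6, 7, 5],
    [8, 10, 9],
    [4, 6, 4],
])

ms = anova_mean_squares(widths)
print("alpha via ANOVA:    ", alpha_from_anova(widths))
print("alpha via variances:", alpha_from_variances(widths))
print("F for method means: ", ms["methods"] / ms["residual"])
```

The same table of mean squares therefore answers both questions: MS_frames against MS_residual speaks to reliability, and MS_methods against MS_residual is the usual F ratio for differences in method means.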
I think you would do this, first, separately for each of the 10
video segments. If the results are *not* generally consistent,
then you have a more difficult logical problem -- the results don't
generalize, and putting them into one analysis would be a
mistake.
I think I need to consider the fact that the distances for a given
image sequence and a given method are not independent (that is, Width 1
is not independent of width 2), and that the distances for a given
image sequence and method are not independent of any other method (that
is, width 1 for image 1, method A is not independent of width 1 for
image 1, method B). However, I'm not sure how to incorporate that into
a statistical analysis.
Are you hoping for a direct comparison of A versus B,
because one is written as a mechanical version of the other?
(Try this question, then: Is one of them better in *each*
of the 10 sequences?)
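
One rough way to put that question to the data is sketched below in Python with NumPy. Everything here is an assumption for illustration: "better" is taken to mean "smaller mean absolute gap from the frame-by-frame average of the three humans", the data are synthetic, and the helper names are made up. Counting wins per sequence treats each sequence as one unit, so the dependence between neighbouring frames does not enter the count.

```python
from math import comb
import numpy as np

def mean_abs_gap(method_widths, human_widths):
    """Mean absolute gap between one method and the per-frame human average."""
    human_mean = np.mean(np.asarray(human_widths, dtype=float), axis=0)
    diff = np.asarray(method_widths, dtype=float) - human_mean
    return float(np.mean(np.abs(diff)))

def sign_test_p(wins, n):
    """Two-sided exact sign test for `wins` out of `n` under a fair coin."""
    smaller = min(wins, n - wins)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Synthetic stand-in for 10 sequences of 30 to 300 frames (not real data).
rng = np.random.default_rng(1)
lengths = [int(n) for n in rng.integers(30, 301, size=10)]

a_wins = 0
for n_frames in lengths:
    truth = 8 + 5 * np.sin(np.linspace(0, 2 * np.pi, n_frames))
    humans = np.array([truth + rng.normal(0, 0.7, n_frames) for _ in range(3)])
    a = truth + rng.normal(0, 0.5, n_frames)
    b = truth + 2.0 + rng.normal(0, 0.5, n_frames)  # hypothetical bias of +2 in B
    if mean_abs_gap(a, humans) < mean_abs_gap(b, humans):
        a_wins += 1

print(a_wins, "of", len(lengths), "sequences favour A;",
      "sign-test p =", sign_test_p(a_wins, len(lengths)))
```

The sign test is only one simple choice for summarizing the count. Ties are ignored here, and a different definition of "better" would need a different gap function.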
I've been reading up on ANOVA, and I think maybe a two-factor ANOVA
with repeated measurements (where factor 1 is width and factor 2 is
method) would work, but I'm not entirely sure if I'm interpreting what
I'm reading correctly (i.e. I have no idea how to use it in this
problem). If this is the right approach, is there a way to use data
from all the image sequences, even though they are not the same length
(that is, image 1 might have 30 width measurements, but image 2 might
have 150, and image 3 might have 220, etc.)? And would I do ANOVA with
methods A, C, D, E and then another analysis with B, C, D, E, where methods A
and B are the two automated measurement methods and methods C, D, E are
humans? Or should I do ANOVA with all the methods, then do post-hoc
analysis to determine which methods are the cause of the significant
difference?
If anybody could suggest an appropriate statistical analysis to
perform, I would much appreciate it. My knowledge of statistics is
limited, so references would be helpful too. (I think I know just
enough to be dangerous :P).
Start with correlations, since that is what is fundamental
to *showing* reliability. SPSS has a nice Reliability procedure.
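
For anyone without SPSS, the correlations themselves are easy to get. The following is a minimal sketch in Python with NumPy, run separately for each sequence so that the unequal lengths do not matter. The data layout (one methods-by-frames array per sequence), the method labels, and the synthetic widths are all assumptions for illustration; method B is made deliberately noisy so it stands out.

```python
import numpy as np

METHODS = ["A", "B", "C", "D", "E"]  # A, B automated; C, D, E human

def correlation_matrix(widths):
    """Pearson correlations between methods for one sequence.

    widths: (n_methods, n_frames) array; np.corrcoef treats rows as variables.
    """
    return np.corrcoef(np.asarray(widths, dtype=float))

def mean_corr_with_others(r):
    """Average off-diagonal correlation of each method with the rest."""
    k = r.shape[0]
    return (r.sum(axis=1) - 1.0) / (k - 1)

# Synthetic sequences of unequal length (hypothetical, not real data).
rng = np.random.default_rng(2)
noise_sd = {"A": 0.5, "B": 5.0, "C": 0.7, "D": 0.7, "E": 0.7}
sequences = {}
for seq_id, n_frames in enumerate([30, 150, 220], start=1):
    truth = 8 + 5 * np.sin(np.linspace(0, 2 * np.pi, n_frames))
    sequences[seq_id] = np.array(
        [truth + rng.normal(0, noise_sd[m], n_frames) for m in METHODS]
    )

results = {}
for seq_id, widths in sequences.items():
    r = correlation_matrix(widths)
    results[seq_id] = mean_corr_with_others(r)
    print("sequence", seq_id, "frames:", widths.shape[1])
    print(np.round(r, 2))
    print("mean r with others:",
          dict(zip(METHODS, np.round(results[seq_id], 2))))
```

If the same method has the lowest correlations in every sequence, that is a consistent result; if the pattern changes from sequence to sequence, pooling the sequences would hide that.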
If the question of Means is really intractable, say more
about that.
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html