Re: Weighted standard deviation
- From: "Bill H" <whowells@xxxxxxxxx>
- Date: 7 Jun 2006 08:00:39 -0700
G Robin Edwards wrote:
I am investigating quite a large (by my standards!) data matrix. Actual
dimensions are 113 columns and up to 593 rows. Many columns contain
much smaller numbers of rows, but usually have at least 150. The data
are actually time series, and column 1 is the time indicator.
The columns contain data that are all supposed to be measures of a
fundamental quantity, and have the property that they are consistent in
the way they map to the fundamental property. However, they vary
spectacularly in both location and variance, by orders of magnitude in
many cases.
I wish to generate a summary of the columns in the form of a mean, and
also variance (standard deviation rather, since it is in convenient
units).
Clearly, naive averaging across the columns is useless, so I have
attempted to homogenise the data by the simple and straightforward
method of standardising every column to mean zero, variance 1, before
averaging or other treatment.
This technique appears to work very well in that I obtain what seem to
be meaningful summaries for each row (year), and I am often able to make
some intriguing inferences.
However, the original data columns also have weights attached to them -
though how these have been chosen is not clear. The weights are either
1, 0.75, 0.5 or 0.33. I can readily calculate the weighted averages of
the (standardised) values, but at the moment I haven't worked out how to
estimate standard deviations for each row, which must also depend to
some degree on the given weights. The number of columns having data
values varies from one row to another. If I ignore the weights I can
estimate a "confidence interval" for the mean and the standard
deviation, based simply on the number of columns containing actual data,
but what should I do to include the given weights in my estimate?
Perhaps I'm being stupid and there's no problem. However, I'd like to
be reassured on that.
Robin
The weights might be sampling weights that reflect how the sample was
drawn. For example, a stratified random sample would assign a weight of
N/n to each observation, where N is the size of the stratum in the
population and n is the sample size for that stratum. There is
specialized software that takes sample weights into account, such as
SUDAAN and some routines in SAS and Stata; I am not sure about SPSS.
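
If the weights turn out to be reliability (precision) weights rather than
sampling weights, the row summaries asked about can be sketched roughly as
follows. This is only a minimal Python/NumPy sketch under stated
assumptions, not a definitive method: the function name
weighted_row_summary, the use of NaN for missing cells, the bias
correction V1 - V2/V1 for the weighted variance, and the Kish effective
sample size (sum w)^2 / sum(w^2) are all choices made here for
illustration and do not come from the original data description. Genuine
sampling weights would need design-based software such as the packages
mentioned above.

```python
import numpy as np

def weighted_row_summary(data, weights):
    """Weighted mean, SD, effective n and SE of the mean for each row.

    data    : 2-D array, rows = years, columns = series, NaN = missing
    weights : 1-D array with one weight per column
              (ASSUMED to be reliability weights, not sampling weights)
    """
    data = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Standardise every column to mean 0, variance 1, ignoring missing cells.
    z = (data - np.nanmean(data, axis=0)) / np.nanstd(data, axis=0, ddof=1)

    # A missing cell gets weight 0, so it drops out of every sum below.
    present = ~np.isnan(z)
    W = np.where(present, w, 0.0)
    zf = np.where(present, z, 0.0)

    v1 = W.sum(axis=1)           # sum of weights in each row
    v2 = (W ** 2).sum(axis=1)    # sum of squared weights in each row

    mean = (W * zf).sum(axis=1) / v1

    # Weighted variance; V1 - V2/V1 reduces to n - 1 when all weights are 1.
    var = (W * (zf - mean[:, None]) ** 2).sum(axis=1) / (v1 - v2 / v1)
    sd = np.sqrt(var)

    # Effective number of columns contributing to the row, and the
    # resulting standard error of the weighted mean.
    n_eff = v1 ** 2 / v2
    se = sd / np.sqrt(n_eff)
    return mean, sd, n_eff, se

# Made-up illustration: 4 years, 3 series, one missing value.
example = np.array([[1.0, 10.0, 100.0],
                    [2.0, 30.0, np.nan],
                    [4.0, 20.0, 300.0],
                    [3.0, 40.0, 200.0]])
mean, sd, n_eff, se = weighted_row_summary(example, [1.0, 0.5, 0.75])
print(mean, sd, n_eff, se)
```

A rough confidence interval for each row is then mean +/- t * se, with the
effective n standing in for the column count. Rows with fewer than two
columns present have no usable variance and will come out as NaN or inf.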