Re: Weighted standard deviation



Bill H wrote:
G Robin Edwards wrote:
I am investigating quite a large (by my standards!) data matrix. Actual
dimensions are 113 columns and up to 593 rows. Many columns contain
much smaller numbers of rows, but usually have at least 150. The data
are actually time series, and column 1 is the time indicator.

The columns contain data that are all supposed to be measures of a
fundamental quantity, and have the property that they are consistent in
the way they map to the fundamental property. However, they vary
spectacularly in both location and variance, by orders of magnitude in
many cases.

I wish to generate a summary of the columns in the form of a mean, and
also variance (standard deviation rather, since it is in convenient
units).

Clearly, naive averaging across the columns is useless, so I have
attempted to homogenise the data by the simple and straightforward
method of standardising every column to mean zero, variance 1, before
averaging or other treatment.
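
As an illustration of the standardisation step described above, here is a minimal sketch in Python with NumPy. It assumes missing values are stored as NaN and that the sample standard deviation (divisor n - 1) is wanted; the function name and the toy matrix are hypothetical, not taken from the original data.

```python
import numpy as np

def standardise_columns(data):
    """Scale each column to mean 0 and variance 1, ignoring NaN entries."""
    data = np.asarray(data, dtype=float)
    col_mean = np.nanmean(data, axis=0)        # per-column mean over present values
    col_sd = np.nanstd(data, axis=0, ddof=1)   # per-column sample SD over present values
    return (data - col_mean) / col_sd          # NaN entries stay NaN

# Hypothetical toy matrix: 4 rows (years), 2 data columns on very different scales.
toy = np.array([[1.0, 1000.0],
                [2.0, np.nan],
                [3.0, 3000.0],
                [4.0, 5000.0]])
z = standardise_columns(toy)
```

Because each column is scaled using only its own non-missing rows, columns of different lengths can still be put on a common footing before they are averaged across a row.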

This technique appears to work very well in that I obtain what seem to
be meaningful summaries for each row (year), and I am often able to make
some intriguing inferences.

However, the original data columns also have weights attached to them -
though how these have been chosen is not clear. The weights are either
1, 0.75, 0.5 or 0.33. I can readily calculate the weighted averages of
the (standardised) values, but at the moment I haven't worked out how to
estimate standard deviations for each row, which must also depend to
some degree on the given weights. The number of columns having data
values varies from one row to another. If I ignore the weights I can
estimate a "confidence interval" for the mean and the standard
deviation, based simply on the number of columns containing actual data,
but what should I do to include the given weights in my estimate?
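
One common way to fold weights into a row-wise mean and standard deviation is sketched below. It assumes the weights are "reliability" weights (a relative measure of how much each column is trusted), which the post does not confirm; if they are sampling or frequency weights, different formulas apply. The function name, the NaN convention for missing values, and the toy data are all hypothetical. With V1 = sum(w) and V2 = sum(w^2) over the columns present in a row, the sketch uses the weighted mean sum(w*x)/V1, the variance sum(w*(x - mean)^2)/(V1 - V2/V1), and the effective number of columns V1^2/V2.

```python
import numpy as np

def weighted_row_summary(z, w):
    """Per-row weighted mean, weighted SD, effective n and approximate SE of the mean.

    z : 2-D array (rows x columns), NaN where a column has no value for that row.
    w : 1-D array with one weight per column (assumed to be reliability weights).
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    present = ~np.isnan(z)
    wm = np.where(present, w, 0.0)           # zero weight where data are missing
    zf = np.where(present, z, 0.0)
    v1 = wm.sum(axis=1)                      # sum of weights in each row
    v2 = (wm ** 2).sum(axis=1)               # sum of squared weights in each row
    mean = (wm * zf).sum(axis=1) / v1
    dev2 = np.where(present, (z - mean[:, None]) ** 2, 0.0)
    var = (wm * dev2).sum(axis=1) / (v1 - v2 / v1)
    sd = np.sqrt(var)
    n_eff = v1 ** 2 / v2                     # effective number of columns
    se = sd / np.sqrt(n_eff)                 # approximate SE of the weighted mean
    return mean, sd, n_eff, se

# Hypothetical standardised values: 2 rows, 4 columns, using the weights from the post.
z_toy = np.array([[1.0, 2.0, 3.0, np.nan],
                  [0.0, np.nan, 2.0, 4.0]])
w_toy = np.array([1.0, 0.75, 0.5, 0.33])
mean, sd, n_eff, se = weighted_row_summary(z_toy, w_toy)
```

When all weights are equal, these expressions reduce to the ordinary mean, the usual n - 1 standard deviation, and n_eff equal to the number of columns present, so the unweighted confidence interval is recovered as a special case. A row with only one non-missing column gives an undefined standard deviation (NaN) under this formula.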

Perhaps I'm being stupid and there's no problem. However, I'd like to
be reassured on that.

Robin

The weights might be sampling weights that reflect how the sample was
drawn. For example, a stratified random sample assigns a weight of N/n
to each observation, where N is the size of the stratum in the
population and n is the sample size for that stratum. Specialized
software such as SUDAAN, along with some routines in SAS and Stata,
takes sampling weights into account; I am not sure about SPSS.
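
To make the N/n rule concrete, here is a small Python sketch. The stratum labels and sizes are invented for illustration and have nothing to do with the data in the original post.

```python
def stratum_weights(pop_sizes, sample_sizes):
    """Return the sampling weight N/n for each stratum."""
    return {h: pop_sizes[h] / sample_sizes[h] for h in pop_sizes}

# Hypothetical design: two strata with different sampling fractions.
pop_sizes = {"A": 1000, "B": 200}     # N: stratum sizes in the population
sample_sizes = {"A": 50, "B": 40}     # n: number of units sampled per stratum
weights = stratum_weights(pop_sizes, sample_sizes)

# Each sampled unit "stands for" N/n population units, so the weights
# summed over the whole sample give back the population size.
total = sum(weights[h] * sample_sizes[h] for h in weights)
```

Here every sampled unit in stratum A carries weight 20 and every unit in stratum B carries weight 5, and the weights summed over all 90 sampled units equal the population size of 1200.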


As of version 13 (if not earlier), SPSS has "complex samples procedures". I'm not familiar with them, but would guess that they can deal with weights etc. Here's the help file blurb.


Introduction to SPSS Complex Samples Procedures

An inherent assumption of analytical procedures in traditional software packages is that the observations in a data file represent a simple random sample from the population of interest. This assumption is untenable for an increasing number of companies and researchers who find it both cost-effective and convenient to obtain samples in a more structured way.

The SPSS Complex Samples option allows you to select a sample according to a complex design and incorporate the design specifications into the data analysis, thus ensuring that your results are valid.


--
Bruce Weaver
bweaver@xxxxxxxxxxxx
www.angelfire.com/wv/bwhomedir


