Re: Is this sample representative of the population?



In article <439dacaf$0$13322$626a14ce@xxxxxxxxxxxx>,
"Z" <zingerNOSPAM@xxxxxxx> wrote:

> Hello,
>
> I've received the following problem (no, it's not homework), and I'm trying
> to figure out what would be the best way to answer it.
>
> A client (in marketing, you can probably guess) sent us this table,
> containing counts:
>
> Profession Population Sample
>
> Executive 10041 1164
> Commercial 1734 162
> Employee/labourer 3591 309
> Farmer 414 29
> CEO or upper executive 5330 797
> Self-employed 3621 410
> Retired 5699 927
> Student 5051 294
> blank 21 21
> Other 8536 403
> Total 44038 4495
>
> "Population" implies the true population from which the sample ("Sample")
> was taken.
>
> The question asked by the client is : "Is this sample representative of the
> population?"

I see two issues, one a meta-issue and one a statistical issue. Whether
the results of a survey are representative may depend not on the data but
on the sampling method. The classic example is the 1936 Literary Digest
poll, which predicted the wrong winner because its mailing list was drawn
largely from telephone directories and car registrations; there was an
economic barrier to telephone ownership, and more Republicans had phones.

I wonder if the client is sophisticated enough to be posing this as a
test. I would think analysts should know that the methodologic concerns
are paramount. How was the sampling done? Is it really true that about
35% of this undefined "population" are executives of one kind or another,
and that over 10% are "CEO or upper executives"? You could ask whether
there are any additional data regarding non-completion. You could also
check whether there are any identifying features that could be compared
with external data that characterize these populations, e.g., how many
student responders were male in the sample vs. national figures.
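
For what it's worth, the shares in question can be checked directly from
the quoted table. A minimal sketch in Python, using only the counts from
the post (the variable names are mine, not the client's):

```python
# Counts copied from the quoted table: profession -> (population, sample).
counts = {
    "Executive": (10041, 1164),
    "Commercial": (1734, 162),
    "Employee/labourer": (3591, 309),
    "Farmer": (414, 29),
    "CEO or upper executive": (5330, 797),
    "Self-employed": (3621, 410),
    "Retired": (5699, 927),
    "Student": (5051, 294),
    "blank": (21, 21),
    "Other": (8536, 403),
}

# Totals are recomputed from the rows rather than taken from the "Total" line.
pop_total = sum(pop for pop, _ in counts.values())
sample_total = sum(smp for _, smp in counts.values())

# Share of each profession in the population and in the sample, in percent.
shares = {
    name: (100 * pop / pop_total, 100 * smp / sample_total)
    for name, (pop, smp) in counts.items()
}

for name, (pop_pct, smp_pct) in shares.items():
    print(f"{name:<24} population {pop_pct:5.1f}%   sample {smp_pct:5.1f}%")
```

"Executive" comes to about 22.8% of the population and "CEO or upper
executive" to about 12.1%, roughly 35% together. Note also that the sample
column as posted sums to 4516, not the 4495 given in the Total row, which
is one more thing to ask the client about.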

The second concern is whether the 10 categories could have that degree
of variation in response rates (max around 16%, min under 5%, setting
aside the "blank" row, where all 21 were sampled) on the basis of chance
alone considering the large number of respondents. A general test of
association (chi-square w/ 9 degrees of freedom) should address that. My
guess is that it would indicate a strong departure from random, but that
is what analysis packages are for. The lowest response rates are in
"students" and "other", suggesting some difference in willingness to
respond or lack of availability.
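
A rough sketch of that test in plain Python, in case no package is at
hand. It assumes (my assumption, not stated by the client) that everyone
in the sample is also counted in the population column, so each profession
splits into "sampled" and "not sampled", giving a 2 x 10 table with 9
degrees of freedom. The critical value is the standard tabulated one for
df = 9 at the 0.001 level.

```python
# Counts copied from the quoted table: profession -> (population, sample).
counts = {
    "Executive": (10041, 1164),
    "Commercial": (1734, 162),
    "Employee/labourer": (3591, 309),
    "Farmer": (414, 29),
    "CEO or upper executive": (5330, 797),
    "Self-employed": (3621, 410),
    "Retired": (5699, 927),
    "Student": (5051, 294),
    "blank": (21, 21),
    "Other": (8536, 403),
}

# Assumption: the sample is a subset of the population, so each category
# splits into people who were sampled and people who were not.
sampled = [smp for _, smp in counts.values()]
not_sampled = [pop - smp for pop, smp in counts.values()]

n_sampled = sum(sampled)
n_not = sum(not_sampled)
n_all = n_sampled + n_not

# Pearson chi-square for the 2 x 10 table (sampled / not sampled by profession).
chi_square = 0.0
for s, u in zip(sampled, not_sampled):
    col_total = s + u
    for observed, row_total in ((s, n_sampled), (u, n_not)):
        expected = row_total * col_total / n_all
        chi_square += (observed - expected) ** 2 / expected

df = (2 - 1) * (len(counts) - 1)
CRITICAL_0_001 = 27.877  # tabulated chi-square critical value, df = 9, alpha = 0.001

# Per-category sampling ("response") rates, lowest first.
rates = {name: smp / pop for name, (pop, smp) in counts.items()}
for name in sorted(rates, key=rates.get):
    print(f"{name:<24} {100 * rates[name]:5.1f}%")

print(f"chi-square = {chi_square:.1f} on {df} df "
      f"(critical value at 0.001: {CRITICAL_0_001})")
```

One caveat: the "blank" row has an expected sampled count of only about 2,
below the usual rule of thumb of 5, so it may be better to drop or merge
that row before quoting a result. The "Other" row alone contributes far
more than the critical value, so the conclusion should not hinge on it.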

--
David Winsemius