The Practice of Social Research
Chapter Sixteen. Social Statistics
Descriptive Statistics
Data Reduction
Measures of Association
Regression Analysis
Descriptive statistics have the purpose of, well, describing a set of
data. For the most part, they represent an act of data reduction: summarizing
a lot of data in a single number or limited set of numbers. We've already
ventured into this terrain in Chapter 14, when we examined measures of central
tendency (averages). Starting with, say, all the ages of the 2,000 people
interviewed in a survey, we can represent those ages with their mean or median.
In the present section, we'll look at some ways of summarizing the strength
of relationships among variables: measures of association. In
this discussion, you'll see the importance of the Chapter 4 discussion of
levels of measurement. To summarize the relationship between two nominal
variables, for example, we could use lambda. You'll get
your first taste of the logic of proportionate reduction of error as
a way of determining the strength of associations. Here's an example
of what that means.
There is at least a stereotype that men like football more than women do.
Let's say we do a survey to find out. Assume that 50
percent of the sample say they like football. Finally, assume that we
want to see how well we can guess whether individuals in the sample like football.
If we always guess they do, we'll be wrong 50 percent of the time;
the same would be true if we always guessed they didn't like football. In
fact, if we flipped a coin each time to determine our guess, we'd be wrong
about half the time.
Suppose 70 percent of the men said they liked football, contrasted with
only 20 percent of the women. Knowing a person's sex would help us
predict whether they liked football. Our best strategy would be to
always guess that the men liked football and always guess that the women
didn't. Even though we'd still make mistakes, we'd make fewer of them than
we did when guessing without knowing sex, and the proportion by which the
number of mistakes is reduced offers a measure of how strongly sex and
liking football are related.
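The football example can be worked through numerically. The sketch below is only an illustration: the sample size of 100 is an assumption, and the split of 60 men and 40 women is not stated in the text but is the split implied by the percentages given (70 percent of men, 20 percent of women, 50 percent overall). The function name lambda_pre is made up for this example.

```python
from fractions import Fraction

# Hypothetical sample of 100 people (assumed size). The 60/40 split of men
# and women is what the percentages in the text imply:
# 70% of 60 men = 42 like football, 20% of 40 women = 8 like football.
table = {
    "men": {"likes": 42, "dislikes": 18},
    "women": {"likes": 8, "dislikes": 32},
}

def lambda_pre(table):
    """Proportionate reduction of error when guessing the column category."""
    # Overall totals for each answer, ignoring sex.
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    n = sum(totals.values())

    # Errors if we always guess the most common answer for everyone.
    errors_without = n - max(totals.values())

    # Errors if we guess the most common answer within each group.
    errors_with = sum(sum(row.values()) - max(row.values())
                      for row in table.values())

    return Fraction(errors_without - errors_with, errors_without)

football_lambda = lambda_pre(table)
print(float(football_lambda))  # 0.48
```

Guessing blind produces 50 errors; guessing by sex produces 18 + 8 = 26. The 24 errors avoided are 48 percent of the original 50, so lambda is .48 under these assumed counts.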
Gamma is a statistic based on the same logic; it is appropriate
for representing the relationship between ordinal variables.
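As a rough sketch of how gamma can be computed, the code below counts pairs of cases ranked in the same order on both variables and pairs ranked in opposite orders, then takes (same - opposite) / (same + opposite). The 3-by-3 table of counts is invented purely for illustration, and the function name is an assumption rather than anything from the text.

```python
from fractions import Fraction

def goodman_kruskal_gamma(table):
    """Gamma for a table whose rows and columns both run from low to high."""
    same = 0      # pairs ordered the same way on both variables
    opposite = 0  # pairs ordered in opposite ways
    n_rows = len(table)
    n_cols = len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare this cell with every cell in a lower (higher-ranked) row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        same += table[i][j] * table[k][m]
                    elif m < j:
                        opposite += table[i][j] * table[k][m]
                    # Cells in the same column are ties and are ignored.
    return Fraction(same - opposite, same + opposite)

# Hypothetical counts: rows and columns are low, medium, high.
ordinal_table = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]

gamma = goodman_kruskal_gamma(ordinal_table)
print(gamma)  # 63/113
```

With these made-up counts there are 2,200 same-order pairs and 625 opposite-order pairs, giving a gamma of about .56. Pairs tied on either variable do not enter the calculation.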
Correlation and regression are statistical techniques appropriate
to interval or ratio measures.
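For a sense of what these techniques produce, here is a minimal sketch that computes Pearson's r and a least-squares regression line by hand. The five pairs of values are invented for illustration only, and the variable names are assumptions.

```python
import math

# Hypothetical paired observations on two interval/ratio variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squared deviations and of cross-products of deviations.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson's r: strength and direction of the linear relationship.
r = s_xy / math.sqrt(s_xx * s_yy)

# Regression line y = intercept + slope * x, fitted by least squares.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(round(r, 3))          # 0.775
print(round(slope, 2))      # 0.6
print(round(intercept, 2))  # 2.2
```

The square of r (here about .60) can be read in the same spirit as lambda: it is the proportion by which errors in predicting y are reduced when we know x.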