@COL1:INTRODUCTION
THE DANGER OF SUCCESS IN MATH
DESCRIPTIVE STATISTICS
@COL2:Data Reduction
Measures of Association
Regression Analysis
@COL1:OTHER MULTIVARIATE TECHNIQUES
@COL2:Path Analysis
Time-Series Analysis
Factor Analysis
@COL1:INFERENTIAL STATISTICS
@COL2:Univariate Inferences
Tests of Statistical Significance
The Logic of Statistical Significance
Chi Square
@COL1:MAIN POINTS
KEY TERMS
REVIEW QUESTIONS AND EXERCISES
ADDITIONAL READINGS
@H1:INTRODUCTION
@GT-no indent:It has been my experience over
the years that many students are intimidated by statistics. Sometimes statistics
makes them feel they’re
@BL:o A few clowns short of a circus
o Dumber than a box of hair
o A few feathers short of a duck
o All foam, no beer
o Missing a few buttons on their remote control
o A few beans short of a burrito
o As screwed up as a football bat
o About as sharp as a bowling ball
o About four cents short of a nickel
o Not running on full thrusters*
@FN:*Thanks to the many contributors to humor
lists on the Internet.
@GT:Many people are intimidated by quantitative
research because they feel uncomfortable with mathematics and statistics.
And indeed, many research reports are filled with unspecified computations.
The role of statistics in social research is often important, but it is
equally important to see this role in its proper perspective.
Empirical research is first and foremost a logical
rather than a mathematical operation. Mathematics is merely a convenient
and efficient language for accomplishing the logical operations inherent
in quantitative data analysis. Statistics is the applied branch of mathematics
especially appropriate to a variety of research analyses.
This discussion begins with an informal look
at one of the concerns many people have when they approach statistics.
I hope this exercise will make it easier to understand and feel comfortable
with the relatively simple statistics introduced in the remainder of the
discussion. We’ll be looking at two types of statistics: descriptive and
inferential. Descriptive statistics is a medium for describing data in
manageable forms. Inferential statistics, on the other hand, assists researchers
in drawing conclusions from their observations; typically, this involves
drawing conclusions about a population from the study of a sample drawn
from it.
@H1:THE DANGER OF SUCCESS IN MATH
@GT-no indent:Since I began teaching research
methods that include at least a small amount of statistics, I’ve been struck
by the large number of students who report that they are "simply no good
at math." Just as some people are reported to be inherently tone-deaf and
others unable to learn foreign languages, about 90 percent of my college
students have seemed to suffer from congenital math deficiency syndrome
(CMDS). Its most common symptoms are frustration, boredom, and drowsiness.
I’m delighted to report that I have finally uncovered a major cause of
the disease and have brewed up a cure. In the event that you may be a sufferer,
I’d like to share it with you before we delve into the statistics of social
research.
@GT:You may be familiar with the story of Typhoid
Mary, whose real name was Mary Mallon. Mary was a typhoid carrier who died
in 1938 in New York. Before her death, she worked as a household cook,
moving from household to household and causing ten outbreaks of typhoid
fever. Over 50 people caught the disease from her, and 3 of them died.
The congenital math deficiency syndrome has a
similar cause. After an exhaustive search, I’ve discovered the culprit,
whom I’ll call Mathematical Marvin, though he has used countless aliases.
If you suffer from CMDS, I suspect you’ve met him. Take a minute to recall
your years in high school. Remember the person your teachers and your classmates
regarded as a "mathematical genius." Getting A’s in all the math classes
was only part of it; often the math genius seemed to know math better than
the teachers did.
Now that you have that math genius in mind, let
me ask you a few questions. First, what was the person’s gender? I’d guess
he was probably male. Most of the students I’ve asked in class report that.
But let’s consider some other characteristics:
@NL1:1. How athletic was he?
2. Did he wear glasses?
3. How many parties did he get invited to during
high school?
4. If he was invited to parties, did anyone ever
talk to him?
5. How often did you find yourself envying the
math genius, wishing you could trade places with him?
@GT:I’ve been asking students (including some
in adult classes) these questions for several years, and the answers I’ve
gotten are amazing. Marvin is usually unathletic, often either very skinny
or overweight. He usually wears glasses, and he seems otherwise rather
delicate. During his high school years, he was invited to an average (mean)
of 1.2 parties, and nobody talked to him. His complexion was terrible.
Almost nobody ever wanted to change places with him; he was a social misfit,
to be pitied rather than envied.
As I’ve discussed Marvin with my students, it
has become increasingly clear that most of them have formed a subconscious
association between mathematical proficiency and Marvin’s unenviable characteristics.
Most have concluded that doing well in math and statistics would turn them
into social misfits, which they regard as too high a price to pay.
Everything I’ve said about Mathematical Marvin
represents a powerful stereotype that many people still seem to share,
but the fact is that it’s only a social stereotype, not a matter of biology.
Women can excel in mathematics; attractive people can calculate as accurately
as unattractive ones. Yet the stereotype exercises a powerful influence
on our behavior, as evidenced in the tragic examples of young women
pretending mathematical impotence in the belief that they will
be seen as less attractive if they’re gifted in that realm.
So if you’re one of those people who’s "just
no good at math," it’s possible you carry around a hidden fear that your
face will break out in pimples if you do well in statistics in this course.
If so, you’re going to be reading the rest of this discussion in a terrible
state: wanting to understand it at least until the next exam and, at the
same time, worrying that you may understand it too well and lose all your
friends.
There is no cause for concern. The level of statistics
contained in the rest of this discussion has been proved safe for humans.
There has not been a single documented case of pimples connected to understanding
lambda, gamma, chi square, or any of the other statistics discussed in
the pages that follow. In fact, this level of exposure has been found to
be beneficial to young social researchers.
By the way, uncovering Marvin can clear up a
lot of mysteries. It did for me. (In my high school class, he didn’t wear
glasses, but he squinted a lot.) In the first research methods book I wrote,
I presented three statistical computations and got one of them wrong. In
the first edition of this book, I got a different one wrong. Most embarrassing
of all, however, the first printing of the earlier book had a unique feature.
I thought it would be fun to write a computer program to generate my own
table of random numbers rather than reprinting one that someone else had
created. In doing that, I had the dubious honor of publishing the world’s
first table of random numbers that didn’t have any nines! It was not until
I tracked Marvin down that I discovered the source of my problems, and
statistics has been much more fun (and trouble-free) ever since. So enjoy.
@H1:DESCRIPTIVE STATISTICS
@GT-no indent:As I’ve already suggested, descriptive
statistics present quantitative descriptions in a manageable form. Sometimes
we want to describe single variables, and sometimes we want to describe
the associations that connect one variable with another. Let’s look at
some of the ways to do these things.
@H2:Data Reduction
@GT-no indent:Scientific research often involves
collecting large masses of data. Suppose we surveyed 2,000 people, asking
each of them 100 questions—not an unusually large study. We would then
have a staggering 200,000 answers! No one could possibly read all those
answers and reach any meaningful conclusion about them. Thus, much scientific
analysis involves the reduction of data from unmanageable details to manageable
summaries.
@GT:To begin our discussion, let’s look briefly
at the raw data matrix created by a quantitative research project. Table
17-1 presents a partial data matrix. Notice that each row in the matrix
represents a person (or other unit of analysis), each column represents
a variable, and each cell represents the coded attribute or value a given
person has on a given variable. The first column in Table 17-1 represents
a person’s gender. Let’s say a "1" represents male and a "2" represents
female. This means that persons 1 and 2 are male, person 3 is female, and
so forth.
**[Table 17-1 about here; pickup from 8e p. 407]**
In the case of age, person 1’s "3" might mean
30-39 years old, person 2’s "4" might mean 40-49. However age has been
coded, the code numbers shown in Table 17-1 describe each of the people
represented there.
Notice that the data have already been reduced
somewhat by the time a data matrix like this one has been created. If age
has been coded as suggested previously, the specific answer "33 years old"
has already been assigned to the category "30-39." The people responding
to our survey may have given us 60 or 70 different ages, but we have now
reduced them to 6 or 7 categories.
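@GT:To make this concrete, here is a minimal Python sketch of the recoding just described. The ages and the coding scheme are hypothetical; the point is simply that many distinct answers are collapsed into a handful of category codes.

# Data reduction sketch: collapsing exact ages into the kind of coded categories
# suggested above (3 = 30-39, 4 = 40-49, and so on). The coding scheme is
# hypothetical; a real codebook would define its own category boundaries.

def code_age(age):
    """Return a one-digit code for a ten-year age category (7 = 70 or older)."""
    return min(age // 10, 7)

raw_ages = [33, 41, 28, 67, 45, 19, 52]      # hypothetical survey responses
coded_ages = [code_age(age) for age in raw_ages]
print(coded_ages)                            # [3, 4, 2, 6, 4, 1, 5]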
@H2:Measures of Association
@GT-no indent:The association between any two
variables can also be represented by a data matrix, this time produced
by the joint frequency distributions of the two variables. Table 17-2 presents
such a matrix. It provides all the information needed to determine the
nature and extent of the relationship between education and prejudice.
**[Table 17-2 about here; pickup from 8e p. 407]**
@GT:Notice, for example, that 23 people (1) had
no education and (2) scored high on prejudice; 77 people (1) had graduate
degrees and (2) scored low on prejudice.
Like the raw-data matrix in Table 17-1, this
matrix provides more information than can easily be comprehended. A careful
study of the table shows that as education increases from "None" to "Graduate
Degree," there is a general tendency for prejudice to decrease, but no
more than a general impression is possible. For a more precise summary
of the data matrix, we need one of several types of descriptive statistics.
Selecting the appropriate measure depends initially on the nature of the
two variables.
We’ll turn now to some of the options available
for summarizing the association between two variables. Each of these measures
of association is based on the same model—proportionate reduction of error
(PRE).
To see how this model works, let’s assume that
I asked you to guess respondents’ attributes on a given variable: for example,
whether they answered yes or no to a given questionnaire item. To assist
you, let’s first assume you know the overall distribution of responses
in the total sample—say, 60 percent said yes and 40 percent said no. You
would make the fewest errors in this process if you always guessed the
modal (most frequent) response: yes.
Second, let’s assume you also know the empirical
relationship between the first variable and some other variable: say, gender.
Now, each time I ask you to guess whether a respondent said yes or no,
I’ll tell you whether the respondent is a man or a woman. If the two variables
are related, you should make fewer errors the second time. It’s possible,
therefore, to compute the PRE by knowing the relationship between the two
variables: the greater the relationship, the greater the reduction of error.
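@GT:The PRE model reduces to a single ratio: the errors avoided, divided by the errors that would have been made knowing only the overall distribution. The short Python sketch below illustrates that arithmetic with made-up counts for the yes/no item just described.

# Proportionate reduction of error (PRE): errors avoided, as a share of the
# errors made knowing only the overall distribution. Counts are hypothetical,
# echoing the yes/no example above.

def pre(errors_without, errors_with):
    """PRE = (E1 - E2) / E1, where E1 = errors guessing from the overall
    distribution alone and E2 = errors guessing with the second variable known."""
    return (errors_without - errors_with) / errors_without

# Say 1,000 respondents: 600 yes, 400 no. Always guessing the mode ("yes")
# yields 400 errors; suppose knowing gender cuts the errors to 300.
print(pre(400, 300))                         # 0.25, a 25 percent reduction of error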
This basic PRE model is modified slightly to
take account of different levels of measurement—nominal, ordinal, or interval.
The following sections will consider each level of measurement and present
one measure of association appropriate to each. Bear in mind, though, that
the three measures discussed are only an arbitrary selection from among
many appropriate measures.
@H3:Nominal Variables @GT:If the two variables
consist of nominal data (for example, gender, religious affiliation, race),
lambda (λ) would be one appropriate measure. (Lambda is a letter in the
Greek alphabet corresponding to l in our alphabet. Greek letters are used
for many concepts in statistics, which perhaps helps to account for the
number of people who say of statistics, "It’s all Greek to me.") Lambda
is based on your ability to guess values on one of the variables: the PRE
achieved through knowledge of values on the other variable.
Imagine this situation. I tell you that a room
contains 100 people and I would like you to guess the gender of each person,
one at a time. If half are men and half women, you will probably be right
half the time and wrong half the time. But suppose I tell you each person’s
occupation before you guess that person’s gender.
What gender would you guess if I said the person
was a truck driver? Probably you would be wise to guess "male"; although
there are now plenty of women truck drivers, most are still men. If I said
the next person was a nurse, you’d probably be wisest to guess "female,"
following the same logic. While you would still make errors in guessing
genders, you would clearly do better than you would if you didn’t know
their occupations. The extent to which you did better (the proportionate
reduction of error) would be an indicator of the association that exists
between gender and occupation.
Here’s another simple hypothetical example that
illustrates the logic and method of lambda. Table 17-3 presents hypothetical
data relating gender to employment status. Overall, we note that 1,100
people are employed, and 900 are not employed. If you were to predict whether
people were employed, knowing only the overall distribution on that variable,
you would always predict "employed," since that would result in fewer errors
than always predicting "not employed." Nevertheless, this strategy would
result in 900 errors out of 2,000 predictions.
**[Table 17-3 about here; pickup from 4e p. 408]**
Let’s suppose that you had access to the data
in Table 17-3 and that you were told each person’s gender before making
your prediction of employment status. Your strategy would change in that
case. For every man, you would predict "employed," and for every woman,
you would predict "not employed." In this instance, you would make 300
errors—the 100 men who were not employed and the 200 employed women—or
600 fewer errors than you would make without knowing the person’s gender.
Lambda, then, represents the reduction in errors
as a proportion of the errors that would have been made on the basis of
the overall distribution. In this hypothetical example, lambda would equal
.67; that is, 600 fewer errors divided by the 900 total errors based on
employment status alone. In this fashion, lambda measures the statistical
association between gender and employment status.
If gender and employment status were statistically
independent, we would find the same distribution of employment status for
men and women. In this case, knowing each person’s gender would not affect
the number of errors made in predicting employment status, and the resulting
lambda would be zero. If, on the other hand, all men were employed and
none of the women were employed, by knowing gender you would avoid errors
in predicting employment status. You would make 900 fewer errors (out of
900), so lambda would be 1.0—representing a perfect statistical association.
Lambda is only one of several measures of association
appropriate to the analysis of two nominal variables. You could look at
any statistics textbook for a discussion of other appropriate measures.
@H3:Ordinal Variables @GT:If the variables being
related are ordinal (for example, social class, religiosity, alienation),
gamma (γ) is one appropriate measure of association. Like lambda, gamma
is based on our ability to guess values on one variable by knowing values
on another. However, whereas lambda is based on guessing exact values,
gamma is based on guessing the ordinal arrangement of values. For any given
pair of cases, we guess that their ordinal ranking on one variable will
correspond (positively or negatively) to their ordinal ranking on the other.
Let’s say we have a group of elementary students.
It’s reasonable to assume that there is a relationship between their ages
and their heights. We can test this by comparing every pair of students:
Sam and Mary, Sam and Fred, Mary and Fred, and so forth. Then we ignore
all the pairs in which the students are the same age and/or the same height.
We then classify each of the remaining pairs (those who differ in both
age and height) into one of two categories: those in which the older child
is also the taller ("same" pairs) and those in which the older child is
the shorter ("opposite" pairs). So, if Sam is older and taller than Mary,
the Sam-Mary pair is counted as a "same." If Sam is older but shorter than
Mary, then that pair is an "opposite." (If they’re the same age and/or
same height, we ignore them.)
To determine whether age and height are related
to one another, we compare the number of same and opposite pairs. If the
same pairs outnumber the opposite pairs, we can conclude that there is
a positive association between the two variables—as one increases, the
other increases. If there are more opposites than sames, we can conclude
that the relationship is negative. If there are about as many sames as
opposites, we can conclude that age and height are not related to one
another, that they’re independent of each other.
Here’s a social science example to illustrate
the simple calculations involved in gamma. Let’s say you suspect that religiosity
is positively related to political conservatism, and if Person A is more
religious than Person B, you guess that A is also more conservative than
B. Gamma is the proportion of paired comparisons that fits this pattern.
Table 17-4 presents hypothetical data relating
social class to prejudice. The general nature of the relationship between
these two variables is that as social class increases, prejudice decreases.
There is a negative association between social class and prejudice.
**[Table 17-4 about here; pickup from 8e p. 409]**
Gamma is computed from two quantities: (1) the
number of pairs having the same ranking on the two variables and (2) the
number of pairs having the opposite ranking on the two variables. The pairs
having the same ranking are computed as follows. The frequency of each
cell in the table is multiplied by the sum of all cells appearing below
and to the right of it—with all these products being summed. In Table 17-4,
the number of pairs with the same ranking would be 200(900 + 300 + 400
+ 100) + 500(300 + 100) + 400(400 + 100) + 900(100), or 340,000 + 200,000
+ 200,000 + 90,000 = 830,000.
The pairs having the opposite ranking on the
two variables are computed as follows: The frequency of each cell in the
table is multiplied by the sum of all cells appearing below and to the
left of it—with all these products being summed. In Table 17-4, the numbers
of pairs with opposite rankings would be 700(500 + 800 + 900 + 300) + 400(800
+ 300) + 400(500 + 800) + 900(800), or 1,750,000 + 440,000 + 520,000 +
720,000 = 3,430,000. Gamma is computed from the numbers of same-ranked
pairs and opposite-ranked pairs as follows: **[Set as in 8e p. 410]**
same − opposite
gamma = ---------------------
same + opposite
In our example, gamma equals (830,000 − 3,430,000)
divided by (830,000 + 3,430,000), or −.61. The negative sign in this answer
indicates the negative association suggested by the initial inspection
of the table. Social class and prejudice, in this hypothetical example,
are negatively associated with each other. The numerical figure for gamma
indicates that 61 percent more of the pairs examined had the opposite ranking
than the same ranking.
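@GT:Because the pair-counting rule is mechanical, it translates directly into a few lines of Python. The sketch below uses the cell frequencies implied by the arithmetic just shown for Table 17-4, with rows and columns each ordered from the lowest to the highest category, and it reproduces the 830,000 same-ranked pairs, the 3,430,000 opposite-ranked pairs, and a gamma of about −.61.

# Gamma for Table 17-4, using the cell frequencies implied by the arithmetic
# quoted above. Rows and columns are each ordered from the lowest to the
# highest category of its variable, so "same"-ranked pairs lie below and to
# the right of a cell, and "opposite"-ranked pairs lie below and to the left.
freq = [[200, 400, 700],
        [500, 900, 400],
        [800, 300, 100]]

same = opposite = 0
n_rows, n_cols = len(freq), len(freq[0])
for i in range(n_rows):
    for j in range(n_cols):
        for k in range(i + 1, n_rows):       # every row below row i
            for m in range(n_cols):
                if m > j:                    # below and to the right
                    same += freq[i][j] * freq[k][m]
                elif m < j:                  # below and to the left
                    opposite += freq[i][j] * freq[k][m]

gamma = (same - opposite) / (same + opposite)
print(same, opposite, round(gamma, 2))       # 830000 3430000 -0.61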
Note that whereas values of lambda vary from
0 to 1, values of gamma vary from −1 to +1, representing the direction
as well as the magnitude of the association. Because nominal variables
have no ordinal structure, it makes no sense to speak of the direction
of the relationship. (A negative lambda would indicate that you made more
errors in predicting values on one variable while knowing values on the
second than you made in ignorance of the second, and that’s not logically
possible.)
Table 17-5 is an example of the use of gamma
in social research. To study the extent to which widows sanctified their
deceased husbands, Helena Znaniecki Lopata (1981) administered a questionnaire
to a probability sample of 301 widows. In part, the questionnaire asked
the respondents to characterize their deceased husbands in terms of the
following semantic differential scale: **[set as in 8e p. 410]**
@T-1:Characteristic
@T-2:Positive Extreme Negative Extreme
@TB:Good 1 2 3 4 5 6 7 Bad
Useful 1 2 3 4 5 6 7 Useless
Honest 1 2 3 4 5 6 7 Dishonest
Superior 1 2 3 4 5 6 7 Inferior
Kind 1 2 3 4 5 6 7 Cruel
Friendly 1 2 3 4 5 6 7 Unfriendly
Warm 1 2 3 4 5 6 7 Cold
**[Table 17-5 about here; pickup from 8e p. 410]**
@GT:Respondents were asked to describe their
deceased spouses by circling a number for each pair of opposing characteristics.
Notice that the series of numbers connecting each pair of characteristics
is an ordinal measure.
Next, Lopata wanted to discover the extent to
which the several measures were related to each other. Appropriately, she
chose gamma as the measure of association. Table 17-5 shows how she presented
the results of her investigation.
The format presented in Table 17-5 is called
a correlation matrix. For each pair of measures, Lopata has calculated
the gamma. Good and Useful, for example, are related to each other by a
gamma equal to .79. The matrix is a convenient way of presenting the intercorrelations
among several variables, and you’ll find it frequently in the research
literature. In this case, we see that all the variables are quite strongly
related to each other, though some pairs are more strongly related than
others.
Gamma is only one of several measures of association
appropriate to ordinal variables. Again, any introductory statistics textbook
will give you a more comprehensive treatment of this subject.
@H3:Interval or Ratio Variables @GT:If interval
or ratio variables (for example, age, income, grade point average, and
so forth) are being associated, one appropriate measure of association
is Pearson’s product-moment correlation (r). The derivation and computation
of this measure of association are complex enough to lie outside the scope
of this book, so I’ll make only a few general comments here.
Like both gamma and lambda, r is based on guessing
the value of one variable by knowing the other. For continuous interval
or ratio variables, however, it is unlikely that you could predict the
precise value of the variable. But on the other hand, predicting only the
ordinal arrangement of values on the two variables would not take advantage
of the greater amount of information conveyed by an interval or ratio variable.
In a sense, r reflects how closely you can guess the value of one variable
through your knowledge of the value of the other.
To understand the logic of r, consider the way
you might hypothetically guess values that particular cases have on a given
variable. With nominal variables, we’ve seen that you might always guess
the modal value. But for interval or ratio data, you would minimize your
errors by always guessing the mean value of the variable. Although this
practice produces few if any perfect guesses, the extent of your errors
will be minimized. Imagine the task of guessing people’s incomes and how
much better you would do if you knew how many years of education they had
as well as the mean incomes for people with 0, 1, 2 (and so forth) years
of education.
In the computation of lambda, we noted the number
of errors produced by always guessing the modal value. In the case of r,
errors are measured in terms of the sum of the squared differences between
the actual value and the mean. This sum is called the total variation.
To understand this concept, we must expand the
scope of our examination. Let’s look at the logic of regression analysis
and discuss correlation within that context.
@H2:Regression Analysis
@GT-no indent:At several points in this text,
I have referred to the general formula for describing the association between
two variables: Y = f(X). This formula is read "Y is a function of X," meaning
that values of Y can be explained in terms of variations in the values
of X. Stated more strongly, we might say that X causes Y, so the value
of X determines the value of Y. Regression analysis is a method of determining
the specific function relating Y to X. There are several forms of regression
analysis, depending on the complexity of the relationships being studied.
Let’s begin with the simplest.
@H3:Linear Regression @GT:The regression model
can be seen most clearly in the case of a linear regression analysis, where
there is a perfect linear association between two variables. Figure 17-1
is a scattergram presenting in graphic form the values of X and Y as produced
by a hypothetical study. It shows that for the four cases in our study,
the values of X and Y are identical in each instance. The case with a value
of 1 on X also has a value of 1 on Y, and so forth. The relationship between
the two variables in this instance is described by the equation Y = X;
this is called the regression equation. Because all four points lie on
a straight line, we could superimpose that line over the points; this is
the regression line.
**[Figure 17-1 about here; pickup from 8e p.
411]**
@FN:Figure 17-1 @FT:Simple Scattergram of Values
of X and Y
@GT:The linear regression model has important
descriptive uses. The regression line offers a graphic picture of the association
between X and Y, and the regression equation is an efficient form for summarizing
that association. The regression model has inferential value as well. To
the extent that the regression equation correctly describes the general
association between the two variables, it may be used to predict other
sets of values. If, for example, we know that a new case has a value of
3.5 on X, we can predict the value of 3.5 on Y as well.
In practice, of course, studies are seldom limited
to four cases, and the associations between variables are seldom as clear
as the one presented in Figure 17-1.
A somewhat more realistic example is presented
in Figure 17-2, representing a hypothetical relationship between population
and crime rate in small- to medium-sized cities. Each dot in the scattergram
is a city, and its placement reflects that city’s population and its crime
rate. As was the case in our previous example, the values of Y (crime rates)
generally correspond to those of X (populations), and as values of X increase,
so do values of Y. However, the association is not nearly as clear as it
is in Figure 17-1.
**[Figure 17-2 about here; pickup from 8e p.
412]**
@FN:Figure 17-2 @FT:A Scattergram of the Values
of Two Variables with Regression Line Added (Hypothetical)
@GT:In Figure 17-2 we can’t superimpose a straight
line that will pass through all the points in the scattergram. But we can
draw an approximate line showing the best possible linear representation
of the several points. I’ve drawn that line on the graph.
You may (or may not) recall from algebra that
any straight line on a graph can be represented by an equation of the form
Y = a + bX, where X and Y are values of the two variables. In this equation,
a equals the value of Y when X is 0, and b represents the slope of the
line. If we know the values of a and b, we can calculate an estimate of
Y for every value of X.
We can now say more formally that regression
analysis is a technique for establishing the regression equation representing
the geometric line that comes closest to the distribution of points on
a graph. The regression equation provides a mathematical description of
the relationship between the variables, and it allows us to infer values
of Y when we have values of X. Recalling Figure 17-2, we could estimate
crime rates of cities if we knew their populations.
To improve your guessing, you construct a regression
line, stated in the form of a regression equation that permits the estimation
of values on one variable from values on the other. The general format
for this equation is Y ’ = a + b(X), where a and b are computed values,
X is a given value on one variable, and Y ’ is the estimated value on the
other. **[Y ’ is Y prime, as in 4e p. 413]** The values of a and b are computed
to minimize the differences between actual values of Y and the corresponding
estimates (Y ’) based on the known value of X. The sum of squared differences
between actual and estimated values of Y is called the unexplained variation
because it represents errors that still exist even when estimates are based
on known values of X.
The explained variation is the difference between
the total variation and the unexplained variation. Dividing the explained
variation by the total variation produces a measure of the proportionate
reduction of error corresponding to the similar quantity in the computation
of lambda. In the present case, this quantity is the correlation squared:
r². Thus, if r = .7, then r² = .49, meaning that about half the variation
has been explained. In practice, we compute r rather than r², because the
product-moment correlation can take either a positive or a negative sign,
depending on the direction of the relationship between the two variables.
(Computing r² and taking a square root would always produce a positive
quantity.) You can consult any standard statistics textbook for the method
of computing r, although I anticipate that most readers using this measure
will have access to computer programs designed for this function.
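@GT:If you would like to see these pieces fitted together, here is a small Python sketch with made-up data. It computes the least-squares a and b, the total and unexplained variation, and r² as the proportionate reduction of error, the same logic used for lambda.

# Simple linear regression and r-squared with made-up data: Y' = a + bX is the
# least-squares line, and r-squared is explained variation divided by total
# variation, the same PRE logic used for lambda.
xs = [1, 2, 3, 4, 5, 6]                      # hypothetical X values
ys = [2, 3, 5, 4, 6, 7]                      # hypothetical Y values
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

b_numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b_denominator = sum((x - mean_x) ** 2 for x in xs)
b = b_numerator / b_denominator              # slope
a = mean_y - b * mean_x                      # intercept: value of Y' when X = 0

predicted = [a + b * x for x in xs]          # Y' for each case
total_variation = sum((y - mean_y) ** 2 for y in ys)
unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
r_squared = (total_variation - unexplained) / total_variation

print(round(a, 2), round(b, 2), round(r_squared, 2))   # 1.2 0.94 0.89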
Unfortunately—or perhaps fortunately—social life
is so complex that the simple linear regression model often does not sufficiently
represent the state of affairs. It’s possible, using percentage tables,
to analyze more than two variables. As the number of variables increases,
such tables become increasingly complicated and hard to read. But the regression
model offers a useful alternative in such cases.
@H3:Multiple Regression @GT:Very often, social
researchers find that a given dependent variable is affected simultaneously
by several independent variables. Multiple regression analysis provides
a means of analyzing such situations. This was the case when Beverly Yerg
(1981) set about studying teacher effectiveness in physical education.
She stated her expectations in the form of a multiple regression equation:
**[Set as in 8e p. 413; see addition of comma
and "where" after first equation]**
@EX:F = b0 + b1I + b2X1 + b3X2 + b4X3 + b5X4
+ e, where
F = Final pupil-performance score
I = Initial pupil-performance score
X1 = Composite of guiding and supporting practice
X2 = Composite of teacher mastery of content
X3 = Composite of providing specific, task-related
feedback
X4 = Composite of clear, concise task presentation
b = Regression weight
e = Residual
@EXS:(ADAPTED FROM YERG 1981:42)
@GT:Notice that in place of the single X variable
in a linear regression, there are several X’s, and there are also several
b’s instead of just one. Also, Yerg has chosen to represent a as b0 in
this equation but with the same meaning as discussed previously. Finally,
the equation ends with a residual factor (e), which represents the variance
in Y that is not accounted for by the X variables analyzed.
Beginning with this equation, Yerg calculated
the values of the several b’s to show the relative contributions of the
several independent variables in determining final student-performance
scores. She also calculated the multiple-correlation coefficient as an
indicator of the extent to which all five independent variables predict the final scores.
This follows the same logic as the simple bivariate correlation discussed
earlier, and it is traditionally reported as a capital R. In this case,
R = .877, meaning that 77 percent of the variance (.877² = .77) in final
scores is explained by the five variables acting in concert.
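@GT:The sketch below does not use Yerg’s data; it uses entirely hypothetical scores and only three independent variables to show how the several b’s and the multiple correlation R can be obtained from a least-squares routine in Python.

import numpy as np

# A hypothetical multiple regression (not Yerg's data): F = b0 + b1*I + b2*X1 + b3*X2,
# estimated by ordinary least squares, followed by the multiple correlation R.
rng = np.random.default_rng(0)
n = 200
I = rng.normal(50, 10, n)                            # initial performance (hypothetical)
X1 = rng.normal(0, 1, n)                             # composite teaching measure (hypothetical)
X2 = rng.normal(0, 1, n)                             # another composite (hypothetical)
F = 10 + 0.8 * I + 3.0 * X1 + 1.5 * X2 + rng.normal(0, 5, n)   # final performance

design = np.column_stack([np.ones(n), I, X1, X2])    # leading column of 1s gives b0
b, *_ = np.linalg.lstsq(design, F, rcond=None)       # b0, b1, b2, b3

predicted = design @ b
explained = np.sum((predicted - F.mean()) ** 2)
total = np.sum((F - F.mean()) ** 2)
R = np.sqrt(explained / total)                       # multiple correlation coefficient
print(np.round(b, 2), round(R, 3))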
@H3:Partial Regression @GT:In exploring the elaboration
model in your textbook, we paid special attention to the relationship between
two variables when a third test variable was held constant. Thus, we might
examine the effect of education on prejudice with age held constant, testing
the independent effect of education. To do so, we would compute the tabular
relationship between education and prejudice separately for each age group.
Partial regression analysis is based on this
same logical model. The equation summarizing the relationship between variables
is computed on the basis of the test variables remaining constant. As in
the case of the elaboration model, the result may then be compared with
the uncontrolled relationship between the two variables to clarify further
the overall relationship.
@H3:Curvilinear Regression @GT:Up to now, we
have been discussing the association among variables as represented by
a straight line. The regression model is even more general than our discussion
thus far has implied.
You may already know that curvilinear functions,
as well as linear ones, can be represented by equations. For example, the
equation X² + Y² = 25 describes a circle with a radius of 5. Raising variables
to powers greater than 1 has the effect of producing curves rather than
straight lines. And in the real world there is no reason to assume that
the relationship among every set of variables will be linear. In some cases,
then, curvilinear regression analysis can provide a better understanding
of empirical relationships than can any linear model.
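@GT:Here is a brief Python sketch of the idea, again with artificial data. The same least-squares logic is applied, but the equation now includes a squared term, so the fitted line is a curve; the degree of the curve (2 in this case) is the researcher’s choice.

import numpy as np

# Curvilinear regression sketch with artificial data: compare a straight-line
# fit with a quadratic (degree-2) fit and see which leaves less unexplained variation.
x = np.linspace(0, 10, 50)
y = 2 + 1.5 * x - 0.12 * x ** 2 + np.random.default_rng(1).normal(0, 0.5, 50)

linear_fit = np.polyfit(x, y, deg=1)         # Y' = a + bX
quadratic_fit = np.polyfit(x, y, deg=2)      # Y' = a + bX + cX^2

for name, coeffs in [("linear", linear_fit), ("quadratic", quadratic_fit)]:
    predicted = np.polyval(coeffs, x)
    unexplained = np.sum((y - predicted) ** 2)
    print(name, round(unexplained, 2))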
Recall, however, that a regression line serves
two functions. It describes a set of empirical observations, and it provides
a general model for making inferences about the relationship between two
variables in the general population that the observations represent. A
very complex equation might produce an erratic line that would indeed pass
through every individual point. In this sense, it would perfectly describe
the empirical observations. There would be no guarantee, however, that
such a line could adequately predict new observations or that it in any
meaningful way represented the relationship between the two variables in
general. Thus, it would have little or no inferential value.
Earlier in this book, we discussed the need for
balancing detail and utility in data reduction. Ultimately, researchers
attempt to provide the most faithful, yet also the simplest, representation
of their data. This practice also applies to regression analysis. Data
should be presented in the simplest fashion (thus, linear regressions are
most frequently used) that best describes the actual data. Curvilinear
regression analysis gives the researcher a new option in this regard,
but it does not solve the problems altogether. Nothing does that.
@H3:Cautions in Regression Analysis @GT:The use
of regression analysis for statistical inferences is based on the same
assumptions made for correlational analysis: simple random sampling, the
absence of nonsampling errors, and continuous interval data. Because social
scientific research seldom completely satisfies these assumptions, you
should use caution in assessing the results in regression analyses.
Also, regression lines—linear or curvilinear—can
be useful for interpolation (estimating cases lying between those observed),
but they are less trustworthy when used for extrapolation (estimating cases
that lie beyond the range of observations). This limitation on extrapolations
is important in two ways. First, you are likely to come across regression
equations that seem to make illogical predictions. An equation linking
population and crimes, for example, might seem to suggest that small towns
with, say, a population of 1,000 should produce −123 crimes a year. This
failure in predictive ability does not disqualify the equation but dramatizes
that its applicability is limited to a particular range of population sizes.
Second, researchers sometimes overstep this limitation, drawing inferences
that lie outside their range of observation, and you’d be right in criticizing
them for that.
The preceding sections have introduced some of
the techniques for measuring associations among variables at different
levels of measurement. Matters become slightly more complex when the two
variables represent different levels of measurement. Though we aren’t going
to pursue this issue in this textbook, the box "Measures of Association
and Levels of Measurement," by Peter Nardi, may be a useful resource if
you ever have to address such situations.
**[Box "Measures of Association and Levels of
Measurement" about here; pickup from 8e p. 415]**
@H1:OTHER MULTIVARIATE TECHNIQUES
@GT-no indent:For the most part, this book has
focused on rather rudimentary forms of data manipulation, such as the use
of contingency tables and percentages. The elaboration model of analysis
was presented in this form, as were the statistical techniques presented
so far in this discussion.
@GT:This section of the discussion presents a
cook’s tour of three other multivariate techniques from the logical perspective
of the elaboration model. This discussion isn’t intended to teach you how
to use these techniques but rather to present sufficient information so
that you can understand them if you run across them in a research report.
The three methods of analysis that we’ll examine—path analysis, time-series
analysis, and factor analysis—are only a few of the many multivariate techniques
used by social scientists.
@H2:Path Analysis
@GT-no indent:Path analysis is a causal model
for understanding relationships between variables. Though based on regression
analysis, it can provide a more useful graphic picture of relationships
among several variables than other means can. Path analysis assumes that
the values of one variable are caused by the values of another, so it is
essential to distinguish independent and dependent variables. This requirement
is not unique to path analysis, of course, but path analysis provides a
unique way of displaying explanatory results for interpretation.
Recall for a moment one of the ways I represented
the elaboration model in your textbook. Here’s how we might diagram the
logic of interpretation: **[Set the following as in 8e p. 416; lc "v"]**
Independent variable → Intervening variable
→ Dependent variable
The logic of this presentation is that an independent
variable has an impact on an intervening variable, which, in turn, has
an impact on a dependent variable. The path analyst constructs similar
patterns of relationships among variables, but the typical path diagram
contains many more variables than shown in this diagram.
Besides diagramming a network of relationships
among variables, path analysis also shows the strengths of those several
relationships. The strengths of relationships are calculated from a regression
analysis that produces numbers analogous to the partial relationships in
the elaboration model. These path coefficients, as they are called, represent
the strengths of the relationships between pairs of variables, with the
effects of all other variables in the model held constant.
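@GT:Although the path coefficients in published studies come from specialized programs, the underlying computation is ordinary regression on standardized scores. The Python sketch below, with hypothetical data and a causal ordering supplied in advance (as it must be), regresses each downstream variable on its assumed causes.

import numpy as np

# Path-analysis sketch with hypothetical data. The researcher supplies the
# causal ordering; each downstream variable is then regressed on its assumed
# causes, using standardized scores, and the slopes are the path coefficients.
rng = np.random.default_rng(2)
n = 500
orthodoxy = rng.normal(size=n)
particularism = 0.6 * orthodoxy + rng.normal(size=n)
hostility = 0.4 * orthodoxy + 0.3 * particularism + rng.normal(size=n)

def standardize(values):
    return (values - values.mean()) / values.std()

def path_coefficients(outcome, causes):
    """Standardized regression slopes of one outcome on its assumed causes."""
    design = np.column_stack([np.ones(len(outcome))] +
                             [standardize(c) for c in causes])
    coeffs, *_ = np.linalg.lstsq(design, standardize(outcome), rcond=None)
    return coeffs[1:]                        # drop the intercept

print(path_coefficients(particularism, [orthodoxy]))
print(path_coefficients(hostility, [orthodoxy, particularism]))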
The analysis in Figure 17-3, for example, focuses
on the religious causes of anti-Semitism among Christian church members.
The variables in the diagram are, from left to right, (1) orthodoxy, or
the extent to which the subjects accept conventional beliefs about God,
Jesus, biblical miracles, and so forth; (2) particularism, the belief that
one’s religion is the "only true faith"; (3) acceptance of the view that
the Jews crucified Jesus; (4) religious hostility toward contemporary Jews,
such as believing that God is punishing them or that they will suffer damnation
unless they convert to Christianity; and (5) secular anti-Semitism, such
as believing that Jews cheat in business, are disloyal to their country,
and so forth.
**[Figure 17-3 about here; pickup from 8e p.
416]**
@FN:Figure 17-3 @FT:Diagramming the Religious
Sources of Anti-Semitism
@GT:To start with, the researchers who conducted
this analysis proposed that secular anti-Semitism was produced by moving
through the five variables: Orthodoxy caused particularism, which caused
the view of the historical Jews as crucifiers, which caused religious hostility
toward contemporary Jews, which resulted, finally, in secular anti-Semitism.
The path diagram tells a different story. The
researchers found, for example, that belief in the historical role of Jews
as the crucifiers of Jesus doesn’t seem to matter in the process that generates
anti-Semitism. And, although particularism is a part of one process resulting
in secular anti-Semitism, the diagram also shows that anti-Semitism is
created more directly by orthodoxy and religious hostility. Orthodoxy produces
religious hostility even without particularism, and religious hostility
generates secular hostility in any event.
One last comment on path analysis is in order.
Although it is an excellent way of handling complex causal chains and networks
of variables, path analysis itself does not tell the causal order of the
variables. Nor was the path diagram generated by computer. The researcher
decided the structure of relationships among the variables and used computer
analysis merely to calculate the path coefficients that apply to such a
structure.
@H2:Time-Series Analysis
@GT-no indent:The various forms of regression
analysis are often used to examine time-series data, representing changes
in one or more variables over time. As I’m sure you know, U.S. crime rates
have generally increased over the years. A time-series analysis of crime
rates could express the long-term trend in a regression format and provide
a way of testing explanations for the trend—such as population growth or
economic fluctuations—and could permit forecasting of future crime rates.
@GT:In a simple illustration, Figure 17-4 graphs
the larceny rates of a hypothetical city over time. Each dot on the graph
represents the number of larcenies reported to police during the year indicated.
**[Figure 17-4 about here; pickup from 8e p.
417]**
@FN:Figure 17-4 @FT:The Larceny Rates over Time
in a Hypothetical City
@GT:Suppose we feel that larceny is partly a
function of overpopulation. We might reason that crowding would lead to
psychological stress and frustration, resulting in increased crimes of
many sorts. Recalling the discussion of regression analysis, we could create
a regression equation representing the relationship between larceny and
population density—using the actual figures for each variable, with years
as the units of analysis. Having created the best-fitting regression equation,
we could then calculate a larceny rate for each year, based on that year’s
population density rate. For the sake of simplicity, let’s assume that
the city’s population size (and hence density) has been steadily increasing.
This would lead us to predict a steadily increasing larceny rate as well.
These regression estimates are represented by the dashed regression line
in Figure 17-4.
Time-series relationships are often more complex
than this simple illustration suggests. For one thing, there can be more
than one causal variable. For example, we might find that unemployment
rates also had a powerful impact on larceny. We might develop an equation
to predict larceny on the basis of both of these causal variables. As a
result, the predictions might not fall along a simple, straight line. Whereas
population density was increasing steadily in the first model, unemployment
rates rise and fall. As a consequence, our predictions of the larceny rate
would similarly go up and down.
Pursuing the relationship between larceny and
unemployment rates, we might reason that people do not begin stealing as
soon as they become unemployed. Typically, they might first exhaust their
savings, borrow from friends, and keep hoping for work. Larceny would be
a last resort.
Time-lagged regression analysis could be used
to address this more complex case. Thus, we might create a regression equation
that predicted a given year’s larceny rate based, in part, on the previous
year’s unemployment rate or perhaps on an average of the two years’ unemployment
rates. The possibilities are endless.
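@GT:Here is a small Python sketch of the kind of time-lagged regression just described, using invented yearly figures. The coefficients it produces mean nothing in themselves; the point is the shifting of one series relative to the other.

import numpy as np

# Time-lagged regression sketch with invented yearly figures: this year's
# larceny rate is predicted from LAST year's unemployment rate by shifting
# one series before estimating the equation.
unemployment = np.array([4.1, 4.8, 6.2, 7.0, 6.5, 5.9, 5.2, 6.8, 7.4, 6.1])
larceny = np.array([310, 325, 360, 420, 455, 440, 415, 400, 450, 480])

lagged_unemployment = unemployment[:-1]      # years 1 through 9 predict ...
current_larceny = larceny[1:]                # ... larceny in years 2 through 10

design = np.column_stack([np.ones(len(lagged_unemployment)), lagged_unemployment])
(a, b), *_ = np.linalg.lstsq(design, current_larceny, rcond=None)
print(round(a, 1), round(b, 1))              # larceny' = a + b * last year's unemployment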
If you think about it, a great many causal relationships
are likely to involve a time lag. Historically, many of the world’s poor
countries have maintained their populations by matching high death rates
with equally high birthrates. It has been observed repeatedly, moreover,
that when a society’s death rate is drastically reduced—through improved
medical care, public sanitation, and improved agriculture, for example—that
society’s birthrate drops sometime later on, but with an intervening period
of rapid population growth. Or, to take a very different example, a crackdown
on speeding on a state’s highways is likely to reduce the average speed
of cars. Again, however, the causal relationship would undoubtedly involve
a time lag—days, weeks, or months, perhaps—as motorists began to realize
the seriousness of the crackdown.
In all such cases, the regression equations generated
might take many forms. In any event, the criterion for judging success
or failure is the extent to which the researcher can account for the actual
values observed for the dependent variable.
@H2:Factor Analysis
@GT-no indent:Factor analysis is a unique approach
to multivariate analysis. Its statistical basis is complex enough and different
enough from the foregoing discussions to suggest a general discussion here.
@GT:Factor analysis is a complex algebraic method
used to discover patterns among the variations in values of several variables.
This is done essentially through the generation of artificial dimensions
(factors) that correlate highly with several of the real variables and
that are independent of one another. A computer must be used to perform
this complex operation.
Let’s suppose that a data file contains several
indicators of subjects’ prejudice. Each item should provide some indication
of prejudice, but none will give a perfect indication. All of these items,
moreover, should be highly intercorrelated empirically. In a factor analysis
of the data, the researcher would create an artificial dimension that would
be highly correlated with each of the items measuring prejudice. Each subject
would essentially receive a value on that artificial dimension, and the
value assigned would be a good predictor of the observed attributes on
each item.
Suppose now that the same study provided several
indicators of subjects’ mathematical ability. It’s likely that the factor
analysis would also generate an artificial dimension highly correlated
with each of those items.
The output of a factor analysis program consists
of columns representing the several factors (artificial dimensions) generated
from the observed relations among variables plus the correlations between
each variable and each factor—called the factor loadings.
In the preceding example, it’s likely that one
factor would more or less represent prejudice, and another would more or
less represent mathematical ability. Data items measuring prejudice would
have high loadings on (correlations with) the prejudice factor and low
loadings on the mathematical ability factor. Data items measuring mathematical
ability would have just the opposite pattern.
In practice, factor analysis does not proceed
in this fashion. Rather, the variables are input to the program, and the
program outputs a series of factors with appropriate factor loadings. The
analyst must then determine the meaning of a given factor on the basis
of those variables that load highly on it. The program’s generation of
factors, however, has no reference to the meaning of variables, only to
their empirical associations. Two criteria are taken into account: (1)
a factor must explain a relatively large portion of the variance found
in the study variables, and (2) every factor must be more or less independent
of every other factor.
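@GT:The following Python sketch gives a feel for factor loadings. Instead of a full factor-analysis routine, it uses a simpler principal-components approximation on hypothetical data containing three prejudice items and three mathematical-ability items; the loadings it prints show each item correlating highly with one artificial dimension and weakly with the other.

import numpy as np

# A rough feel for factor loadings, using a principal-components approximation
# (eigendecomposition of the correlation matrix) rather than a full
# factor-analysis routine. Data are hypothetical: three prejudice items and
# three mathematical-ability items.
rng = np.random.default_rng(3)
n = 400
prejudice = rng.normal(size=n)
math_ability = rng.normal(size=n)
items = np.column_stack([
    prejudice + rng.normal(scale=0.5, size=n),       # prejudice item 1
    prejudice + rng.normal(scale=0.5, size=n),       # prejudice item 2
    prejudice + rng.normal(scale=0.5, size=n),       # prejudice item 3
    math_ability + rng.normal(scale=0.5, size=n),    # math item 1
    math_ability + rng.normal(scale=0.5, size=n),    # math item 2
    math_ability + rng.normal(scale=0.5, size=n),    # math item 3
])

corr = np.corrcoef(items, rowvar=False)              # 6-by-6 correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(corr)     # eigenvalues in ascending order
top_two = np.argsort(eigenvalues)[::-1][:2]          # the two largest "factors"
loadings = eigenvectors[:, top_two] * np.sqrt(eigenvalues[top_two])

# Each row is an item, each column a factor; the prejudice items load highly
# on one column and weakly on the other, and the math items do the opposite.
print(np.round(loadings, 2))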
Here’s an example of the use of factor analysis.
Many social researchers have studied the problem of delinquency. If you
look deeply into the problem, however, you’ll discover that there are many
different types of delinquents. In a survey of high school students in
a small Wyoming town, Morris Forslund (1980) set out to create a typology
of delinquency. His questionnaire asked students to report whether they
had committed a variety of delinquent acts. He then submitted their responses
to factor analysis. The results are shown in Table 17-6.
**[Table 17-6 about here; pickup from 8e p. 419]**
As you can see in this table, the various delinquent
acts are listed on the left. The numbers shown in the body of the table
are the factor loadings on the four factors constructed in the analysis.
You’ll notice that after examining the dimensions, or factors, Forslund
labeled them. I’ve bracketed the items on each factor that led to his choice
of labels. Forslund summarized the results as follows:
@EX:For the total sample four fairly distinct
patterns of delinquent acts are apparent. In order of variance explained,
they have been labeled: 1) Property Offenses, including both vandalism
and theft; 2) Incorrigibility; 3) Drugs/Truancy; and 4) Fighting. It is
interesting, and perhaps surprising, to find both vandalism and theft appear
together in the same factor. It would seem that those high school students
who engage in property offenses tend to be involved in both vandalism and
theft. It is also interesting to note that drugs, alcohol and truancy fall
in the same factor.
@EXS:(1980:4)
@GT:Having determined this overall pattern, Forslund
reran the factor analysis separately for boys and for girls. Essentially
the same patterns emerged in both cases.
This example shows that factor analysis is an
efficient method of discovering predominant patterns among a large number
of variables. Instead of being forced to compare countless correlations—simple,
partial, and multiple—to discover those patterns, researchers can use factor
analysis for this task. Incidentally, this is a good example of a helpful
use of computers.
Factor analysis also presents data in a form
that can be interpreted by the reader or researcher. For a given factor,
the reader can easily discover the variables loading highly on it, thus
noting clusters of variables. Or, the reader can easily discover which
factors a given variable is or is not loaded highly on.
But factor analysis also has disadvantages. First,
as noted previously, factors are generated without any regard to substantive
meaning. Often researchers will find factors producing very high loadings
for a group of substantively disparate variables. They might find, for
example, that prejudice and religiosity have high positive loadings on
a given factor, with education having an equally high negative loading.
Surely the three variables are highly correlated, but what does the factor
represent in the real world? All too often, inexperienced researchers will
be led into naming such factors as "religio-prejudicial lack of education"
or something similarly nonsensical.
Second, factor analysis is often criticized on
basic philosophical grounds. Recall that to be useful, a hypothesis must
be disprovable. If the researcher cannot specify the conditions under which
the hypothesis would be disproved, the hypothesis is in reality either
a tautology or useless. In a sense, factor analysis suffers this defect.
No matter what data are input, factor analysis produces a solution in the
form of factors. Thus, if the researcher were asking, "Are there any patterns
among these variables?" the answer always would be yes. This fact must
also be taken into account in evaluating the results of factor analysis.
The generation of factors by no means ensures meaning.
My personal view of factor analysis is the same
as that for other complex modes of analysis. It can be an extremely useful
tool for the social science researcher. Its use should be encouraged whenever
such activity may assist researchers in understanding a body of data. As
in all cases, however, such tools are only tools and never magical solutions.
Let me reiterate that the analytical techniques
we’ve touched on are only a few of the many techniques commonly used by
social scientists. As you pursue your studies, you may well want to
explore this subject in more depth.
@H1:INFERENTIAL STATISTICS
@GT-no indent:Many, if not most, social scientific
research projects involve the examination of data collected from a sample
drawn from a larger population. A sample of people may be interviewed in
a survey; a sample of divorce records may be coded and analyzed; a sample
of newspapers may be examined through content analysis. Researchers seldom
if ever study samples just to describe the samples per se; in most instances,
their ultimate purpose is to make assertions about the larger population
from which the sample has been selected. Frequently, then, you’ll wish
to interpret your univariate and multivariate sample findings as the basis
for inferences about some population.
@GT:This section examines inferential statistics—the
statistical measures used for making inferences from findings based on
sample observations to a larger population. We’ll begin with univariate
data and move to multivariate.
@H2:Univariate Inferences
@GT-no indent:Your textbook dealt with methods
of presenting univariate data. Each summary measure was intended as a method
of describing the sample studied. Now we’ll use such measures to make broader
assertions about a population. This section addresses two univariate measures:
percentages and means.
@GT:If 50 percent of a sample of people say they
had colds during the past year, 50 percent is also our best estimate of
the proportion who had colds in the total population from which the sample was
drawn. (This estimate assumes a simple random sample, of course.) It’s
rather unlikely, however, that precisely 50 percent of the population had
colds during the year. If a rigorous sampling design for random selection
has been followed, however, we’ll be able to estimate the expected range
of error when the sample finding is applied to the population.
Your textbook's discussion of sampling theory
covered the procedures for making such estimates, so I’ll only review them
here. In the case of a percentage, the quantity
**[Set p times q, over n, all in a square root
radical, as in 8e p. 421]**
where p is a proportion, q equals (1 − p), and
n is the sample size, is called the standard error. As noted in your textbook,
this quantity is very important in the estimation of sampling error. We
may be 68 percent confident that the population figure falls within plus
or minus one standard error of the sample figure; we may be 95 percent
confident that it falls within plus or minus two standard errors; and we
may be 99.9 percent confident that it falls within plus or minus three
standard errors.
Any statement of sampling error, then, must contain
two essential components: the confidence level (for example, 95 percent)
and the confidence interval (for example, ±2.5 percent). If 50 percent
of a sample of 1,600 people say they had colds during the year, we might
say we’re 95 percent confident that the population figure is between 47.5
percent and 52.5 percent.
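@GT:The arithmetic behind that statement is short enough to show directly. This Python sketch computes the standard error for p = .5 and n = 1,600 and the interval lying two standard errors on either side of the sample figure.

import math

# Standard error and confidence interval for the example above: p = .5, n = 1,600.
p = 0.5
q = 1 - p
n = 1600

standard_error = math.sqrt(p * q / n)        # 0.0125, i.e., 1.25 percentage points
low = p - 2 * standard_error                 # two standard errors below: 0.475
high = p + 2 * standard_error                # two standard errors above: 0.525
print(standard_error, low, high)             # 0.0125 0.475 0.525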
In this example we’ve moved beyond simply describing
the sample into the realm of making estimates (inferences) about the larger
population. In doing so, we must take care in several ways.
First, the sample must be drawn from the population
about which inferences are being made. A sample taken from a telephone
directory cannot legitimately be the basis for statistical inferences about
the population of a city, but only about the population of telephone subscribers
with listed numbers.
Second, the inferential statistics assume several
things. To begin with, they assume simple random sampling, which is virtually
never the case in sample surveys. The statistics also assume sampling with
replacement, which is almost never done—but this is probably not a serious
problem. Although systematic sampling is used more frequently than random
sampling, it, too, probably presents no serious problem if done correctly.
Stratified sampling, because it improves representativeness, clearly presents
no problem. Cluster sampling does present a problem, however, because the
estimates of sampling error may be too small. Quite clearly, street-corner
sampling does not warrant the use of inferential statistics. Finally, the
computation of the standard error assumes a 100-percent completion rate—that
is, that everyone in the sample completed the survey. This problem increases
in seriousness as the completion rate decreases.
Third, inferential statistics are addressed to
sampling error only, not nonsampling error such as coding errors or misunderstandings
of questions by respondents. Thus, although we might state correctly that
between 47.5 and 52.5 percent of the population (95 percent confidence)
would report having colds during the previous year, we couldn’t so confidently
guess the percentage who had actually had them. Because nonsampling errors
are probably larger than sampling errors in a respectable sample design,
we need to be especially cautious in generalizing from our sample findings
to the population.
@H2:Tests of Statistical Significance
@GT-no indent:There is no scientific answer to
the question of whether a given association between two variables is significant,
strong, important, interesting, or worth reporting. Perhaps the ultimate
test of significance rests with your ability to persuade your audience
(present and future) of the association’s significance. At the same time,
there is a body of inferential statistics to assist you in this regard
called parametric tests of significance. As the name suggests, parametric
statistics are those that make certain assumptions about the parameters
describing the population from which the sample is selected. They allow
us to determine the statistical significance of associations. "Statistical
significance" does not imply "importance" or "significance" in any general
sense. It refers simply to the likelihood that relationships observed in
a sample could be attributed to sampling error alone.
@GT:Although tests of statistical significance
are widely reported in social scientific literature, the logic underlying
them is rather subtle and often misunderstood. Tests of significance are
based on the same sampling logic discussed elsewhere in this book. To understand
that logic, let’s return for a moment to the concept of sampling error
in regard to univariate data.
Recall that a sample statistic normally provides
the best single estimate of the corresponding population parameter, but
the statistic and the parameter seldom correspond precisely. Thus, we report
the probability that the parameter falls within a certain range (confidence
interval). The degree of uncertainty within that range is due to normal
sampling error. The corollary of such a statement is, of course, that it
is improbable that the parameter would fall outside the specified range
only as a result of sampling error. Thus, if we estimate that a parameter
(99.9 percent confidence) lies between 45 percent and 55 percent, we say
by implication that it is extremely improbable that the parameter is actually,
say, 90 percent if our only error of estimation is due to normal sampling.
This is the basic logic behind tests of statistical significance.
@H2:The Logic of Statistical Significance
@GT-no indent:I think I can illustrate the logic
of statistical significance best in a series of diagrams representing the
selection of samples from a population. Here are the elements in the logic:
@NL1:1. Assumptions regarding the independence
of two variables in the population under study
2. Assumptions regarding the representativeness
of samples selected through conventional probability sampling procedures
3. The observed joint distribution of sample
elements in terms of the two variables
@GT:Figure 17-5 represents a hypothetical population
of 256 people; half are women, half are men. The diagram also indicates
how each person feels about women having equality with men. In the diagram,
those favoring equality have open circles; those opposing it have their
circles filled in.
**[Figure 17-5 about here; pickup from 8e p.
423]**
@FN:Figure 17-5 @FT:A Hypothetical Population
of Men and Women Who Either Favor or Oppose Sexual Equality
@GT:The question we’ll be investigating is whether
there is any relationship between gender and feelings about equality for
men and women. More specifically, we’ll see if women are more likely to
favor equality than are men, since women would presumably benefit more
from it. Take a moment to look at Figure 17-5 and see what the answer to
this question is.
The illustration in the figure indicates no relationship
between gender and attitudes about equality. Exactly half of each group
favors equality and half opposes it. Recall the earlier discussion of proportionate
reduction of error. In this instance, knowing a person’s gender would not
reduce the "errors" we’d make in guessing his or her attitude toward equality.
The table at the bottom of Figure 17-5 provides a tabular view of what
you can observe in the graphic diagram.
Figure 17-6 represents the selection of a one-fourth
sample from the hypothetical population. In terms of the graphic illustration,
a "square" selection from the center of the population provides a representative
sample. Notice that our sample contains 16 of each type of person: Half
are men and half are women; half of each gender favors equality, and the
other half opposes it.
**[Figure 17-6 about here; pickup from 8e p.
424]**
@FN:Figure 17-6 @FT:A Representative Sample
@GT:The sample selected in Figure 17-6 would
allow us to draw accurate conclusions about the relationship between gender
and equality in the larger population. Following the sampling logic used
in the textbook, we’d note there was no relationship between gender and
equality in the sample; thus, we’d conclude there was similarly no relationship
in the larger population—since we’ve presumably selected a sample in accord
with the conventional rules of sampling.
Of course, real-life samples are seldom such
perfect reflections of the populations from which they are drawn. It would
not be unusual for us to have selected, say, one or two extra men who opposed
equality and a couple of extra women who favored it—even if there was no
relationship between the two variables in the population. Such minor variations
are part and parcel of probability sampling.
Figure 17-7, however, represents a sample that
falls far short of the mark in reflecting the larger population. Notice
it includes far too many supportive women and opposing men. As the table
shows, three-fourths of the women in the sample support equality, but only
one-fourth of the men do so. If we had selected this sample from a population
in which the two variables were unrelated to each other, we’d be sorely
misled by our sample.
**[Figure 17-7 about here; pickup from 8e p.
425]**
@FN:Figure 17-7 @FT:An Unrepresentative Sample
@GT:As you’ll recall, it’s unlikely that a properly
drawn probability sample would ever be as inaccurate as the one shown in
Figure 17-7. In fact, if we actually selected a sample that gave us the
results this one does, we’d look for a different explanation. Figure 17-8
illustrates the more likely situation.
**[Figure 17-8 about here; pickup from 8e p.
426]**
@FN:Figure 17-8 @FT:A Representative Sample from
a Population in Which the Variables Are Related
@GT:Notice that the sample selected in Figure
17-8 also shows a strong relationship between gender and equality. The
reason is quite different this time. We’ve selected a perfectly representative
sample, but we see that there is actually a strong relationship between
the two variables in the population at large. In this latest figure, women
are more likely to support equality than are men: That’s the case in the
population, and the sample reflects it.
In practice, of course, we never know what’s
so for the total population; that’s why we select samples. So if we selected
a sample and found the strong relationship presented in Figures 17-7 and
17-8, we’d need to decide whether that finding accurately reflected the
population or was simply a product of sampling error.
The fundamental logic of tests of statistical
significance, then, is this: Faced with any discrepancy between the assumed
independence of variables in a population and the observed distribution
of sample elements, we may explain that discrepancy in either of two ways:
(1) we may attribute it to an unrepresentative sample, or (2) we may reject
the assumption of independence. The logic and statistics associated with
probability sampling methods offer guidance about the varying probabilities
of varying degrees of unrepresentativeness (expressed as sampling error).
Most simply put, there is a high probability of a small degree of unrepresentativeness
and a low probability of a large degree of unrepresentativeness.
The statistical significance of a relationship
observed in a set of sample data, then, is always expressed in terms of
probabilities. "Significant at the .05 level (p ≤ .05)" simply means
that the probability that a relationship as strong as the observed one
can be attributed to sampling error alone is no more than 5 in 100. Put
somewhat differently, if two variables are independent of one another in
the population, and if 100 probability samples are selected from that population,
no more than 5 of those samples should provide a relationship as strong
as the one that has been observed.
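@GT:That logic can also be checked by brute force. The following Python sketch (my own illustration; the population and sample sizes echo Figures 17-5 through 17-7) repeatedly draws samples from a population in which gender and attitude are independent, and counts how often a sample shows a difference between women and men at least as large as the one in Figure 17-7. The share of such samples approximates the probability that so strong a relationship could arise from sampling error alone.
import random

# A hypothetical population like Figure 17-5: 256 people, gender and attitude independent.
population = ([("woman", True)] * 64 + [("woman", False)] * 64 +
              [("man", True)] * 64 + [("man", False)] * 64)

def sample_difference(pop, sample_size):
    # Difference between the proportions of sampled women and men who favor equality.
    sample = random.sample(pop, sample_size)
    women = [favors for gender, favors in sample if gender == "woman"]
    men = [favors for gender, favors in sample if gender == "man"]
    if not women or not men:
        return 0.0
    return sum(women) / len(women) - sum(men) / len(men)

observed_difference = 0.50   # the Figure 17-7 sample: 3/4 of women versus 1/4 of men
trials = 10_000
extreme = sum(abs(sample_difference(population, 64)) >= observed_difference
              for _ in range(trials))

# The proportion of samples at least this unrepresentative approximates the
# probability of getting such a result from sampling error alone.
print(f"{extreme} of {trials} samples ({extreme / trials:.4f})")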
There is, then, a corollary to confidence intervals
in tests of significance, which represents the probability of the measured
associations being due only to sampling error. This is called the level
of significance. Like confidence intervals, levels of significance are
derived from a logical model in which several samples are drawn from a
given population. In the present case, we assume that there is no association
between the variables in the population, and then we ask what proportion
of the samples drawn from that population would produce associations at
least as great as those measured in the empirical data. Three levels of
significance are frequently used in research reports: .05, .01, and .001.
These mean, respectively, that the chances of obtaining the measured association
as a result of sampling error are 5/100, 1/100, and 1/1,000.
Researchers who use tests of significance normally
follow one of two patterns. Some specify in advance the level of significance
they’ll regard as sufficient. If any measured association is statistically
significant at that level, they’ll regard it as representing a genuine
association between the two variables. In other words, they’re willing
to discount the possibility of its resulting from sampling error only.
Other researchers prefer to report the specific
level of significance for each association, disregarding the conventions
of .05, .01, and .001. Rather than reporting that a given association is
significant at the .05 level, they might report significance at the .023
level, indicating the chances of its having resulted from sampling error
as 23 out of 1,000.
@H2:Chi Square
@GT-no indent:Chi square (χ²) is a frequently
used test of significance in social science. It is based on the null hypothesis:
the assumption that there is no relationship between the two variables
in the total population. Given the observed distribution of values on the
two separate variables, we compute the conjoint distribution that would
be expected if there were no relationship between the two variables. The
result of this operation is a set of expected frequencies for all the cells
in the contingency table. We then compare this expected distribution with
the distribution of cases actually found in the sample data, and we determine
the probability that the discovered discrepancy could have resulted from
sampling error alone. An example will illustrate this procedure.
Let’s assume we’re interested in the possible
relationship between church attendance and gender for the members of a
particular church. To test this relationship, we select a sample of 100
church members at random. We find that our sample is made up of 40 men
and 60 women and that 70 percent of our sample say they attended church
during the preceding week, whereas the remaining 30 percent say they did
not.
If there is no relationship between gender and
church attendance, then 70 percent of the men in the sample should have
attended church during the preceding week, and 30 percent should have stayed
away. Moreover, women should have attended in the same proportion. Table
17-7 (part I) shows that, based on this model, 28 men and 42 women would
have attended church, with 12 men and 18 women not attending.
**[Table 17-7 about here; pickup from 8e p. 428]**
Part II of Table 17-7 presents the observed attendance
for the hypothetical sample of 100 church members. Note that 20 of the
men report having attended church during the preceding week, and the remaining
20 say they did not. Among the women in the sample, 50 attended church
and 10 did not. Comparing the expected and observed frequencies (parts
I and II), we note that somewhat fewer men attended church than expected,
whereas somewhat more women attended than expected.
Chi square is computed as follows. For each cell
in the tables, the researcher (1) subtracts the expected frequency for
that cell from the observed frequency, (2) squares this quantity, and (3)
divides the squared difference by the expected frequency. This procedure
is carried out for each cell in the tables, and the several results are
added together. (Part III of Table 17-7 presents the cell-by-cell computations.)
The final sum is the value of chi square: 12.70 in the example.
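@GT:Both steps, computing the expected frequencies and summing the cell-by-cell discrepancies, are easy to write out. This short Python sketch (my own illustration, not part of the text) reproduces the computations of Parts I and III of Table 17-7 from the observed frequencies in Part II.
# Observed frequencies for the church attendance example
# (rows: attended, did not attend; columns: men, women), as in Part II of Table 17-7.
observed = [[20, 50],
            [20, 10]]

row_totals = [sum(row) for row in observed]             # 70 attended, 30 did not
column_totals = [sum(col) for col in zip(*observed)]    # 40 men, 60 women
total = sum(row_totals)                                 # 100 church members

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected frequency if the variables were unrelated:
        # row total times column total, divided by the grand total.
        expected = row_totals[i] * column_totals[j] / total
        # Subtract expected from observed, square it, divide by expected, and sum.
        chi_square += (obs - expected) ** 2 / expected

print(f"chi square = {chi_square:.2f}")   # 12.70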
This value is the overall discrepancy between
the observed conjoint distribution in the sample and the distribution we
would expect if the two variables were unrelated to each other. Of course,
the mere discovery of a discrepancy does not prove that the two variables
are related, since normal sampling error might produce discrepancies even
when there is no relationship in the total population. The magnitude of
the value of chi square, however, permits us to estimate the probability
of that having happened.
@H3:Degrees of Freedom @GT:To determine the statistical
significance of the observed relationship, we must use a standard set of
chi square values. This will require the computation of the degrees of
freedom, which refers to the possibilities for variation within a statistical
model. Suppose I challenge you to find three numbers whose mean is 11.
There is an infinite number of solutions to this problem: (11, 11, 11),
(10, 11, 12), (−11, 11, 33), etc. Now, suppose I require that one of the
numbers be 7. There would still be an infinite number of possibilities
for the other two numbers.
If I told you one number had to be 7 and another
10, there would be only one possible value for the third. If the average
of three numbers is 11, their sum must be 33. If two of the numbers total
17, the third must be 16. In this situation, we say there are two degrees
of freedom. Two of the numbers could have any values we choose, but once
they are specified, the third number is determined.
More generally, whenever we are examining the
mean of N values, we can see that the degrees of freedom is N − 1. Thus
in the case of the mean of 23 values, we could make 22 of them anything
we liked, but the 23rd would then be determined.
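@GT:Stated as a tiny sketch (my own illustration): once the mean, and therefore the sum, of the three numbers is fixed, choosing any two of them determines the third.
# If the mean of three numbers must be 11, their sum must be 33.
required_sum = 3 * 11

first, second = 7, 10                     # two values chosen freely: two degrees of freedom
third = required_sum - (first + second)
print(third)                              # 16: the third value is determined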
A similar logic applies to bivariate tables,
such as those analyzed by chi square. Consider a table reporting the relationship
between two dichotomous variables: gender (men/women) and abortion attitude
(approve/disapprove). Notice that the table provides the marginal frequencies
of both variables.
@T-1:Abortion Attitude     Men     Women     Total
@TB:Approve                 ?        ?         500
Disapprove                  ?        ?         500
Total                      500      500      1,000
@GT:Despite the conveniently round numbers in
this hypothetical example, notice that there are numerous possibilities
for the cell frequencies. For example, it could be the case that all 500
men approve and all 500 women disapprove, or it could be just the reverse.
Or there could be 250 cases in each cell. Notice there are numerous other
possibilities.
Now the question is, How many cells could we
fill in pretty much as we choose before the remainder are determined by
the marginal frequencies? The answer is only one. If we know that 300 men
approved, for example, then 200 men would have had to disapprove, and the
distribution would need to be just the opposite for the women.
In this instance, then, we say the table has
one degree of freedom. Now, take a few minutes to construct a three-by-three
table. Assume you know the marginal frequencies for each variable, and
see if you can determine how many degrees of freedom it has.
For chi square, the degrees of freedom are computed
as follows: the number of rows in the table of observed frequencies, minus
1, is multiplied by the number of columns, minus 1. This may be written
as (r − 1)(c − 1). For a three-by-three table, then, there are four degrees
of freedom: (3 − 1)(3 − 1) = (2)(2) = 4.
In the example of gender and church attendance,
we have two rows and two columns (discounting the totals), so there is
one degree of freedom. Turning to a table of chi square values (see Appendix
F), we find that for one degree of freedom and random sampling from a population
in which there is no relationship between two variables, 10 percent of
the time we should expect a chi square of at least 2.7. Thus, if we selected
100 samples from such a population, we should expect about 10 of those
samples to produce chi squares equal to or greater than 2.7. Moreover,
we should expect chi square values of at least 6.6 in only 1 percent of
the samples and chi square values of 7.9 in only half a percent (.005)
of the samples. The higher the chi square value, the less probable it is
that the value could be attributed to sampling error alone.
In our example, the computed value of chi square
is 12.70. If there were no relationship between gender and church attendance
in the church member population and a large number of samples had been
selected and studied, then we would expect a chi square of this magnitude
in fewer than 1/10 of 1 percent (.001) of those samples. Thus, the probability
of obtaining a chi square of this magnitude is less than .001, if random
sampling has been used and there is no relationship in the population.
We report this finding by saying the relationship is statistically significant
at the .001 level. Because it is so improbable that the observed relationship
could have resulted from sampling error alone, we’re likely to reject the
null hypothesis and assume that there is a relationship between the two
variables in the population of church members.
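@GT:If you do not have a printed table of chi square values at hand, a statistical library will perform the lookup. The sketch below assumes the SciPy library is available (an assumption on my part, not a requirement of the text); it reports the probability of a chi square of 12.70 or larger with one degree of freedom when the null hypothesis is true.
from scipy.stats import chi2

chi_square = 12.70
rows, cols = 2, 2
df = (rows - 1) * (cols - 1)    # one degree of freedom for a two-by-two table

# Probability of a chi square at least this large if the variables are unrelated
p_value = chi2.sf(chi_square, df)
print(f"degrees of freedom = {df}, p = {p_value:.5f}")   # well below .001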
Most measures of association can be tested for
statistical significance in a similar manner. Standard tables of values
permit us to determine whether a given association is statistically significant
and at what level. Any standard statistics textbook provides instructions
on the use of such tables.
@H3:Some Words of Caution @GT:Tests of significance
provide an objective yardstick that we can use to estimate the statistical
significance of associations between variables. They help us rule out associations
that may not represent genuine relationships in the population under study.
However, the researcher who uses or reads reports of significance tests
should remain wary of several dangers in their interpretation.
First, we have been discussing tests of statistical
significance; there are no objective tests of substantive significance.
Thus, we may be legitimately convinced that a given association is not
due to sampling error, but we may be in the position of asserting without
fear of contradiction that two variables are only slightly related to each
other. Recall that sampling error is an inverse function of sample size—the
larger the sample, the smaller the expected error. Thus, a correlation
of, say, .1 might very well be significant (at a given level) if discovered
in a large sample, whereas the same correlation between the same two variables
would not be significant if found in a smaller sample. This makes perfectly
good sense given the basic logic of tests of significance: In the larger
sample, there is less chance that the correlation could be simply the product
of sampling error. In both samples, however, it might represent an essentially
zero correlation.
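@GT:You can verify the effect of sample size numerically. The following sketch (my own illustration, using the conventional t-test for a correlation coefficient and assuming SciPy is available) shows that a correlation of .1 is not statistically significant in a sample of 50 but is easily significant at the .05 level in a sample of 1,000.
from math import sqrt
from scipy.stats import t

def correlation_p_value(r, n):
    # Two-sided p-value for a Pearson correlation r observed in a sample of size n.
    t_statistic = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * t.sf(abs(t_statistic), df=n - 2)

for n in (50, 1000):
    p = correlation_p_value(0.1, n)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"r = .1, n = {n}: p = {p:.3f} ({verdict} at the .05 level)")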
The distinction between statistical and substantive
significance is perhaps best illustrated by those cases where there is
absolute certainty that observed differences cannot be a result of sampling
error. This would be the case when we observe an entire population. Suppose
we were able to learn the ages of every public official in the United States
and of every public official in Russia. For argument’s sake, let’s assume
further that the average age of U.S. officials was 45 years old compared
with, say, 46 for the Russian officials. Because we would have the ages
of all officials, there would be no question of sampling error. We would
know with certainty that the Russian officials were older than their U.S.
counterparts. At the same time, we would say that the difference was of
no substantive significance. We’d conclude, in fact, that they were essentially
the same age.
Second, lest you be misled by this hypothetical
example, realize that statistical significance should not be calculated
on relationships observed in data collected from whole populations. Remember,
tests of statistical significance measure the likelihood of relationships
between variables being only a product of sampling error; if there’s no
sampling, there’s no sampling error.
Third, tests of significance are based on the
same sampling assumptions we used in computing confidence intervals. To
the extent that these assumptions are not met by the actual sampling design,
the tests of significance are not strictly legitimate.
While we have examined statistical significance
here in the form of chi square, there are several other measures commonly
used by social scientists. Analysis of variance and t-tests are two examples
you may run across in your studies.
As is the case for most matters covered in this
book, I have a personal prejudice. In this instance, it is against tests
of significance. I don’t object to the statistical logic of those tests,
because the logic is sound. Rather, I’m concerned that such tests seem
to mislead more than they enlighten. My principal reservations are the
following:
@NL1:1. Tests of significance make sampling assumptions
that are virtually never satisfied by actual sampling designs.
2. They depend on the absence of nonsampling
errors, a questionable assumption in most actual empirical measurements.
3. In practice, they are too often applied to
measures of association that have been computed in violation of the assumptions
made by those measures (for example, product-moment correlations computed
from ordinal data).
4. Statistical significance is too easily misinterpreted
as "strength of association," or substantive significance.
@GT:These concerns are underscored by a recent
study (Sterling, Rosenbaum, and Weinkam 1995) examining the publication
policies of nine psychology and three medical journals. As the researchers
discovered, the journals were quite unlikely to publish articles that did
not report statistically significant correlations among variables. They
quote the following from a rejection letter:
@EX:Unfortunately, we are not able to publish
this manuscript. The manuscript is very well written and the study was
well documented. Unfortunately, the negative results translates into a
minimal contribution to the field. We encourage you to continue your work
in this area and we will be glad to consider additional manuscripts that
you may prepare in the future.
@EXS:(STERLING ET AL. 1995:109)
@GT:Let’s suppose a researcher conducts a scientifically
excellent study to determine whether X causes Y. The results indicate no
statistically significant correlation. That’s good to know. If we’re interested
in what causes cancer, war, or juvenile delinquency, it’s good to know
that a possible cause actually does not cause it. That knowledge would
free researchers to look elsewhere for causes.
As we’ve seen, however, such a study might very
well be rejected by journals. As such, other researchers would continue
testing whether X causes Y, not knowing that previous studies found no
causal relationship. This would produce many wasted studies, none of which
would see publication or bring the question of whether X causes Y to a close.
From what you’ve learned about probabilities,
however, you can understand that if enough studies are conducted, one will
eventually measure a statistically significant correlation between X and
Y. If there is absolutely no relationship between the two variables, we
would expect a correlation significant at the .05 level five times out
of a hundred, since that’s what the .05 level of significance means. If
a hundred studies were conducted, therefore, we could expect five to suggest
a causal relationship where there was actually none—and those five studies
would be published!
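@GT:A small simulation makes the same point. In this Python sketch (my own illustration, assuming the NumPy and SciPy libraries are available), one hundred "studies" each correlate two variables that are generated independently; on average, about five of them will turn up a correlation significant at the .05 level.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
significant = 0
for _ in range(100):              # one hundred independent "studies"
    x = rng.normal(size=200)      # X and Y are generated separately,
    y = rng.normal(size=200)      # so there is no real relationship between them
    _, p = pearsonr(x, y)
    if p <= 0.05:
        significant += 1

print(f"{significant} of 100 studies came out 'significant' at the .05 level")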
There are, then, serious problems inherent in
too much reliance on tests of statistical significance. At the same time
(perhaps paradoxically) I would suggest that tests of significance can
be a valuable asset to the researcher—useful tools for understanding data.
Although many of my comments suggest an extremely conservative approach
to tests of significance—that you should use them only when all assumptions
are met—my general perspective is just the reverse.
I encourage you to use any statistical technique—any
measure of association or test of significance—if it will help you understand
your data. If the computation of product-moment correlations among nominal
variables and the testing of statistical significance in the context of
uncontrolled sampling will meet this criterion, then I encourage such activities.
I say this in the spirit of what Hanan Selvin, another pioneer in developing
the elaboration model, referred to as "data-dredging techniques." Anything
goes, if it leads ultimately to the understanding of data and of the social
world under study.
The price of this radical freedom, however, is
the giving up of strict, statistical interpretations. You will not be able
to base the ultimate importance of your finding solely on a significant
correlation at the .05 level. Whatever the avenue of discovery, empirical
data must ultimately be presented in a legitimate manner, and their importance
must be argued logically.
@H1:MAIN POINTS
@BL:o Descriptive statistics are used to summarize
data under study. Some descriptive statistics summarize the distribution
of attributes on a single variable; others summarize the associations between
variables.
o Descriptive statistics summarizing the relationships
between variables are called measures of association.
o Many measures of association are based on a
proportionate reduction of error (PRE) model. This model is based on a
comparison of (1) the number of errors we would make in attempting to guess
the attributes of a given variable for each of the cases under study—if
we knew nothing but the distribution of attributes on that variable—and
(2) the number of errors we would make if we knew the joint distribution
overall and were told for each case the attribute of one variable each
time we were asked to guess the attribute of the other. These measures
include lambda (λ), which is appropriate for the analysis of two nominal
variables; gamma (γ), which is appropriate for the analysis of two ordinal
variables; and Pearson’s product-moment correlation (r), which is appropriate
for the analysis of two interval or ratio variables.
o Regression analysis represents the relationships
between variables in the form of equations, which can be used to predict
the values of a dependent variable on the basis of values of one or more
independent variables.
o Regression equations are computed on the basis
of a regression line: that geometric line representing, with the least
amount of discrepancy, the actual location of points in a scattergram.
o Types of regression analysis include linear
regression analysis, multiple regression analysis, partial regression analysis,
and curvilinear regression analysis.
o Other multivariate techniques include time-series
analysis, the study of processes occurring over time; path analysis, a
method of presenting graphically the networks of causal relationships among
several variables; and factor analysis, a method of discovering the general
dimensions represented by a collection of actual variables.
o Inferential statistics are used to estimate
the generalizability of findings arrived at through the analysis of a sample
to the larger population from which the sample has been selected. Some
inferential statistics estimate the single-variable characteristics of
the population; others—tests of statistical significance—estimate the relationships
between variables in the population.
o Inferences about some characteristic of a population
must indicate a confidence interval and a confidence level. Computations
of confidence levels and intervals are based on probability theory and
assume that conventional probability sampling techniques have been employed
in the study.
o Inferences about the generalizability to a
population of the associations discovered between variables in a sample
involve tests of statistical significance, which estimate the likelihood
that an association as large as the observed one could result from normal
sampling error if no such association exists between the variables in the
larger population. Tests of statistical significance are also based on
probability theory and assume that conventional probability sampling techniques
have been employed in the study.
o A frequently used test of statistical significance
in social science is chi square.
o The level of significance of an observed association
is reported in the form of the probability that the association could have
been produced merely by sampling error. To say that an association is significant
at the .05 level is to say that an association as large as the observed
one could not be expected to result from sampling error more than 5 times
out of 100.
o Social researchers tend to use a particular
set of levels of significance in connection with tests of statistical significance:
.05, .01, and .001. This is merely a convention, however.
o Statistical significance must not be confused
with substantive significance, the latter meaning that an observed association
is strong, important, meaningful, or worth writing home to your mother
about.
o Tests of statistical significance, strictly
speaking, make assumptions about data and methods that are almost never
satisfied completely by real social research. Despite this, the tests can
serve a useful function in the analysis and interpretation of data.
@H1:Key Terms
@UL:descriptive statistics
proportionate reduction of error (PRE)
regression analysis
linear regression analysis
multiple regression analysis
partial regression analysis
curvilinear regression analysis
path analysis
time-series analysis
factor analysis
inferential statistics
nonsampling error
statistical significance
tests of statistical significance
level of significance
null hypothesis
@H1:REVIEW QUESTIONS AND EXERCISES
@NL1:1. In your own words, explain the logic
of proportionate reduction of error (PRE) measures of associations.
2. In your own words, explain the purpose of
regression analyses.
3. In your own words, distinguish between measures
of association and tests of statistical significance.
4. Find a study that reports the statistical
significance of its findings and critique the clarity with which it is
reported.
5. Locate a study that uses factor analysis and
summarize the findings.
@H1:ADDITIONAL READINGS
@UL:Babbie, Earl, Fred Halley, and Jeanne Zaino.
2000. Adventures in Social Research. Newbury Park, CA: Pine Forge Press.
This book introduces the analysis of social research data through SPSS
for Windows. Several of the basic statistical techniques used by social
researchers are discussed and illustrated.
Blalock, Hubert M., Jr. 1979. Social Statistics.
New York: McGraw-Hill. Blalock’s textbook has been a standard for social
science students (and faculty) for decades. Tad Blalock’s death was a loss
to all social science.
Frankfort-Nachmias, Chava. 1997. Social Statistics
for a Diverse Society. Newbury Park, CA: Pine Forge Press. A comprehensive
textbook on social statistics that makes particularly good use of graphics
in presenting the logic of the many statistics commonly used by social
scientists.
Healey, Joseph F. 1999. Statistics: A Tool for
Social Research. Belmont, CA: Wadsworth. An effective introduction to social
statistics.
Mohr, Lawrence B. 1990. Understanding Significance
Testing. Newbury Park, CA: Sage. An excellent and comprehensive examination
of the topic: both the technical details of testing statistical significance
and the meaning of such tests.
@H1:Sociology Web Site
@GT-no indent:See the Wadsworth Sociology Resource
Center, Virtual Society, for additional links, Internet exercises by chapter,
quizzes by chapter, and Microcase-related materials:
http://www.sociology.wadsworth.com
@H1:InfoTrac College Edition
@H2:Search Word Summary
@GT-no indent:Go to the Wadsworth Sociology Resource
Center, Virtual Society, to find a list of search words for each chapter.
Using the search words, go to InfoTrac College Edition, an online library
of over 900 journals where you can do online research and find readings
related to your studies. To aid in your search and to gain useful tips,
see the Student Guide to InfoTrac College Edition on the Virtual Society
Web site:
http://www.sociology.wadsworth.com