THE DANGER OF SUCCESS IN MATH
Measures of Association
@COL1:OTHER MULTIVARIATE TECHNIQUES
Tests of Statistical Significance
The Logic of Statistical Significance
REVIEW QUESTIONS AND EXERCISES
@GT-no indent:It has been my experience over the years that many students are intimidated by statistics. Sometimes statistics makes them feel they’re
@BL:o A few clowns short of a circus
o Dumber than a box of hair
o A few feathers short of a duck
o All foam, no beer
o Missing a few buttons on their remote control
o A few beans short of a burrito
o As screwed up as a football bat
o About as sharp as a bowling ball
o About four cents short of a nickel
o Not running on full thrusters*
@FN:*Thanks to the many contributors to humor lists on the Internet.
@GT:Many people are intimidated by quantitative research because they feel uncomfortable with mathematics and statistics. And indeed, many research reports are filled with unspecified computations. The role of statistics in social research is often important, but it is equally important to see this role in its proper perspective.
Empirical research is first and foremost a logical rather than a mathematical operation. Mathematics is merely a convenient and efficient language for accomplishing the logical operations inherent in quantitative data analysis. Statistics is the applied branch of mathematics especially appropriate to a variety of research analyses.
This discussion begins with an informal look at one of the concerns many people have when they approach statistics. I hope this exercise will make it easier to understand and feel comfortable with the relatively simple statistics introduced in the remainder of the discussion. We’ll be looking at two types of statistics: descriptive and inferential. Descriptive statistics is a medium for describing data in manageable forms. Inferential statistics, on the other hand, assists researchers in drawing conclusions from their observations; typically, this involves drawing conclusions about a population from the study of a sample drawn from it.
@H1:THE DANGER OF SUCCESS IN MATH
@GT-no indent:Since I began teaching research methods that include at least a small amount of statistics, I’ve been struck by the large number of students who report that they are "simply no good at math." Just as some people are reported to be inherently tone-deaf and others unable to learn foreign languages, about 90 percent of my college students have seemed to suffer from congenital math deficiency syndrome (CMDS). Its most common symptoms are frustration, boredom, and drowsiness. I’m delighted to report that I have finally uncovered a major cause of the disease and have brewed up a cure. In the event that you may be a sufferer, I’d like to share it with you before we delve into the statistics of social research.
@GT:You may be familiar with the story of Typhoid Mary, whose real name was Mary Mallon. Mary was a typhoid carrier who died in 1938 in New York. Before her death, she worked as a household cook, moving from household to household and causing ten outbreaks of typhoid fever. Over 50 people caught the disease from her, and 3 of them died.
The congenital math deficiency syndrome has a similar cause. After an exhaustive search, I’ve discovered the culprit, whom I’ll call Mathematical Marvin, though he has used countless aliases. If you suffer from CMDS, I suspect you’ve met him. Take a minute to recall your years in high school. Remember the person your teachers and your classmates regarded as a "mathematical genius." Getting A’s in all the math classes was only part of it; often the math genius seemed to know math better than the teachers did.
Now that you have that math genius in mind, let me ask you a few questions. First, what was the person’s gender? I’d guess he was probably male. Most of the students I’ve asked in class report that. But let’s consider some other characteristics:
@NL1:1. How athletic was he?
2. Did he wear glasses?
3. How many parties did he get invited to during high school?
4. If he was invited to parties, did anyone ever talk to him?
5. How often did you find yourself envying the math genius, wishing you could trade places with him?
@GT:I’ve been asking students (including some in adult classes) these questions for several years, and the answers I’ve gotten are amazing. Marvin is usually unathletic, often either very skinny or overweight. He usually wears glasses, and he seems otherwise rather delicate. During his high school years, he was invited to an average (mean) of 1.2 parties, and nobody talked to him. His complexion was terrible. Almost nobody ever wanted to change places with him; he was a social misfit, to be pitied rather than envied.
As I’ve discussed Marvin with my students, it has become increasingly clear that most of them have formed a subconscious association between mathematical proficiency and Marvin’s unenviable characteristics. Most have concluded that doing well in math and statistics would turn them into social misfits, which they regard as too high a price to pay.
Everything I’ve said about Mathematical Marvin represents a powerful stereotype that many people still seem to share, but the fact is that it’s only a social stereotype, not a matter of biology. Women can excel in mathematics; attractive people can calculate as accurately as unattractive ones. Yet the stereotype exercises a powerful influence on our behavior, as evidenced in the tragic examples of young women, for example, pretending mathematical impotence in the belief that they will be seen as less attractive if they’re gifted in that realm.
So if you’re one of those people who’s "just no good at math," it’s possible you carry around a hidden fear that your face will break out in pimples if you do well in statistics in this course. If so, you’re going to be reading the rest of this discussion in a terrible state: wanting to understand it at least until the next exam and, at the same time, worrying that you may understand it too well and lose all your friends.
There is no cause for concern. The level of statistics contained in the rest of this discussion has been proved safe for humans. There has not been a single documented case of pimples connected to understanding lambda, gamma, chi square, or any of the other statistics discussed in the pages that follow. In fact, this level of exposure has been found to be beneficial to young social researchers.
By the way, uncovering Marvin can clear up a lot of mysteries. It did for me. (In my high school class, he didn’t wear glasses, but he squinted a lot.) In the first research methods book I wrote, I presented three statistical computations and got one of them wrong. In the first edition of this book, I got a different one wrong. Most embarrassing of all, however, the first printing of the earlier book had a unique feature. I thought it would be fun to write a computer program to generate my own table of random numbers rather than reprinting one that someone else had created. In doing that, I had the dubious honor of publishing the world’s first table of random numbers that didn’t have any nines! It was not until I tracked Marvin down that I discovered the source of my problems, and statistics has been much more fun (and trouble-free) ever since. So enjoy.
@H1:DESCRIPTIVE STATISTICS
@GT-no indent:As I’ve already suggested, descriptive statistics present quantitative descriptions in a manageable form. Sometimes we want to describe single variables, and sometimes we want to describe the associations that connect one variable with another. Let’s look at some of the ways to do these things.
@H2:Data Reduction
@GT-no indent:Scientific research often involves collecting large masses of data. Suppose we surveyed 2,000 people, asking each of them 100 questions—not an unusually large study. We would then have a staggering 200,000 answers! No one could possibly read all those answers and reach any meaningful conclusion about them. Thus, much scientific analysis involves the reduction of data from unmanageable details to manageable summaries.
@GT:To begin our discussion, let’s look briefly at the raw data matrix created by a quantitative research project. Table 17-1 presents a partial data matrix. Notice that each row in the matrix represents a person (or other unit of analysis), each column represents a variable, and each cell represents the coded attribute or value a given person has on a given variable. The first column in Table 17-1 represents a person’s gender. Let’s say a "1" represents male and a "2" represents female. This means that persons 1 and 2 are male, person 3 is female, and so forth.
**[Table 17-1 about here; pickup from 8e p. 407]**
In the case of age, person 1’s "3" might mean 30-39 years old, and person 2’s "4" might mean 40-49. However age has been coded, the code numbers shown in Table 17-1 describe each of the people represented there.
Notice that the data have already been reduced somewhat by the time a data matrix like this one has been created. If age has been coded as suggested previously, the specific answer "33 years old" has already been assigned to the category "30-39." The people responding to our survey may have given us 60 or 70 different ages, but we have now reduced them to 6 or 7 categories.
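The decade-style coding just described can be sketched in a few lines. Python is used only for illustration, and the cut points are the ones suggested above (code 3 = 30-39, code 4 = 40-49):

```python
def code_age(age):
    """Collapse an exact age into a decade category code.

    Assumed scheme from the text: "3" means 30-39 years old,
    "4" means 40-49, and so on.
    """
    return age // 10

# A respondent who reported being "33 years old" is reduced to category 3 (30-39).
print(code_age(33))  # 3
```

Sixty or seventy distinct reported ages collapse to a handful of codes this way, which is exactly the kind of data reduction described above.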
@H2:Measures of Association
@GT-no indent:The association between any two variables can also be represented by a data matrix, this time produced by the joint frequency distributions of the two variables. Table 17-2 presents such a matrix. It provides all the information needed to determine the nature and extent of the relationship between education and prejudice.
**[Table 17-2 about here; pickup from 8e p. 407]**
@GT:Notice, for example, that 23 people (1) have no education and (2) scored high on prejudice, while 77 people (1) have graduate degrees and (2) scored low on prejudice.
Like the raw-data matrix in Table 17-1, this matrix provides more information than can easily be comprehended. A careful study of the table shows that as education increases from "None" to "Graduate Degree," there is a general tendency for prejudice to decrease, but no more than a general impression is possible. For a more precise summary of the data matrix, we need one of several types of descriptive statistics. Selecting the appropriate measure depends initially on the nature of the two variables.
We’ll turn now to some of the options available for summarizing the association between two variables. Each of these measures of association is based on the same model—proportionate reduction of error (PRE).
To see how this model works, let’s assume that I asked you to guess respondents’ attributes on a given variable: for example, whether they answered yes or no to a given questionnaire item. To assist you, let’s first assume you know the overall distribution of responses in the total sample—say, 60 percent said yes and 40 percent said no. You would make the fewest errors in this process if you always guessed the modal (most frequent) response: yes.
Second, let’s assume you also know the empirical relationship between the first variable and some other variable: say, gender. Now, each time I ask you to guess whether a respondent said yes or no, I’ll tell you whether the respondent is a man or a woman. If the two variables are related, you should make fewer errors the second time. It’s possible, therefore, to compute the PRE by knowing the relationship between the two variables: the greater the relationship, the greater the reduction of error.
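The two guessing rules can be sketched with a few lines of code. The 60/40 split comes from the text; the gender breakdown is a hypothetical I’ve added to make the arithmetic concrete:

```python
# Overall distribution from the text: 60 said yes, 40 said no.
overall = {"yes": 60, "no": 40}

# Hypothetical breakdown by gender (invented for illustration).
by_gender = {
    "men":   {"yes": 45, "no": 5},
    "women": {"yes": 15, "no": 35},
}

# Rule 1: always guess the overall mode ("yes"); every "no" is an error.
errors_without = sum(overall.values()) - max(overall.values())  # 40 errors

# Rule 2: guess the modal answer within each gender.
errors_with = sum(sum(d.values()) - max(d.values()) for d in by_gender.values())  # 5 + 15 = 20

pre = (errors_without - errors_with) / errors_without
print(pre)  # 0.5: knowing gender cuts the errors in half
```

The stronger the relationship between the two variables, the larger this proportionate reduction of error.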
This basic PRE model is modified slightly to take account of different levels of measurement—nominal, ordinal, or interval. The following sections will consider each level of measurement and present one measure of association appropriate to each. Bear in mind, though, that the three measures discussed are only an arbitrary selection from among many appropriate measures.
@H3:Nominal Variables @GT:If the two variables consist of nominal data (for example, gender, religious affiliation, race), lambda (λ) would be one appropriate measure. (Lambda is a letter in the Greek alphabet corresponding to l in our alphabet. Greek letters are used for many concepts in statistics, which perhaps helps to account for the number of people who say of statistics, "It’s all Greek to me.") Lambda is based on your ability to guess values on one of the variables: the PRE achieved through knowledge of values on the other variable.
Imagine this situation. I tell you that a room contains 100 people and I would like you to guess the gender of each person, one at a time. If half are men and half women, you will probably be right half the time and wrong half the time. But suppose I tell you each person’s occupation before you guess that person’s gender.
What gender would you guess if I said the person was a truck driver? Probably you would be wise to guess "male"; although there are now plenty of women truck drivers, most are still men. If I said the next person was a nurse, you’d probably be wisest to guess "female," following the same logic. While you would still make errors in guessing genders, you would clearly do better than you would if you didn’t know their occupations. The extent to which you did better (the proportionate reduction of error) would be an indicator of the association that exists between gender and occupation.
Here’s another simple hypothetical example that illustrates the logic and method of lambda. Table 17-3 presents hypothetical data relating gender to employment status. Overall, we note that 1,100 people are employed, and 900 are not employed. If you were to predict whether people were employed, knowing only the overall distribution on that variable, you would always predict "employed," since that would result in fewer errors than always predicting "not employed." Nevertheless, this strategy would result in 900 errors out of 2,000 predictions.
**[Table 17-3 about here; pickup from 4e p. 408]**
Let’s suppose that you had access to the data in Table 17-3 and that you were told each person’s gender before making your prediction of employment status. Your strategy would change in that case. For every man, you would predict "employed," and for every woman, you would predict "not employed." In this instance, you would make 300 errors—the 100 men who were not employed and the 200 employed women—or 600 fewer errors than you would make without knowing the person’s gender.
Lambda, then, represents the reduction in errors as a proportion of the errors that would have been made on the basis of the overall distribution. In this hypothetical example, lambda would equal .67; that is, 600 fewer errors divided by the 900 total errors based on employment status alone. In this fashion, lambda measures the statistical association between gender and employment status.
If gender and employment status were statistically independent, we would find the same distribution of employment status for men and women. In this case, knowing each person’s gender would not affect the number of errors made in predicting employment status, and the resulting lambda would be zero. If, on the other hand, all men were employed and none of the women were employed, by knowing gender you would avoid errors in predicting employment status. You would make 900 fewer errors (out of 900), so lambda would be 1.0—representing a perfect statistical association.
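As a sketch, here is the lambda computation for Table 17-3. The cell frequencies are the ones implied by the error counts in the text (900 employed men, 100 unemployed men, 200 employed women, 800 unemployed women):

```python
# Joint frequencies implied by the text's description of Table 17-3.
table = {
    "men":   {"employed": 900, "not employed": 100},
    "women": {"employed": 200, "not employed": 800},
}

# Errors from always guessing the overall mode ("employed").
col_totals = {}
for row in table.values():
    for status, n in row.items():
        col_totals[status] = col_totals.get(status, 0) + n
baseline_errors = sum(col_totals.values()) - max(col_totals.values())  # 2000 - 1100 = 900

# Errors from guessing the modal status within each gender.
errors_knowing_gender = sum(
    sum(row.values()) - max(row.values()) for row in table.values()
)  # 100 men + 200 women = 300

lambda_ = (baseline_errors - errors_knowing_gender) / baseline_errors
print(round(lambda_, 2))  # 0.67
```

Note how the two extreme cases discussed above fall out of this formula: identical distributions for men and women yield a lambda of zero, and perfectly separated distributions yield 1.0.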
Lambda is only one of several measures of association appropriate to the analysis of two nominal variables. You could look at any statistics textbook for a discussion of other appropriate measures.
@H3:Ordinal Variables @GT:If the variables being related are ordinal (for example, social class, religiosity, alienation), gamma (γ) is one appropriate measure of association. Like lambda, gamma is based on our ability to guess values on one variable by knowing values on another. However, whereas lambda is based on guessing exact values, gamma is based on guessing the ordinal arrangement of values. For any given pair of cases, we guess that their ordinal ranking on one variable will correspond (positively or negatively) to their ordinal ranking on the other.
Let’s say we have a group of elementary students. It’s reasonable to assume that there is a relationship between their ages and their heights. We can test this by comparing every pair of students: Sam and Mary, Sam and Fred, Mary and Fred, and so forth. Then we ignore all the pairs in which the students are the same age and/or the same height. We then classify each of the remaining pairs (those who differ in both age and height) into one of two categories: those in which the older child is also the taller ("same" pairs) and those in which the older child is the shorter ("opposite" pairs). So, if Sam is older and taller than Mary, the Sam-Mary pair is counted as a "same." If Sam is older but shorter than Mary, then that pair is an "opposite." (If they’re the same age and/or same height, we ignore them.)
To determine whether age and height are related to one another, we compare the number of same and opposite pairs. If the same pairs outnumber the opposite pairs, we can conclude that there is a positive association between the two variables—as one increases, the other increases. If there are more opposites than sames, we can conclude that the relationship is negative. If there are about as many sames as opposites, we can conclude that age and height are not related, that they’re independent of each other.
Here’s a social science example to illustrate the simple calculations involved in gamma. Let’s say you suspect that religiosity is positively related to political conservatism, and if Person A is more religious than Person B, you guess that A is also more conservative than B. Gamma is the proportion of paired comparisons that fits this pattern.
Table 17-4 presents hypothetical data relating social class to prejudice. The general nature of the relationship between these two variables is that as social class increases, prejudice decreases. There is a negative association between social class and prejudice.
**[Table 17-4 about here; pickup from 8e p. 409]**
Gamma is computed from two quantities: (1) the number of pairs having the same ranking on the two variables and (2) the number of pairs having the opposite ranking on the two variables. The pairs having the same ranking are computed as follows. The frequency of each cell in the table is multiplied by the sum of all cells appearing below and to the right of it—with all these products being summed. In Table 17-4, the number of pairs with the same ranking would be 200(900 + 300 + 400 + 100) + 500(300 + 100) + 400(400 + 100) + 900(100), or 340,000 + 200,000 + 200,000 + 90,000 = 830,000.
The pairs having the opposite ranking on the two variables are computed as follows: The frequency of each cell in the table is multiplied by the sum of all cells appearing below and to the left of it—with all these products being summed. In Table 17-4, the numbers of pairs with opposite rankings would be 700(500 + 800 + 900 + 300) + 400(800 + 300) + 400(500 + 800) + 900(800), or 1,750,000 + 440,000 + 520,000 + 720,000 = 3,430,000. Gamma is computed from the numbers of same-ranked pairs and opposite-ranked pairs as follows: **[Set as in 8e p. 410]**
        same − opposite
gamma = ---------------------
        same + opposite
In our example, gamma equals (830,000 − 3,430,000) divided by (830,000 + 3,430,000), or −.61. The negative sign in this answer indicates the negative association suggested by the initial inspection of the table. Social class and prejudice, in this hypothetical example, are negatively associated with each other. The numerical figure for gamma indicates that 61 percent more of the pairs examined had the opposite ranking than the same ranking.
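The same/opposite counting rule translates directly into code. The cell layout below is my reconstruction of Table 17-4; it is an assumption, chosen because it reproduces the 830,000 and 3,430,000 pair totals computed above:

```python
# Reconstructed cell frequencies for Table 17-4 (an assumption; see lead-in).
table = [
    [200, 500, 700],
    [400, 900, 300],
    [800, 400, 100],
]

n_rows, n_cols = len(table), len(table[0])
same = opposite = 0
for i in range(n_rows):
    for j in range(n_cols):
        # Cells below and to the right of cell (i, j).
        below_right = sum(
            table[r][c] for r in range(i + 1, n_rows) for c in range(j + 1, n_cols)
        )
        # Cells below and to the left of cell (i, j).
        below_left = sum(
            table[r][c] for r in range(i + 1, n_rows) for c in range(j)
        )
        same += table[i][j] * below_right      # pairs ranked the same way
        opposite += table[i][j] * below_left   # pairs ranked opposite ways

gamma = (same - opposite) / (same + opposite)
print(same, opposite, round(gamma, 2))  # 830000 3430000 -0.61
```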
Note that whereas values of lambda vary from 0 to 1, values of gamma vary from −1 to +1, representing the direction as well as the magnitude of the association. Because nominal variables have no ordinal structure, it makes no sense to speak of the direction of the relationship. (A negative lambda would indicate that you made more errors in predicting values on one variable while knowing values on the second than you made in ignorance of the second, and that’s not logically possible.)
Table 17-5 is an example of the use of gamma in social research. To study the extent to which widows sanctified their deceased husbands, Helena Znaniecki Lopata (1981) administered a questionnaire to a probability sample of 301 widows. In part, the questionnaire asked the respondents to characterize their deceased husbands in terms of the following semantic differential scale: **[set as in 8e p. 410]**
@T-2:Positive Extreme Negative Extreme
@TB:Good 1 2 3 4 5 6 7 Bad
Useful 1 2 3 4 5 6 7 Useless
Honest 1 2 3 4 5 6 7 Dishonest
Superior 1 2 3 4 5 6 7 Inferior
Kind 1 2 3 4 5 6 7 Cruel
Friendly 1 2 3 4 5 6 7 Unfriendly
Warm 1 2 3 4 5 6 7 Cold
**[Table 17-5 about here; pickup from 8e p. 410]**
@GT:Respondents were asked to describe their deceased spouses by circling a number for each pair of opposing characteristics. Notice that the series of numbers connecting each pair of characteristics is an ordinal measure.
Next, Lopata wanted to discover the extent to which the several measures were related to each other. Appropriately, she chose gamma as the measure of association. Table 17-5 shows how she presented the results of her investigation.
The format presented in Table 17-5 is called a correlation matrix. For each pair of measures, Lopata has calculated the gamma. Good and Useful, for example, are related to each other by a gamma equal to .79. The matrix is a convenient way of presenting the intercorrelations among several variables, and you’ll find it frequently in the research literature. In this case, we see that all the variables are quite strongly related to each other, though some pairs are more strongly related than others.
Gamma is only one of several measures of association appropriate to ordinal variables. Again, any introductory statistics textbook will give you a more comprehensive treatment of this subject.
@H3:Interval or Ratio Variables @GT:If interval or ratio variables (for example, age, income, grade point average, and so forth) are being associated, one appropriate measure of association is Pearson’s product-moment correlation (r). The derivation and computation of this measure of association are complex enough to lie outside the scope of this book, so I’ll make only a few general comments here.
Like both gamma and lambda, r is based on guessing the value of one variable by knowing the other. For continuous interval or ratio variables, however, it is unlikely that you could predict the precise value of the variable. On the other hand, predicting only the ordinal arrangement of values on the two variables would not take advantage of the greater amount of information conveyed by an interval or ratio variable. In a sense, r reflects how closely you can guess the value of one variable through your knowledge of the value of the other.
To understand the logic of r, consider the way you might hypothetically guess values that particular cases have on a given variable. With nominal variables, we’ve seen that you might always guess the modal value. But for interval or ratio data, you would minimize your errors by always guessing the mean value of the variable. Although this practice produces few if any perfect guesses, the extent of your errors will be minimized. Imagine the task of guessing people’s incomes and how much better you would do if you knew how many years of education they had as well as the mean incomes for people with 0, 1, 2 (and so forth) years of education.
In the computation of lambda, we noted the number of errors produced by always guessing the modal value. In the case of r, errors are measured in terms of the sum of the squared differences between the actual value and the mean. This sum is called the total variation.
To understand this concept, we must expand the scope of our examination. Let’s look at the logic of regression analysis and discuss correlation within that context.
@H2:Regression Analysis
@GT-no indent:At several points in this text, I have referred to the general formula for describing the association between two variables: Y = f(X). This formula is read "Y is a function of X," meaning that values of Y can be explained in terms of variations in the values of X. Stated more strongly, we might say that X causes Y, so the value of X determines the value of Y. Regression analysis is a method of determining the specific function relating Y to X. There are several forms of regression analysis, depending on the complexity of the relationships being studied. Let’s begin with the simplest.
@H3:Linear Regression @GT:The regression model can be seen most clearly in the case of a linear regression analysis, where there is a perfect linear association between two variables. Figure 17-1 is a scattergram presenting in graphic form the values of X and Y as produced by a hypothetical study. It shows that for the four cases in our study, the values of X and Y are identical in each instance. The case with a value of 1 on X also has a value of 1 on Y, and so forth. The relationship between the two variables in this instance is described by the equation Y = X; this is called the regression equation. Because all four points lie on a straight line, we could superimpose that line over the points; this is the regression line.
**[Figure 17-1 about here; pickup from 8e p. 411]**
@FN:Figure 17-1 @FT:Simple Scattergram of Values of X and Y
@GT:The linear regression model has important descriptive uses. The regression line offers a graphic picture of the association between X and Y, and the regression equation is an efficient form for summarizing that association. The regression model has inferential value as well. To the extent that the regression equation correctly describes the general association between the two variables, it may be used to predict other sets of values. If, for example, we know that a new case has a value of 3.5 on X, we can predict the value of 3.5 on Y as well.
In practice, of course, studies are seldom limited to four cases, and the associations between variables are seldom as clear as the one presented in Figure 17-1.
A somewhat more realistic example is presented in Figure 17-2, representing a hypothetical relationship between population and crime rate in small- to medium-sized cities. Each dot in the scattergram is a city, and its placement reflects that city’s population and its crime rate. As was the case in our previous example, the values of Y (crime rates) generally correspond to those of X (populations), and as values of X increase, so do values of Y. However, the association is not nearly as clear as it is in Figure 17-1.
**[Figure 17-2 about here; pickup from 8e p. 412]**
@FN:Figure 17-2 @FT:A Scattergram of the Values of Two Variables with Regression Line Added (Hypothetical)
@GT:In Figure 17-2 we can’t superimpose a straight line that will pass through all the points in the scattergram. But we can draw an approximate line showing the best possible linear representation of the several points. I’ve drawn that line on the graph.
You may (or may not) recall from algebra that any straight line on a graph can be represented by an equation of the form Y = a + bX, where X and Y are values of the two variables. In this equation, a equals the value of Y when X is 0, and b represents the slope of the line. If we know the values of a and b, we can calculate an estimate of Y for every value of X.
We can now say more formally that regression analysis is a technique for establishing the regression equation representing the geometric line that comes closest to the distribution of points on a graph. The regression equation provides a mathematical description of the relationship between the variables, and it allows us to infer values of Y when we have values of X. Recalling Figure 17-2, we could estimate crime rates of cities if we knew their populations.
To improve your guessing, you construct a regression line, stated in the form of a regression equation that permits the estimation of values on one variable from values on the other. The general format for this equation is Y′ = a + b(X), where a and b are computed values, X is a given value on one variable, and Y′ is the estimated value on the other. **[Y′ is Y prime, as in 4e p. 413]** The values of a and b are computed to minimize the differences between actual values of Y and the corresponding estimates (Y′) based on the known value of X. The sum of squared differences between actual and estimated values of Y is called the unexplained variation because it represents errors that still exist even when estimates are based on known values of X.
The explained variation is the difference between the total variation and the unexplained variation. Dividing the explained variation by the total variation produces a measure of the proportionate reduction of error corresponding to the similar quantity in the computation of lambda. In the present case, this quantity is the correlation squared: r². Thus, if r = .7, then r² = .49, meaning that about half the variation has been explained. In practice, we compute r rather than r², because the product-moment correlation can take either a positive or a negative sign, depending on the direction of the relationship between the two variables. (Computing r² and taking a square root would always produce a positive quantity.) You can consult any standard statistics textbook for the method of computing r, although I anticipate that most readers using this measure will have access to computer programs designed for this function.
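Here is a sketch of the whole chain, from the least-squares line to the variation decomposition behind r². The five data points are invented for illustration:

```python
# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares estimates for the regression equation Y' = a + bX.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# Total variation: squared errors from always guessing the mean of Y.
total = sum((y - mean_y) ** 2 for y in ys)

# Unexplained variation: squared errors remaining after using Y' = a + bX.
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

explained = total - unexplained
r_squared = explained / total
r = r_squared ** 0.5 if b >= 0 else -(r_squared ** 0.5)  # sign follows the slope
print(round(r_squared, 2), round(r, 2))  # 0.6 0.77
```

In this made-up example, knowing X removes 60 percent of the error that guessing the mean of Y would produce, which is the PRE interpretation described above.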
Unfortunately—or perhaps fortunately—social life is so complex that the simple linear regression model often does not sufficiently represent the state of affairs. It’s possible, using percentage tables, to analyze more than two variables. As the number of variables increases, such tables become increasingly complicated and hard to read. But the regression model offers a useful alternative in such cases.
@H3:Multiple Regression @GT:Very often, social researchers find that a given dependent variable is affected simultaneously by several independent variables. Multiple regression analysis provides a means of analyzing such situations. This was the case when Beverly Yerg (1981) set about studying teacher effectiveness in physical education. She stated her expectations in the form of a multiple regression equation:
**[Set as in 8e p. 413; see addition of comma and "where" after first equation]**
@EX:F = b0 + b1I + b2X1 + b3X2 + b4X3 + b5X4 + e, where
F = Final pupil-performance score
I = Initial pupil-performance score
X1 = Composite of guiding and supporting practice
X2 = Composite of teacher mastery of content
X3 = Composite of providing specific, task-related feedback
X4 = Composite of clear, concise task presentation
b = Regression weight
e = Residual
@EXS:(ADAPTED FROM YERG 1981:42)
@GT:Notice that in place of the single X variable in a linear regression, there are several X’s, and there are also several b’s instead of just one. Also, Yerg has chosen to represent a as b0 in this equation but with the same meaning as discussed previously. Finally, the equation ends with a residual factor (e), which represents the variance in Y that is not accounted for by the X variables analyzed.
Beginning with this equation, Yerg calculated the values of the several b’s to show the relative contributions of the several independent variables in determining final student-performance scores. She also calculated the multiple-correlation coefficient as an indicator of the extent to which all six variables predict the final scores. This follows the same logic as the simple bivariate correlation discussed earlier, and it is traditionally reported as a capital R. In this case, R = .877, meaning that 77 percent of the variance (.877² = .77) in final scores is explained by the six variables acting in concert.
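The logic of fitting several b’s at once can be sketched with two hypothetical predictors, solving the normal equations directly. The variable names and data here are invented, and Yerg’s equation has more predictors; in practice researchers would use statistical software rather than hand-rolled code:

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [val] for row, val in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Invented data: y as the final score, x1 and x2 as two predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 3, 7, 6, 9]
n = len(y)

# Normal equations (X'X)b = X'y for the model y = b0 + b1*x1 + b2*x2.
X = [[1, u, w] for u, w in zip(x1, x2)]
XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(3)] for r in range(3)]
Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(3)]
b0, b1, b2 = solve3(XtX, Xty)

# R-squared: proportion of the variance in y the predictors explain jointly.
pred = [b0 + b1 * u + b2 * w for u, w in zip(x1, x2)]
mean_y = sum(y) / n
r_squared = 1 - sum((yi - p) ** 2 for yi, p in zip(y, pred)) / sum(
    (yi - mean_y) ** 2 for yi in y
)
```

Yerg’s R of .877 plays the same role for her full equation: squaring it gives the share of variance in final scores accounted for by all the predictors acting in concert.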
@H3:Partial Regression @GT:In exploring the elaboration model in your textbook, we paid special attention to the relationship between two variables when a third test variable was held constant. Thus, we might examine the effect of education on prejudice with age held constant, testing the independent effect of education. To do so, we would compute the tabular relationship between education and prejudice separately for each age group.
Partial regression analysis is based on this same logical model. The equation summarizing the relationship between variables is computed on the basis of the test variables remaining constant. As in the case of the elaboration model, the result may then be compared with the uncontrolled relationship between the two variables to clarify further the overall relationship.
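The logic of partialling can also be expressed numerically. The following sketch computes a first-order partial correlation from three zero-order correlations; the figures for education, prejudice, and age are invented for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between X and Y, holding Z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented zero-order correlations: education (X), prejudice (Y), age (Z).
r_edu_prej = -0.40   # education and prejudice
r_edu_age  = -0.30   # education and age
r_prej_age =  0.35   # prejudice and age

# The education-prejudice relationship with age held constant:
print(round(partial_r(r_edu_prej, r_edu_age, r_prej_age), 2))  # -0.33
```

Here the controlled relationship (−.33) is somewhat weaker than the uncontrolled one (−.40), just as comparing partial tables with the original table would show in the elaboration model.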
@H3:Curvilinear Regression @GT:Up to now, we have been discussing the association among variables as represented by a straight line. The regression model is even more general than our discussion thus far has implied.
You may already know that curvilinear functions, as well as linear ones, can be represented by equations. For example, the equation X² + Y² = 25 describes a circle with a radius of 5. Raising variables to powers greater than 1 has the effect of producing curves rather than straight lines. And in the real world there is no reason to assume that the relationship among every set of variables will be linear. In some cases, then, curvilinear regression analysis can provide a better understanding of empirical relationships than can any linear model.
Recall, however, that a regression line serves two functions. It describes a set of empirical observations, and it provides a general model for making inferences about the relationship between two variables in the general population that the observations represent. A very complex equation might produce an erratic line that would indeed pass through every individual point. In this sense, it would perfectly describe the empirical observations. There would be no guarantee, however, that such a line could adequately predict new observations or that it in any meaningful way represented the relationship between the two variables in general. Thus, it would have little or no inferential value.
Earlier in this book, we discussed the need for balancing detail and utility in data reduction. Ultimately, researchers attempt to provide the most faithful, yet also the simplest, representation of their data. This practice also applies to regression analysis. Data should be presented in the simplest fashion (thus, linear regressions are most frequently used) that best describes the actual data. Curvilinear regression analysis adds a new option to the researcher in this regard, but it does not solve the problems altogether. Nothing does that.
@H3:Cautions in Regression Analysis @GT:The use of regression analysis for statistical inferences is based on the same assumptions made for correlational analysis: simple random sampling, the absence of nonsampling errors, and continuous interval data. Because social scientific research seldom completely satisfies these assumptions, you should use caution in assessing the results in regression analyses.
Also, regression lines—linear or curvilinear—can be useful for interpolation (estimating cases lying between those observed), but they are less trustworthy when used for extrapolation (estimating cases that lie beyond the range of observations). This limitation on extrapolations is important in two ways. First, you are likely to come across regression equations that seem to make illogical predictions. An equation linking population and crimes, for example, might seem to suggest that small towns with, say, a population of 1,000 should produce minus 123 crimes a year. This failure in predictive ability does not disqualify the equation but dramatizes that its applicability is limited to a particular range of population sizes. Second, researchers sometimes overstep this limitation, drawing inferences that lie outside their range of observation, and you’d be right in criticizing them for that.
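The illogical-prediction example can be made concrete. In the sketch below, the intercept and slope are invented values chosen to reproduce the illustration: the fitted equation behaves sensibly within the range of city sizes it was estimated from, yet extrapolating it down to a town of 1,000 yields an impossible negative count.

```python
# Hypothetical fitted equation linking population to annual crimes.
# The intercept and slope are invented for illustration.
def predicted_crimes(population, intercept=-323.0, slope=0.2):
    return intercept + slope * population

print(predicted_crimes(1_000))    # -123.0: an impossible negative count
print(predicted_crimes(100_000))  # a plausible estimate within the observed range
```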
The preceding sections have introduced some of the techniques for measuring associations among variables at different levels of measurement. Matters become slightly more complex when the two variables represent different levels of measurement. Though we aren’t going to pursue this issue in this textbook, the box "Measures of Association and Levels of Measurement," by Peter Nardi, may be a useful resource if you ever have to address such situations.
**[Box "Measures of Association and Levels of Measurement" about here; pickup from 8e p. 415]**
@H1:OTHER MULTIVARIATE TECHNIQUES
@GT-no indent:For the most part, this book has focused on rather rudimentary forms of data manipulation, such as the use of contingency tables and percentages. The elaboration model of analysis was presented in this form, as were the statistical techniques presented so far in this discussion.
@GT:This section of the discussion presents a cook’s tour of three other multivariate techniques from the logical perspective of the elaboration model. This discussion isn’t intended to teach you how to use these techniques but rather to present sufficient information so that you can understand them if you run across them in a research report. The three methods of analysis that we’ll examine—path analysis, time-series analysis, and factor analysis—are only a few of the many multivariate techniques used by social scientists.
@GT-no indent:Path analysis is a causal model for understanding relationships between variables. Though based on regression analysis, it can provide a more useful graphic picture of relationships among several variables than other means can. Path analysis assumes that the values of one variable are caused by the values of another, so it is essential to distinguish independent and dependent variables. This requirement is not unique to path analysis, of course, but path analysis provides a unique way of displaying explanatory results for interpretation.
Recall for a moment one of the ways I represented the elaboration model in your textbook. Here’s how we might diagram the logic of interpretation: **[Set the following as in 8e p. 416; lc "v"]**
Independent variable → Intervening variable → Dependent variable
The logic of this presentation is that an independent variable has an impact on an intervening variable, which, in turn, has an impact on a dependent variable. The path analyst constructs similar patterns of relationships among variables, but the typical path diagram contains many more variables than shown in this diagram.
Besides diagramming a network of relationships among variables, path analysis also shows the strengths of those several relationships. The strengths of relationships are calculated from a regression analysis that produces numbers analogous to the partial relationships in the elaboration model. These path coefficients, as they are called, represent the strengths of the relationships between pairs of variables, with the effects of all other variables in the model held constant.
The analysis in Figure 17-3, for example, focuses on the religious causes of anti-Semitism among Christian church members. The variables in the diagram are, from left to right, (1) orthodoxy, or the extent to which the subjects accept conventional beliefs about God, Jesus, biblical miracles, and so forth; (2) particularism, the belief that one’s religion is the "only true faith"; (3) acceptance of the view that the Jews crucified Jesus; (4) religious hostility toward contemporary Jews, such as believing that God is punishing them or that they will suffer damnation unless they convert to Christianity; and (5) secular anti-Semitism, such as believing that Jews cheat in business, are disloyal to their country, and so forth.
**[Figure 17-3 about here; pickup from 8e p. 416]**
@FN:Figure 17-3 @FT:Diagramming the Religious Sources of Anti-Semitism
@GT:To start with, the researchers who conducted this analysis proposed that secular anti-Semitism was produced by moving through the five variables: Orthodoxy caused particularism, which caused the view of the historical Jews as crucifiers, which caused religious hostility toward contemporary Jews, which resulted, finally, in secular anti-Semitism.
The path diagram tells a different story. The researchers found, for example, that belief in the historical role of Jews as the crucifiers of Jesus doesn’t seem to matter in the process that generates anti-Semitism. And, although particularism is a part of one process resulting in secular anti-Semitism, the diagram also shows that anti-Semitism is created more directly by orthodoxy and religious hostility. Orthodoxy produces religious hostility even without particularism, and religious hostility generates secular hostility in any event.
One last comment on path analysis is in order. Although it is an excellent way of handling complex causal chains and networks of variables, path analysis itself does not tell the causal order of the variables. Nor was the path diagram generated by computer. The researcher decided the structure of relationships among the variables and used computer analysis merely to calculate the path coefficients that apply to such a structure.
@GT-no indent:The various forms of regression analysis are often used to examine time-series data, representing changes in one or more variables over time. As I’m sure you know, U.S. crime rates have generally increased over the years. A time-series analysis of crime rates could express the long-term trend in a regression format and provide a way of testing explanations for the trend—such as population growth or economic fluctuations—and could permit forecasting of future crime rates.
@GT:In a simple illustration, Figure 17-4 graphs the larceny rates of a hypothetical city over time. Each dot on the graph represents the number of larcenies reported to police during the year indicated.
**[Figure 17-4 about here; pickup from 8e p. 417]**
@FN:Figure 17-4 @FT:The Larceny Rates over Time in a Hypothetical City
@GT:Suppose we feel that larceny is partly a function of overpopulation. You might reason that crowding would lead to psychological stress and frustration, resulting in increased crimes of many sorts. Recalling the discussion of regression analysis, we could create a regression equation representing the relationship between larceny and population density—using the actual figures for each variable, with years as the units of analysis. Having created the best-fitting regression equation, we could then calculate a larceny rate for each year, based on that year’s population density rate. For the sake of simplicity, let’s assume that the city’s population size (and hence density) has been steadily increasing. This would lead us to predict a steadily increasing larceny rate as well. These regression estimates are represented by the dashed regression line in Figure 17-4.
Time-series relationships are often more complex than this simple illustration suggests. For one thing, there can be more than one causal variable. For example, we might find that unemployment rates also had a powerful impact on larceny. We might develop an equation to predict larceny on the basis of both of these causal variables. As a result, the predictions might not fall along a simple, straight line. Whereas population density was increasing steadily in the first model, unemployment rates rise and fall. As a consequence, our predictions of the larceny rate would similarly go up and down.
Pursuing the relationship between larceny and unemployment rates, we might reason that people do not begin stealing as soon as they become unemployed. Typically, they might first exhaust their savings, borrow from friends, and keep hoping for work. Larceny would be a last resort.
Time-lagged regression analysis could be used to address this more complex case. Thus, we might create a regression equation that predicted a given year’s larceny rate based, in part, on the previous year’s unemployment rate or perhaps on an average of the two years’ unemployment rates. The possibilities are endless.
If you think about it, a great many causal relationships are likely to involve a time lag. Historically, many of the world’s poor countries have maintained their populations by matching high death rates with equally high birthrates. It has been observed repeatedly, moreover, that when a society’s death rate is drastically reduced—through improved medical care, public sanitation, and improved agriculture, for example—that society’s birthrate drops sometime later on, but with an intervening period of rapid population growth. Or, to take a very different example, a crackdown on speeding on a state’s highways is likely to reduce the average speed of cars. Again, however, the causal relationship would undoubtedly involve a time lag—days, weeks, or months, perhaps—as motorists began to realize the seriousness of the crackdown.
In all such cases, the regression equations generated might take many forms. In any event, the criterion for judging success or failure is the extent to which the researcher can account for the actual values observed for the dependent variable.
@GT-no indent:Factor analysis is a unique approach to multivariate analysis. Its statistical basis is complex enough and different enough from the foregoing discussions to suggest a general discussion here.
@GT:Factor analysis is a complex algebraic method used to discover patterns among the variations in values of several variables. This is done essentially through the generation of artificial dimensions (factors) that correlate highly with several of the real variables and that are independent of one another. A computer must be used to perform this complex operation.
Let’s suppose that a data file contains several indicators of subjects’ prejudice. Each item should provide some indication of prejudice, but none will give a perfect indication. All of these items, moreover, should be highly intercorrelated empirically. In a factor analysis of the data, the researcher would create an artificial dimension that would be highly correlated with each of the items measuring prejudice. Each subject would essentially receive a value on that artificial dimension, and the value assigned would be a good predictor of the observed attributes on each item.
Suppose now that the same study provided several indicators of subjects’ mathematical ability. It’s likely that the factor analysis would also generate an artificial dimension highly correlated with each of those items.
The output of a factor analysis program consists of columns representing the several factors (artificial dimensions) generated from the observed relations among variables plus the correlations between each variable and each factor—called the factor loadings.
In the preceding example, it’s likely that one factor would more or less represent prejudice, and another would more or less represent mathematical ability. Data items measuring prejudice would have high loadings on (correlations with) the prejudice factor and low loadings on the mathematical ability factor. Data items measuring mathematical ability would have just the opposite pattern.
In practice, factor analysis does not proceed in this fashion. Rather, the variables are input to the program, and the program outputs a series of factors with appropriate factor loadings. The analyst must then determine the meaning of a given factor on the basis of those variables that load highly on it. The program’s generation of factors, however, has no reference to the meaning of variables, only to their empirical associations. Two criteria are taken into account: (1) a factor must explain a relatively large portion of the variance found in the study variables, and (2) every factor must be more or less independent of every other factor.
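Although a real factor-analysis program does far more, the analyst’s final step, reading off which variables load highly on each factor, can be sketched simply. The loadings below are invented for the prejudice/mathematical-ability illustration.

```python
# Invented factor loadings: each variable's correlation with two factors.
loadings = {
    "agrees_stereotype_a": (0.82, 0.07),
    "agrees_stereotype_b": (0.78, 0.11),
    "social_distance":     (0.74, 0.04),
    "algebra_score":       (0.06, 0.85),
    "geometry_score":      (0.12, 0.80),
    "word_problems":       (0.09, 0.77),
}

def high_loaders(loadings, factor, threshold=0.5):
    """Variables loading highly on a factor: the basis for naming it."""
    return [v for v, ld in loadings.items() if abs(ld[factor]) >= threshold]

print(high_loaders(loadings, 0))  # items a researcher might read as "prejudice"
print(high_loaders(loadings, 1))  # items suggesting "mathematical ability"
```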
Here’s an example of the use of factor analysis. Many social researchers have studied the problem of delinquency. If you look deeply into the problem, however, you’ll discover that there are many different types of delinquents. In a survey of high school students in a small Wyoming town, Morris Forslund (1980) set out to create a typology of delinquency. His questionnaire asked students to report whether they had committed a variety of delinquent acts. He then submitted their responses to factor analysis. The results are shown in Table 17-6.
**[Table 17-6 about here; pickup from 8e p. 419]**
As you can see in this table, the various delinquent acts are listed on the left. The numbers shown in the body of the table are the factor loadings on the four factors constructed in the analysis. You’ll notice that after examining the dimensions, or factors, Forslund labeled them. I’ve bracketed the items on each factor that led to his choice of labels. Forslund summarized the results as follows:
@EX:For the total sample four fairly distinct patterns of delinquent acts are apparent. In order of variance explained, they have been labeled: 1) Property Offenses, including both vandalism and theft; 2) Incorrigibility; 3) Drugs/Truancy; and 4) Fighting. It is interesting, and perhaps surprising, to find both vandalism and theft appear together in the same factor. It would seem that those high school students who engage in property offenses tend to be involved in both vandalism and theft. It is also interesting to note that drugs, alcohol and truancy fall in the same factor.
@GT:Having determined this overall pattern, Forslund reran the factor analysis separately for boys and for girls. Essentially the same patterns emerged in both cases.
This example shows that factor analysis is an efficient method of discovering predominant patterns among a large number of variables. Instead of being forced to compare countless correlations—simple, partial, and multiple—to discover those patterns, researchers can use factor analysis for this task. Incidentally, this is a good example of a helpful use of computers.
Factor analysis also presents data in a form that can be interpreted by the reader or researcher. For a given factor, the reader can easily discover the variables loading highly on it, thus noting clusters of variables. Or, the reader can easily discover which factors a given variable is or is not loaded highly on.
But factor analysis also has disadvantages. First, as noted previously, factors are generated without any regard to substantive meaning. Often researchers will find factors producing very high loadings for a group of substantively disparate variables. They might find, for example, that prejudice and religiosity have high positive loadings on a given factor, with education having an equally high negative loading. Surely the three variables are highly correlated, but what does the factor represent in the real world? All too often, inexperienced researchers will be led into naming such factors as "religio-prejudicial lack of education" or something similarly nonsensical.
Second, factor analysis is often criticized on basic philosophical grounds. Recall that to be useful, a hypothesis must be disprovable. If the researcher cannot specify the conditions under which the hypothesis would be disproved, the hypothesis is in reality either a tautology or useless. In a sense, factor analysis suffers this defect. No matter what data are input, factor analysis produces a solution in the form of factors. Thus, if the researcher were asking, "Are there any patterns among these variables?" the answer always would be yes. This fact must also be taken into account in evaluating the results of factor analysis. The generation of factors by no means ensures meaning.
My personal view of factor analysis is the same as that for other complex modes of analysis. It can be an extremely useful tool for the social science researcher. Its use should be encouraged whenever such activity may assist researchers in understanding a body of data. As in all cases, however, such tools are only tools and never magical solutions.
Let me reiterate that the analytical techniques we’ve touched on are only a few of the many techniques commonly used by social scientists. As you pursue your studies, you may very well want to study this subject in more depth later.
@GT-no indent:Many, if not most, social scientific research projects involve the examination of data collected from a sample drawn from a larger population. A sample of people may be interviewed in a survey; a sample of divorce records may be coded and analyzed; a sample of newspapers may be examined through content analysis. Researchers seldom if ever study samples just to describe the samples per se; in most instances, their ultimate purpose is to make assertions about the larger population from which the sample has been selected. Frequently, then, you’ll wish to interpret your univariate and multivariate sample findings as the basis for inferences about some population.
@GT:This section examines inferential statistics—the statistical measures used for making inferences from findings based on sample observations to a larger population. We’ll begin with univariate data and move to multivariate.
@GT-no indent:Your textbook dealt with methods of presenting univariate data. Each summary measure was intended as a method of describing the sample studied. Now we’ll use such measures to make broader assertions about a population. This section addresses two univariate measures: percentages and means.
@GT:If 50 percent of a sample of people say they had colds during the past year, 50 percent is also our best estimate of the proportion of colds in the total population from which the sample was drawn. (This estimate assumes a simple random sample, of course.) It’s rather unlikely, however, that precisely 50 percent of the population had colds during the year. If a rigorous sampling design for random selection has been followed, however, we’ll be able to estimate the expected range of error when the sample finding is applied to the population.
Your textbook’s discussion of sampling theory covered the procedures for making such estimates, so I’ll only review them here. In the case of a percentage, the quantity
@EX:√(p × q / n)
where p is a proportion, q equals (1 − p), and n is the sample size, is called the standard error. As noted in your textbook, this quantity is very important in the estimation of sampling error. We may be 68 percent confident that the population figure falls within plus or minus one standard error of the sample figure; we may be 95 percent confident that it falls within plus or minus two standard errors; and we may be 99.9 percent confident that it falls within plus or minus three standard errors.
Any statement of sampling error, then, must contain two essential components: the confidence level (for example, 95 percent) and the confidence interval (for example, ±2.5 percent). If 50 percent of a sample of 1,600 people say they had colds during the year, we might say we’re 95 percent confident that the population figure is between 47.5 percent and 52.5 percent.
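These computations are easy to verify. The sketch below applies the standard error formula to the figures just given: p = .50, n = 1,600.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion: the square root of p * q / n."""
    return sqrt(p * (1 - p) / n)

p, n = 0.50, 1600
se = standard_error(p, n)            # 0.0125, i.e., 1.25 percentage points
low, high = p - 2 * se, p + 2 * se   # the 95 percent confidence interval
# The interval runs from .475 to .525: 47.5 to 52.5 percent.
```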
In this example we’ve moved beyond simply describing the sample into the realm of making estimates (inferences) about the larger population. In doing so, we must take care in several ways.
First, the sample must be drawn from the population about which inferences are being made. A sample taken from a telephone directory cannot legitimately be the basis for statistical inferences about the population of a city, but only about the population of telephone subscribers with listed numbers.
Second, the inferential statistics assume several things. To begin with, they assume simple random sampling, which is virtually never the case in sample surveys. The statistics also assume sampling with replacement, which is almost never done—but this is probably not a serious problem. Although systematic sampling is used more frequently than random sampling, it, too, probably presents no serious problem if done correctly. Stratified sampling, because it improves representativeness, clearly presents no problem. Cluster sampling does present a problem, however, because the estimates of sampling error may be too small. Quite clearly, street-corner sampling does not warrant the use of inferential statistics. Finally, the computation of the standard error assumes a 100-percent completion rate—that is, that everyone selected into the sample actually completed the survey. This problem increases in seriousness as the completion rate decreases.
Third, inferential statistics are addressed to sampling error only, not nonsampling error such as coding errors or misunderstandings of questions by respondents. Thus, although we might state correctly that between 47.5 and 52.5 percent of the population (95 percent confidence) would report having colds during the previous year, we couldn’t so confidently guess the percentage who had actually had them. Because nonsampling errors are probably larger than sampling errors in a respectable sample design, we need to be especially cautious in generalizing from our sample findings to the population.
@H2:Tests of Statistical Significance
@GT-no indent:There is no scientific answer to the question of whether a given association between two variables is significant, strong, important, interesting, or worth reporting. Perhaps the ultimate test of significance rests with your ability to persuade your audience (present and future) of the association’s significance. At the same time, there is a body of inferential statistics to assist you in this regard called parametric tests of significance. As the name suggests, parametric statistics are those that make certain assumptions about the parameters describing the population from which the sample is selected. They allow us to determine the statistical significance of associations. "Statistical significance" does not imply "importance" or "significance" in any general sense. It refers simply to the likelihood that relationships observed in a sample could be attributed to sampling error alone.
@GT:Although tests of statistical significance are widely reported in social scientific literature, the logic underlying them is rather subtle and often misunderstood. Tests of significance are based on the same sampling logic discussed elsewhere in this book. To understand that logic, let’s return for a moment to the concept of sampling error in regard to univariate data.
Recall that a sample statistic normally provides the best single estimate of the corresponding population parameter, but the statistic and the parameter seldom correspond precisely. Thus, we report the probability that the parameter falls within a certain range (confidence interval). The degree of uncertainty within that range is due to normal sampling error. The corollary of such a statement is, of course, that it is improbable that the parameter would fall outside the specified range only as a result of sampling error. Thus, if we estimate that a parameter (99.9 percent confidence) lies between 45 percent and 55 percent, we say by implication that it is extremely improbable that the parameter is actually, say, 90 percent if our only error of estimation is due to normal sampling. This is the basic logic behind tests of statistical significance.
@H2:The Logic of Statistical Significance
@GT-no indent:I think I can illustrate the logic of statistical significance best in a series of diagrams representing the selection of samples from a population. Here are the elements in the logic:
@NL1:1. Assumptions regarding the independence of two variables in the population under study
2. Assumptions regarding the representativeness of samples selected through conventional probability sampling procedures
3. The observed joint distribution of sample elements in terms of the two variables
@GT:Figure 17-5 represents a hypothetical population of 256 people; half are women, half are men. The diagram also indicates how each person feels about women enjoying equality to men. In the diagram, those favoring equality are shown as open circles; those opposing it, as filled-in circles.
**[Figure 17-5 about here; pickup from 8e p. 423]**
@FN:Figure 17-5 @FT:A Hypothetical Population of Men and Women Who Either Favor or Oppose Sexual Equality
@GT:The question we’ll be investigating is whether there is any relationship between gender and feelings about equality for men and women. More specifically, we’ll see if women are more likely to favor equality than are men, since women would presumably benefit more from it. Take a moment to look at Figure 17-5 and see what the answer to this question is.
The illustration in the figure indicates no relationship between gender and attitudes about equality. Exactly half of each group favors equality and half opposes it. Recall the earlier discussion of proportionate reduction of error. In this instance, knowing a person’s gender would not reduce the "errors" we’d make in guessing his or her attitude toward equality. The table at the bottom of Figure 17-5 provides a tabular view of what you can observe in the graphic diagram.
Figure 17-6 represents the selection of a one-fourth sample from the hypothetical population. In terms of the graphic illustration, a "square" selection from the center of the population provides a representative sample. Notice that our sample contains 16 of each type of person: Half are men and half are women; half of each gender favors equality, and the other half opposes it.
**[Figure 17-6 about here; pickup from 8e p. 424]**
@FN:Figure 17-6 @FT:A Representative Sample
@GT:The sample selected in Figure 17-6 would allow us to draw accurate conclusions about the relationship between gender and equality in the larger population. Following the sampling logic used in the textbook, we’d note there was no relationship between gender and equality in the sample; thus, we’d conclude there was similarly no relationship in the larger population—since we’ve presumably selected a sample in accord with the conventional rules of sampling.
Of course, real-life samples are seldom such perfect reflections of the populations from which they are drawn. It would not be unusual for us to have selected, say, one or two extra men who opposed equality and a couple of extra women who favored it—even if there was no relationship between the two variables in the population. Such minor variations are part and parcel of probability sampling.
Figure 17-7, however, represents a sample that falls far short of the mark in reflecting the larger population. Notice it includes far too many supportive women and opposing men. As the table shows, three-fourths of the women in the sample support equality, but only one-fourth of the men do so. If we had selected this sample from a population in which the two variables were unrelated to each other, we’d be sorely misled by our sample.
**[Figure 17-7 about here; pickup from 8e p. 425]**
@FN:Figure 17-7 @FT:An Unrepresentative Sample
@GT:As you’ll recall, it’s unlikely that a properly drawn probability sample would ever be as inaccurate as the one shown in Figure 17-7. In fact, if we actually selected a sample that gave us the results this one does, we’d look for a different explanation. Figure 17-8 illustrates the more likely situation.
**[Figure 17-8 about here; pickup from 8e p. 426]**
@FN:Figure 17-8 @FT:A Representative Sample from a Population in Which the Variables Are Related
@GT:Notice that the sample selected in Figure 17-8 also shows a strong relationship between gender and equality. The reason is quite different this time. We’ve selected a perfectly representative sample, but we see that there is actually a strong relationship between the two variables in the population at large. In this latest figure, women are more likely to support equality than are men: That’s the case in the population, and the sample reflects it.
In practice, of course, we never know what’s so for the total population; that’s why we select samples. So if we selected a sample and found the strong relationship presented in Figures 17-7 and 17-8, we’d need to decide whether that finding accurately reflected the population or was simply a product of sampling error.
The fundamental logic of tests of statistical significance, then, is this: Faced with any discrepancy between the assumed independence of variables in a population and the observed distribution of sample elements, we may explain that discrepancy in either of two ways: (1) we may attribute it to an unrepresentative sample, or (2) we may reject the assumption of independence. The logic and statistics associated with probability sampling methods offer guidance about the varying probabilities of varying degrees of unrepresentativeness (expressed as sampling error). Most simply put, there is a high probability of a small degree of unrepresentativeness and a low probability of a large degree of unrepresentativeness.
The statistical significance of a relationship observed in a set of sample data, then, is always expressed in terms of probabilities. "Significant at the .05 level (p ≤ .05)" simply means that the probability that a relationship as strong as the observed one can be attributed to sampling error alone is no more than 5 in 100. Put somewhat differently, if two variables are independent of one another in the population, and if 100 probability samples are selected from that population, no more than 5 of those samples should provide a relationship as strong as the one that has been observed.
There is, then, a corollary to confidence intervals in tests of significance, which represents the probability of the measured associations being due only to sampling error. This is called the level of significance. Like confidence intervals, levels of significance are derived from a logical model in which several samples are drawn from a given population. In the present case, we assume that there is no association between the variables in the population, and then we ask what proportion of the samples drawn from that population would produce associations at least as great as those measured in the empirical data. Three levels of significance are frequently used in research reports: .05, .01, and .001. These mean, respectively, that the chances of obtaining the measured association as a result of sampling error are 5/100, 1/100, and 1/1,000.
Researchers who use tests of significance normally follow one of two patterns. Some specify in advance the level of significance they’ll regard as sufficient. If any measured association is statistically significant at that level, they’ll regard it as representing a genuine association between the two variables. In other words, they’re willing to discount the possibility of its resulting from sampling error only.
Other researchers prefer to report the specific level of significance for each association, disregarding the conventions of .05, .01, and .001. Rather than reporting that a given association is significant at the .05 level, they might report significance at the .023 level, indicating the chances of its having resulted from sampling error as 23 out of 1,000.
@GT-no indent:Chi square (χ²) is a frequently used test of significance in social science. It is based on the null hypothesis: the assumption that there is no relationship between the two variables in the total population. Given the observed distribution of values on the two separate variables, we compute the conjoint distribution that would be expected if there were no relationship between the two variables. The result of this operation is a set of expected frequencies for all the cells in the contingency table. We then compare this expected distribution with the distribution of cases actually found in the sample data, and we determine the probability that the discovered discrepancy could have resulted from sampling error alone. An example will illustrate this procedure.
Let’s assume we’re interested in the possible relationship between church attendance and gender for the members of a particular church. To test this relationship, we select a sample of 100 church members at random. We find that our sample is made up of 40 men and 60 women and that 70 percent of our sample say they attended church during the preceding week, whereas the remaining 30 percent say they did not.
If there is no relationship between gender and church attendance, then 70 percent of the men in the sample should have attended church during the preceding week, and 30 percent should have stayed away. Moreover, women should have attended in the same proportion. Table 17-7 (part I) shows that, based on this model, 28 men and 42 women would have attended church, with 12 men and 18 women not attending.
**[Table 17-7 about here; pickup from 8e p. 428]**
Part II of Table 17-7 presents the observed attendance for the hypothetical sample of 100 church members. Note that 20 of the men report having attended church during the preceding week, and the remaining 20 say they did not. Among the women in the sample, 50 attended church and 10 did not. Comparing the expected and observed frequencies (parts I and II), we note that somewhat fewer men attended church than expected, whereas somewhat more women attended than expected.
Chi square is computed as follows. For each cell in the tables, the researcher (1) subtracts the expected frequency for that cell from the observed frequency, (2) squares this quantity, and (3) divides the squared difference by the expected frequency. This procedure is carried out for each cell in the tables, and the several results are added together. (Part III of Table 17-7 presents the cell-by-cell computations.) The final sum is the value of chi square: 12.70 in the example.
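The cell-by-cell arithmetic just described can be sketched in a few lines of Python. This is an illustrative addition, not part of the original text; the input values are the hypothetical church-attendance figures from Table 17-7.

```python
def chi_square(observed):
    """Compute chi square for a contingency table.

    observed: dict mapping (row_label, col_label) -> observed frequency.
    Expected frequencies are derived from the marginal totals, under the
    null hypothesis of no relationship between the two variables.
    """
    rows = sorted({r for r, _ in observed})
    cols = sorted({c for _, c in observed})
    n = sum(observed.values())
    row_total = {r: sum(observed[r, c] for c in cols) for r in rows}
    col_total = {c: sum(observed[r, c] for r in rows) for c in cols}

    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_total[r] * col_total[c] / n
            # (observed - expected), squared, divided by expected
            chi2 += (observed[r, c] - expected) ** 2 / expected
    return chi2

# The hypothetical sample of 100 church members (Table 17-7, part II)
attendance = {
    ("men", "attended"): 20, ("men", "stayed home"): 20,
    ("women", "attended"): 50, ("women", "stayed home"): 10,
}
print(round(chi_square(attendance), 2))  # 12.7
```

The function reproduces the value computed by hand in the text: 12.70.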
This value is the overall discrepancy between the observed conjoint distribution in the sample and the distribution we would expect if the two variables were unrelated to each other. Of course, the mere discovery of a discrepancy does not prove that the two variables are related, since normal sampling error might produce discrepancies even when there is no relationship in the total population. The magnitude of the value of chi square, however, permits us to estimate the probability of that having happened.
@H3:Degrees of Freedom @GT:To determine the statistical significance of the observed relationship, we must use a standard set of chi square values. This will require the computation of the degrees of freedom, which refers to the possibilities for variation within a statistical model. Suppose I challenge you to find three numbers whose mean is 11. There is an infinite number of solutions to this problem: (11, 11, 11), (10, 11, 12), (−11, 11, 33), etc. Now, suppose I require that one of the numbers be 7. There would still be an infinite number of possibilities for the other two numbers.
If I told you one number had to be 7 and another 10, there would be only one possible value for the third. If the average of three numbers is 11, their sum must be 33. If two of the numbers total 17, the third must be 16. In this situation, we say there are two degrees of freedom. Two of the numbers could have any values we choose, but once they are specified, the third number is determined.
More generally, whenever we are examining the mean of N values, we can see that the degrees of freedom is N − 1. Thus in the case of the mean of 23 values, we could make 22 of them anything we liked, but the 23rd would then be determined.
A similar logic applies to bivariate tables, such as those analyzed by chi square. Consider a table reporting the relationship between two dichotomous variables: gender (men/women) and abortion attitude (approve/disapprove). Notice that the table provides the marginal frequencies of both variables.
@T-1:Abortion Attitude Men Women Total
Approve ___ ___ 500
Disapprove ___ ___ 500
Total 500 500 1,000
@GT:Despite the conveniently round numbers in this hypothetical example, notice that there are numerous possibilities for the cell frequencies. For example, it could be the case that all 500 men approve and all 500 women disapprove, or it could be just the reverse. Or there could be 250 cases in each cell. Notice there are numerous other possibilities.
Now the question is, How many cells could we fill in pretty much as we choose before the remainder are determined by the marginal frequencies? The answer is only one. If we know that 300 men approved, for example, then 200 men would have had to disapprove, and the distribution would need to be just the opposite for the women.
In this instance, then, we say the table has one degree of freedom. Now, take a few minutes to construct a three-by-three table. Assume you know the marginal frequencies for each variable, and see if you can determine how many degrees of freedom it has.
For chi square, the degrees of freedom are computed as follows: the number of rows in the table of observed frequencies, minus 1, is multiplied by the number of columns, minus 1. This may be written as (r − 1)(c − 1). For a three-by-three table, then, there are four degrees of freedom: (3 − 1)(3 − 1) = (2)(2) = 4.
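The (r − 1)(c − 1) rule is simple enough to encode directly. A minimal sketch, added here for illustration:

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi square test on an
    n_rows-by-n_cols contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(2, 2))  # 1: the gender/church-attendance table
print(degrees_of_freedom(3, 3))  # 4: the three-by-three exercise
```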
In the example of gender and church attendance, we have two rows and two columns (discounting the totals), so there is one degree of freedom. Turning to a table of chi square values (see Appendix F), we find that for one degree of freedom and random sampling from a population in which there is no relationship between two variables, 10 percent of the time we should expect a chi square of at least 2.7. Thus, if we selected 100 samples from such a population, we should expect about 10 of those samples to produce chi squares equal to or greater than 2.7. Moreover, we should expect chi square values of at least 6.6 in only 1 percent of the samples and chi square values of at least 7.9 in only half a percent (.005) of the samples. The higher the chi square value, the less probable it is that the value could be attributed to sampling error alone.
In our example, the computed value of chi square is 12.70. If there were no relationship between gender and church attendance in the church member population and a large number of samples had been selected and studied, then we would expect a chi square of this magnitude in fewer than 1/10 of 1 percent (.001) of those samples. Thus, the probability of obtaining a chi square of this magnitude is less than .001, if random sampling has been used and there is no relationship in the population. We report this finding by saying the relationship is statistically significant at the .001 level. Because it is so improbable that the observed relationship could have resulted from sampling error alone, we’re likely to reject the null hypothesis and assume that there is a relationship between the two variables in the population of church members.
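Rather than consulting a printed table such as Appendix F, the tail probability can be computed directly. For one degree of freedom (and only for one), the chi square tail probability has the closed form erfc(√(x/2)), available through Python's standard library. This sketch is an added illustration, not part of the original text:

```python
import math

def chi2_pvalue_1df(chi2):
    """P(chi square >= chi2) for ONE degree of freedom only.

    Uses the closed form erfc(sqrt(x / 2)); tables with more degrees
    of freedom require the incomplete gamma function or a lookup table.
    """
    return math.erfc(math.sqrt(chi2 / 2))

p = chi2_pvalue_1df(12.70)
# p falls well below .001, matching the table lookup described in the text;
# chi2_pvalue_1df(2.7) likewise comes out near .10, and
# chi2_pvalue_1df(6.6) near .01, agreeing with the tabled values.
```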
Most measures of association can be tested for statistical significance in a similar manner. Standard tables of values permit us to determine whether a given association is statistically significant and at what level. Any standard statistics textbook provides instructions on the use of such tables.
@H3:Some Words of Caution @GT:Tests of significance provide an objective yardstick that we can use to estimate the statistical significance of associations between variables. They help us rule out associations that may not represent genuine relationships in the population under study. However, the researcher who uses or reads reports of significance tests should remain wary of several dangers in their interpretation.
First, we have been discussing tests of statistical significance; there are no objective tests of substantive significance. Thus, we may be legitimately convinced that a given association is not due to sampling error, but we may be in the position of asserting without fear of contradiction that two variables are only slightly related to each other. Recall that sampling error is an inverse function of sample size—the larger the sample, the smaller the expected error. Thus, a correlation of, say, .1 might very well be significant (at a given level) if discovered in a large sample, whereas the same correlation between the same two variables would not be significant if found in a smaller sample. This makes perfectly good sense given the basic logic of tests of significance: In the larger sample, there is less chance that the correlation could be simply the product of sampling error. In both samples, however, it might represent an essentially zero correlation.
The distinction between statistical and substantive significance is perhaps best illustrated by those cases where there is absolute certainty that observed differences cannot be a result of sampling error. This would be the case when we observe an entire population. Suppose we were able to learn the ages of every public official in the United States and of every public official in Russia. For argument’s sake, let’s assume further that the average age of U.S. officials was 45 years old compared with, say, 46 for the Russian officials. Because we would have the ages of all officials, there would be no question of sampling error. We would know with certainty that the Russian officials were older than their U.S. counterparts. At the same time, we would say that the difference was of no substantive significance. We’d conclude, in fact, that they were essentially the same age.
Second, lest you be misled by this hypothetical example, realize that statistical significance should not be calculated on relationships observed in data collected from whole populations. Remember, tests of statistical significance measure the likelihood of relationships between variables being only a product of sampling error; if there’s no sampling, there’s no sampling error.
Third, tests of significance are based on the same sampling assumptions we used in computing confidence intervals. To the extent that these assumptions are not met by the actual sampling design, the tests of significance are not strictly legitimate.
While we have examined statistical significance here in the form of chi square, there are several other tests of significance commonly used by social scientists. Analysis of variance and t-tests are two examples you may run across in your studies.
As is the case for most matters covered in this book, I have a personal prejudice. In this instance, it is against tests of significance. I don’t object to the statistical logic of those tests, because the logic is sound. Rather, I’m concerned that such tests seem to mislead more than they enlighten. My principal reservations are the following:
@NL1:1. Tests of significance make sampling assumptions that are virtually never satisfied by actual sampling designs.
2. They depend on the absence of nonsampling errors, a questionable assumption in most actual empirical measurements.
3. In practice, they are too often applied to measures of association that have been computed in violation of the assumptions made by those measures (for example, product-moment correlations computed from ordinal data).
4. Statistical significance is too easily misinterpreted as "strength of association," or substantive significance.
@GT:These concerns are underscored by a recent study (Sterling, Rosenbaum, and Weinkam 1995) examining the publication policies of nine psychology and three medical journals. As the researchers discovered, the journals were quite unlikely to publish articles that did not report statistically significant correlations among variables. They quote the following from a rejection letter:
@EX:Unfortunately, we are not able to publish this manuscript. The manuscript is very well written and the study was well documented. Unfortunately, the negative results translates into a minimal contribution to the field. We encourage you to continue your work in this area and we will be glad to consider additional manuscripts that you may prepare in the future.
@EXS:(STERLING ET AL. 1995:109)
@GT:Let’s suppose a researcher conducts a scientifically excellent study to determine whether X causes Y. The results indicate no statistically significant correlation. That’s good to know. If we’re interested in what causes cancer, war, or juvenile delinquency, it’s good to know that a possible cause actually does not cause it. That knowledge would free researchers to look elsewhere for causes.
As we’ve seen, however, such a study might very well be rejected by journals. Other researchers would therefore continue testing whether X causes Y, not knowing that previous studies had found no causal relationship. This would produce many wasted studies, none of which would see publication and bring the analysis of X as a cause of Y to a close.
From what you’ve learned about probabilities, however, you can understand that if enough studies are conducted, one will eventually measure a statistically significant correlation between X and Y. If there is absolutely no relationship between the two variables, we would expect a correlation significant at the .05 level five times out of a hundred, since that’s what the .05 level of significance means. If a hundred studies were conducted, therefore, we could expect five to suggest a causal relationship where there was actually none—and those five studies would be published!
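This publication-bias arithmetic is easy to demonstrate with a small simulation. The sketch below is an added illustration using only Python's standard library: it repeatedly draws samples from a hypothetical population in which two dichotomous variables are genuinely unrelated, runs a chi square test at the .05 level on each sample, and counts how often a "significant" result appears anyway.

```python
import random

def simulate_false_positives(n_studies=1000, n=100, seed=42):
    """Simulate n_studies surveys drawn from a population in which
    two dichotomous variables are truly independent (each split 50/50),
    test each 2 x 2 table with chi square at the .05 level, and return
    the fraction of 'statistically significant' results."""
    rng = random.Random(seed)
    critical = 3.841  # .05 critical value for one degree of freedom
    significant = 0
    for _ in range(n_studies):
        counts = [[0, 0], [0, 0]]  # a 2 x 2 contingency table
        for _ in range(n):
            # each respondent gets a random value on each variable
            counts[rng.random() < 0.5][rng.random() < 0.5] += 1
        row = [sum(counts[i]) for i in (0, 1)]
        col = [counts[0][j] + counts[1][j] for j in (0, 1)]
        if 0 in row or 0 in col:
            continue  # a marginal of zero: no test possible
        chi2 = sum(
            (counts[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
            for i in (0, 1) for j in (0, 1)
        )
        if chi2 >= critical:
            significant += 1
    return significant / n_studies

# With no real relationship anywhere, roughly 5 percent of the
# simulated studies still come out "significant" at the .05 level.
print(simulate_false_positives())
```

If only those "significant" runs were published, the literature would consist entirely of sampling error.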
There are, then, serious problems inherent in too much reliance on tests of statistical significance. At the same time (perhaps paradoxically) I would suggest that tests of significance can be a valuable asset to the researcher—useful tools for understanding data. Although many of my comments suggest an extremely conservative approach to tests of significance—that you should use them only when all assumptions are met—my general perspective is just the reverse.
I encourage you to use any statistical technique—any measure of association or test of significance—if it will help you understand your data. If the computation of product-moment correlations among nominal variables and the testing of statistical significance in the context of uncontrolled sampling will meet this criterion, then I encourage such activities. I say this in the spirit of what Hanan Selvin, another pioneer in developing the elaboration model, referred to as "data-dredging techniques." Anything goes, if it leads ultimately to the understanding of data and of the social world under study.
The price of this radical freedom, however, is the giving up of strict, statistical interpretations. You will not be able to base the ultimate importance of your finding solely on a significant correlation at the .05 level. Whatever the avenue of discovery, empirical data must ultimately be presented in a legitimate manner, and their importance must be argued logically.
@BL:o Descriptive statistics are used to summarize data under study. Some descriptive statistics summarize the distribution of attributes on a single variable; others summarize the associations between variables.
o Descriptive statistics summarizing the relationships between variables are called measures of association.
o Many measures of association are based on a proportionate reduction of error (PRE) model. This model is based on a comparison of (1) the number of errors we would make in attempting to guess the attributes of a given variable for each of the cases under study—if we knew nothing but the distribution of attributes on that variable—and (2) the number of errors we would make if we knew the joint distribution overall and were told for each case the attribute of one variable each time we were asked to guess the attribute of the other. These measures include lambda (λ), which is appropriate for the analysis of two nominal variables; gamma (γ), which is appropriate for the analysis of two ordinal variables; and Pearson’s product-moment correlation (r), which is appropriate for the analysis of two interval or ratio variables.
o Regression analysis represents the relationships between variables in the form of equations, which can be used to predict the values of a dependent variable on the basis of values of one or more independent variables.
o Regression equations are computed on the basis of a regression line: that geometric line representing, with the least amount of discrepancy, the actual location of points in a scattergram.
o Types of regression analysis include linear regression analysis, multiple regression analysis, partial regression analysis, and curvilinear regression analysis.
o Other multivariate techniques include time-series analysis, the study of processes occurring over time; path analysis, a method of presenting graphically the networks of causal relationships among several variables; and factor analysis, a method of discovering the general dimensions represented by a collection of actual variables.
o Inferential statistics are used to estimate the generalizability of findings arrived at through the analysis of a sample to the larger population from which the sample has been selected. Some inferential statistics estimate the single-variable characteristics of the population; others—tests of statistical significance—estimate the relationships between variables in the population.
o Inferences about some characteristic of a population must indicate a confidence interval and a confidence level. Computations of confidence levels and intervals are based on probability theory and assume that conventional probability sampling techniques have been employed in the study.
o Inferences about the generalizability to a population of the associations discovered between variables in a sample involve tests of statistical significance, which estimate the likelihood that an association as large as the observed one could result from normal sampling error if no such association exists between the variables in the larger population. Tests of statistical significance are also based on probability theory and assume that conventional probability sampling techniques have been employed in the study.
o A frequently used test of statistical significance in social science is chi square.
o The level of significance of an observed association is reported in the form of the probability that the association could have been produced merely by sampling error. To say that an association is significant at the .05 level is to say that an association as large as the observed one could not be expected to result from sampling error more than 5 times out of 100.
o Social researchers tend to use a particular set of levels of significance in connection with tests of statistical significance: .05, .01, and .001. This is merely a convention, however.
o Statistical significance must not be confused with substantive significance, the latter meaning that an observed association is strong, important, meaningful, or worth writing home to your mother about.
o Tests of statistical significance, strictly speaking, make assumptions about data and methods that are almost never satisfied completely by real social research. Despite this, the tests can serve a useful function in the analysis and interpretation of data.
proportionate reduction of error (PRE)
linear regression analysis
multiple regression analysis
partial regression analysis
curvilinear regression analysis
tests of statistical significance
level of significance
@H1:REVIEW QUESTIONS AND EXERCISES
@NL1:1. In your own words, explain the logic of proportionate reduction of error (PRE) measures of associations.
2. In your own words, explain the purpose of regression analyses.
3. In your own words, distinguish between measures of association and tests of statistical significance.
4. Find a study that reports the statistical significance of its findings and critique the clarity with which it is reported.
5. Locate a study that uses factor analysis and summarize the findings.
@UL:Babbie, Earl, Fred Halley, and Jeanne Zaino. 2000. Adventures in Social Research. Newbury Park, CA: Pine Forge Press. This book introduces the analysis of social research data through SPSS for Windows. Several of the basic statistical techniques used by social researchers are discussed and illustrated.
Blalock, Hubert M., Jr. 1979. Social Statistics. New York: McGraw-Hill. Blalock’s textbook has been a standard for social science students (and faculty) for decades. Tad Blalock’s death was a loss to all social science.
Frankfort-Nachmias, Chava. 1997. Social Statistics for a Diverse Society. Newbury Park, CA: Pine Forge Press. A comprehensive textbook on social statistics that makes particularly good use of graphics in presenting the logic of the many statistics commonly used by social scientists.
Healey, Joseph F. 1999. Statistics: A Tool for Social Research. Belmont, CA: Wadsworth. An effective introduction to social statistics.
Mohr, Lawrence B. 1990. Understanding Significance Testing. Newbury Park, CA: Sage. An excellent and comprehensive examination of the topic: both the technical details of testing statistical significance and the meaning of such tests.
@H1:Sociology Web Site
@GT-no indent:See the Wadsworth Sociology Resource Center, Virtual Society, for additional links, Internet exercises by chapter, quizzes by chapter, and Microcase-related materials:
@H1:InfoTrac College Edition
@H2:Search Word Summary
@GT-no indent:Go to the Wadsworth Sociology Resource Center, Virtual Society, to find a list of search words for each chapter. Using the search words, go to InfoTrac College Edition, an online library of over 900 journals where you can do online research and find readings related to your studies. To aid in your search and to gain useful tips, see the Student Guide to InfoTrac College Edition on the Virtual Society Web site: