SPSS Learner's Guide
SPSS 11.0 for Windows
Getting Started
Opening a Data File
Saving Changes
Getting Around with SPSS Windows
Frequency Distributions
Cross-Tabulations
Recoding Variables
Multivariate Tables
Tests of Statistical Significance
Correlation and Regression
Creating Indexes
Graphics
Making Copies of Results for a Paper
Shutting Down
This guide will provide you with a brief overview of
the Statistical Package for the Social Sciences™ (SPSS). For many years,
SPSS has been the most commonly used program for quantitative data analysis
in the social sciences. It has gone through many versions for both the Windows
and Macintosh platforms. This guide will use SPSS 11.0 along with data from
the 2000 General Social Survey. If you’re using a different version of SPSS
or a different data set, you’ll need to make some adjustments, but this guide
nonetheless introduces you to the overall logic and application of SPSS. Whatever
version you have, consult the user manual for whatever additional assistance
you need.
In addition, several books have recently been written
to introduce social researchers to SPSS. One is Earl Babbie, Fred Halley,
and Jeanne Zaino, Adventures in Social Research, Thousand Oaks, CA:
Pine Forge Press, 2000.
Getting Started
When you first open SPSS 11.0 for Windows on your computer,
Figure 1 below will appear. You have several options to start with.
You can click on “Run tutorial” and click OK. This is a good step for
you to get familiar with some of the basic features of SPSS. You also
have the option to create your own data. You would choose this option
if you had collected your own survey and were ready to transpose your respondents’
answers into an SPSS data spreadsheet. If you are using the GSS
data, you will need to ask SPSS to import a data file already formatted
and ready for analysis. GSS files will often come as SPSS files (they
are recognizable by their .sav extension) or as SPSS portable files (saved
as .por files).
Figure 1. Opening SPSS
The next step is to load some data into the program.
Opening a Data File
When “open an existing data source” is selected, click
OK. You will have to browse your computer and select the location
in which your GSS file is saved. Notice the bottom of this window.
You can opt for not using this dialog window in the future. When you
start SPSS, the window in the background will appear and you can simply
click on “open” and select the GSS data file you want to analyze.
The advantage of this dialogue box is that after you first open the
GSS file, SPSS will list your file in the place where “More files…” is highlighted.
In other words, if you need to work on this file again, you can simply select
it with your mouse and click OK. SPSS will open your file in a single
step.
Let me guide you through the process of opening the
GSS file included in your textbook package. Simply close the window
in Figure 1. Look at the top row in the figure below. It contains several
menus: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Windows,
and Help. We’ll use these menus throughout this guide.
Notice that the first letter of each menu name is underlined
(e.g., File). This means you can activate that menu by holding down the
ALT key and typing the underlined letter. Thus if you press ALT+F,
you activate the File menu. You can accomplish the same thing by simply clicking
the menu name. In either fashion, you would end up with the screen shown
in Figure 2.
Figure 2. Opening the File Menu
As you can see, the File menu offers several possible
actions, but right now we’re only interested in opening a file. Click Open
to indicate that you want to load a data file into the program. As you may
recognize from the notation to the right of the command, we could have accomplished
the same thing by striking CTRL+O without even going into the File menu.
Next you’ll see a dialogue box asking you to select
among several options. Open the “data” menu. Your steps in browsing
your computer to open the data file are illustrated in Figures 3 and 4 below.
Figure 3. Picking a Data Set
Select the data set you want by double-clicking
it or by single-clicking it and then clicking Open. We’ve selected the 2000
General Social Survey (GSS) data set (located on the zip drive under the
folder “2000”), containing hundreds of variables collected from 2,817 respondents.
The research was conducted by the National Opinion Research Center at the
University of Chicago to establish a representative sample of U.S. residents
18 and older. You, of course, may be working with a different data set,
such as the one that came with this book. In your case, open the CD
drive to open your data file.
Figure 4. Changing the File Type
It is important to note that SPSS is set to open .sav
files by default. In my case, the GSS 2000 data file was created as
a .por file. I had to change the file type to “SPSS Portable
(*.por)” as illustrated in Figure 4 above. When my GSS2000.por file
appeared, I made sure it was highlighted and clicked “Open.” See Figure
5 below.
Figure 5. Choosing an SPSS Portable Data Set
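The file-type choice described above can be sketched in a few lines of Python. This is only an illustration of the logic, not part of SPSS: the function name `spss_file_type` is hypothetical, the file names are examples, and the label shown for .sav files is an assumption about how the dialog words it.

```python
from pathlib import Path

# Extensions named in this guide, mapped to the file-type entry to pick
# in the Open dialog. Only the two formats discussed here are covered;
# the ".sav" label text is an assumption.
SPSS_FILE_TYPES = {
    ".sav": "SPSS (*.sav)",
    ".por": "SPSS Portable (*.por)",
}


def spss_file_type(filename):
    """Return the file-type entry for `filename`, or None if unrecognized."""
    extension = Path(filename).suffix.lower()
    return SPSS_FILE_TYPES.get(extension)


if __name__ == "__main__":
    for name in ["GSS2000.por", "GSS2000.sav", "notes.txt"]:
        print(name, "->", spss_file_type(name))
```

A file that returns None here is one SPSS would not show under either of these two file types.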
Saving Changes
When you modify your data set—creating recoded variables,
for example—you’ll probably want to save those changes for later use. Realize
that any such changes will stay in effect throughout this SPSS session,
but when you exit the program (see the last section of these SPSS Guidelines)
you can lose all your changes. It’s wise to save changes as soon as you’re
sure you want to.
Saving an altered file is hardly rocket science. First,
select the Data View window or the Variable View window (not the Output
window, which contains all the statistical jobs you asked SPSS to perform on
your data). Then, under the File menu, select Save. Alternatively, you can
simply press CTRL+S. From now on, when you open this file, it will contain
your alterations.
Realize that when you save the file in this fashion,
the changed file replaces the original one. So if you madly deleted data or
altered variables using their original names, you’ll have put the original
file forever out of reach.
If you wish to save the original file as well as your
changes, choose Save As under the File menu (see Figure 6). This time, SPSS
will ask you to supply a name for the data set about to be saved. Use some
name other than that of the original data set. Also, pay attention to where
on your disk it is saved so you can find it later on.
Figure 6. Saving an SPSS Portable Data Set
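The difference between Save and Save As can be mimicked outside SPSS. The sketch below is a hedged illustration in Python, not an SPSS feature: the helper `save_as` and the file names are hypothetical, and the "data file" is just placeholder bytes. It copies a file under a new name and refuses to overwrite anything that already exists, which is the safety Save As gives you.

```python
import shutil
import tempfile
from pathlib import Path


def save_as(original, new_name):
    """Copy `original` to `new_name` in the same folder, never overwriting."""
    original = Path(original)
    target = original.with_name(new_name)
    if target.exists():
        # Also triggers when new_name equals the original's own name.
        raise FileExistsError(f"{target} already exists; choose another name")
    shutil.copy2(original, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        source = Path(folder) / "GSS2000.sav"
        source.write_bytes(b"placeholder, not real SPSS data")
        copy = save_as(source, "GSS2000_recoded.sav")
        print(copy.name, "saved alongside", source.name)
```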
You may also want to save your data file as an SPSS
file if it is a portable file or other type of data spreadsheet. In Figure
7, I saved my GSS2000.por file as GSS2000.sav, which will save me some time in
processing this file in the future, since there will be no need to wait
for conversion.
Figure 7. Saving as a .sav file
Getting Around with SPSS Windows
By default, the window that appears is the “Data Editor”
window. No matter what data file you open, the setting of this spreadsheet
is always the same. Each of the columns represents a variable, such as the
respondent’s gender, age, or attitude about abortion. Each row represents
a particular respondent. Thus each cell of the matrix stores some item of
information about a person. In Figure 2, all the cells are empty.
Once you’ve instructed SPSS to open your data set, the
original matrix will be filled with data, the way it is in Figure 8 below.
Notice that the row just above the matrix now contains the names of the
variables comprising the data set: hrs1, wrkgovt, and so forth. SPSS uses
abbreviated labels, each no more than eight characters long.
Figure 8. A Full Data Matrix
Notice the upper-right cell in Figure 8,
which links case number one (respondent number
1) to the variable labeled wrkgovt. The first respondent has a value
of 2 on the variable wrkgovt. But what on earth does that mean?
This is where SPSS 11.0 is different from earlier versions.
In addition to the “Data View” sub-window, the Data Editor window has a Variable
View sub-window. You can simply click on “Variable View” at the bottom
of the Data Editor window. In Figure 9 below, you can see how this
window is organized. This time, the rows represent the variables and
the columns are the various categorizations associated with each variable.
The columns are as follows: Name, Type, Width, Decimals, Label, Values, Missing,
Columns, Align, and Measure. The Name is the abbreviated name of the
variable (it is always no longer than 8 characters). The Type of the
variable is often “numeric” but could be “string” if you wanted to input
words as data instead of numbers. Label is the description of the variable
and indicates more clearly what the question on the questionnaire was about.
In the column “Values” you can find the values associated with each possible
answer for each variable. Notice also that you can increase the width
of any of these columns on the variable view of the data editor. This
feature is particularly useful if you want to read the variable label in
its totality.
Figure 9. A Full Variable Matrix
You can find the meaning of a particular variable label
in several ways. First, and easiest, by finding the variable name on the
variable view. If you are in the data view, you can double-click the variable
name in the column heading and the variable view will open automatically
and highlight the row of the variable you just double-clicked on. Here’s
another way to learn about variables in the data set. Go to the Utilities
menu above the data matrix and select the first option, Variables.
See Figure 10 below for an illustration. A list of all the variables
and the way they were formatted will appear and you can simply select the
variable of your choice from this window by clicking on it. Here I
selected the divorce variable.
Figure 10. Decoding the Variable Divorce
Variables are listed in the order they were imported
from the GSS site. However, if you want to see them listed alphabetically,
open the Edit menu and select the Options submenu. Then as in Figure
11 below, select Alphabetical from the Variable List option in the General
sub-window. Then click OK twice. Note: The list in the left column
may consist of the variable names instead of the abbreviated labels, but
you can change this easily. In the Edit menu, select Options. In the General
tab, find the section on Variable Lists in the right-hand column. Click Labels.
You’ll have to reload the data set, but it will be worth the effort, because
you’ll be able to track down the abbreviated name you’re looking for.
Figure 11. Sorting the Variable List
Now reopen the variable information window from the
Utilities menu. All variables are listed alphabetically. Notice
the words next to Variable Label: “EVER BEEN DIVORCED OR SEPARATED.” While
this is still abbreviated, you may figure out that it represents whether
this person has been divorced or separated. You can also view the value labels
instead of the numeric value they have been coded as. See Figure 12
below. Simply make sure you are on the data view, select View from
the menu and click on Value Labels. A check mark will appear next to
this menu selection and you will be able to read directly what your respondents’
answers were for each variable. For instance, we now can see that
respondent number 1623 is female, 50 years old, and married to a man who
is 51 years old. To turn it off, by the way, simply open View and click
Value Labels again. Notice that the check mark indicates whether the feature
is on or off.
Figure 12. Viewing Variable Value Labels
In Figure 13 you can see the numeric values “1” and
“2” for divorce replaced by “Yes” and “No” respectively.
Figure 13. Viewing Variable Value Labels on Data View
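What the Value Labels switch does can be sketched as a simple lookup. The Python below is an illustration only, assuming the coding described in this guide for divorce (1 = Yes, 2 = No); the function name `apply_value_labels` and the sample codes are hypothetical, not actual GSS records.

```python
# Value labels for `divorce` as described in this guide: 1 = Yes, 2 = No.
DIVORCE_LABELS = {1: "Yes", 2: "No"}


def apply_value_labels(values, labels):
    """Replace each numeric code with its label; keep unlabeled codes as-is."""
    return [labels.get(value, value) for value in values]


if __name__ == "__main__":
    # Hypothetical codes for five respondents, not actual GSS records.
    codes = [1, 2, 2, 1, 0]
    print(apply_value_labels(codes, DIVORCE_LABELS))
```

Codes without a label, such as the 0 in the sample, are left as numbers, which is a reminder to look up what they mean.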
You can obtain the full wording most easily from the
GSS Web site. The codebook index (by variable name) is located at http://www.icpsr.umich.edu/GSS.
Click on “Mnemonic.” That will take you to a list of variables beginning
with a. You can see any other variable by simply clicking on the first
letter of the variable you are looking for.
Let’s take a close look at the variable abany, which
we will use in our analysis further on. Now you can see more clearly what
this variable represents. Respondents were asked a battery of questions concerning
their attitudes toward abortion—specifically, the conditions (e.g., rape,
danger of birth defects) under which they felt a woman should be able to
obtain one legally. In this case, they were asked if they would support a
woman’s right to a legal abortion as a purely personal choice: “for any reason.”
Besides presenting the actual wording of the question,
this Web page also reports the answer categories and the results of several
surveys that asked the question over the years. Notice that a 1 “punch”
stands for saying “yes.” Now we know that the first person in the data set
feels that a woman should be able to choose an abortion for any reason.
Let’s go back to variable abany. To find out what
“0” means, let’s learn how to examine variable codes within SPSS.
Double-click abany in the column heading. The variable view window
opens and the variable abany is automatically highlighted. Now click
on the little square in the value label cell for abany as shown in Figure
14 below.
Figure 14. Viewing Decoding Value Labels on Variable View
As you can see in Figure 15, “0” stands for “NAP,” which
means “not applicable.” In other words, this particular question was not
asked of some respondents.
Figure 15. Code Labels for Abany
Another method to learn about the value labels of a
particular variable is to select Variables from the Utilities menu.
A list of variables appears alphabetically, as you can see in Figure 16 below.
First you’ll see that
we have pretty much the same information we obtained before. Notice the column
to the left, however. It’s the beginning of a list of all the variables in
the data set. (You can use the scroll bar to see the rest of the list.) Find
the name of a variable you’re interested in and click it. You’ll instantly
get the variable and value labels.
Again you can view all value labels for abany by simply
selecting this variable. The advantage of this subcommand is that you
also have the option to find the variable abany quickly on your SPSS data
editor by clicking on Go To. The abany column will be selected and
highlighted instantly on your data view window.
Figure 16. Viewing Decoding Value Labels for Abany
Person 3, for example, was not asked this question.
By asking different sets of questions of different people in the sample,
the researchers can collect data for hundreds of variables without driving
any of the respondents to suicide or homicide.
Frequency Distributions
Now that we’ve seen what the abbreviated variable labels
and numerical code categories stand for, we’re ready to examine some public
opinion. Think about the question we’ve looked at so far. How do you suppose
people in the United States feel about a woman’s right to an abortion? That
is to say, what percentage do you suppose said “yes” and what percentage
said “no”? To start finding out, select Frequencies... from the Descriptive
Statistics submenu of the Analyze menu (see Figure 17).
Figure 17. Getting Frequency Distributions
This command will get you a list of variables to choose from, as illustrated
in Figure 18.
Figure 18. Choosing a Variable for Frequencies
Now you can double-click a variable label or else single-click
it and then click the right-pointing triangular arrow. Either of these actions
will move labels from the left-hand to the right-hand column. Figure 19
shows the results of three variables being selected this way.
Figure 19. Selecting Frequency Variables
Next we click the OK button. SPSS will zick and whirr
as it determines the distributions of responses to each of the three variables
in our example. It will then produce the frequency tables shown in Figure
20.
Figure 20. Frequency Distribution Tables
Notice that SPSS has now opened a new window. The first
was a data window and the new one is labeled Output. As you continue with
SPSS, you’ll often work back and forth between these two windows; often
the program alternates them automatically.
The left-hand frame in Figure 20 presents an outline
of the results. Click on any item in the outline to fill the right-hand frame
with the data you’ve requested. Here is where you can easily cut and
paste from SPSS to an MS Word document. Click on the outline item Frequency
Table on the left-hand side. All the frequency tables are now selected.
Select Copy objects from the Edit menu. Now open your Word document
and select Paste from the Edit menu. Figures 21 and 22 illustrate
this process. In this case I was only interested in copying and pasting
the frequency table for abany.
Figure 21. Selecting Frequency Distribution Tables
Figure 22. Copying Frequency Distribution Tables
In Figure 20 the right-hand side presents two tables.
The first one summarizes the three variables we chose originally. All we
are told here, however, is the number of respondents with valid responses
and those without. The second table gives the distribution of data for the
question of whether a woman should have the right to an abortion for any
reason. In addition to “yes” and “no,” the table reports three other possibilities:
NAP: “Not applicable” (the question
was not asked)
DK: Respondents who said they “Don’t
know”
NA: Respondents who were asked but
gave “No answer”
In the second table’s Frequency column you can learn
how many respondents fall under each of the categories. The Percent column
puts the information into a more useful form by showing the percentage represented
by each category. The most useful column is Valid Percent. This column tells
us that of the 1,768 respondents who gave a valid response, 39.9 percent
said “yes” and 60.1 percent said “no.” We might interpret these results by
saying that opinions on this issue are divided, with about four in ten respondents in favor and six in ten opposed.
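The difference between the Percent and Valid Percent columns is easy to see in a small computation. The Python sketch below is an illustration under stated assumptions, not SPSS's own procedure: the function name `frequency_table` is hypothetical, the missing categories are identified by their labels, and the ten sample responses are made up rather than taken from the GSS.

```python
from collections import Counter

# Categories treated as missing in this sketch (labels as listed above).
MISSING = {"NAP", "DK", "NA"}


def frequency_table(responses):
    """Return {category: (frequency, percent, valid_percent)}."""
    counts = Counter(responses)
    total = sum(counts.values())
    valid_total = sum(n for cat, n in counts.items() if cat not in MISSING)
    table = {}
    for category, n in counts.items():
        percent = 100 * n / total
        # Valid percent is defined only for non-missing categories.
        valid = None if category in MISSING else 100 * n / valid_total
        table[category] = (n, percent, valid)
    return table


if __name__ == "__main__":
    # Hypothetical responses: 2 yes, 3 no, 5 not asked.
    sample = ["yes"] * 2 + ["no"] * 3 + ["NAP"] * 5
    for category, row in frequency_table(sample).items():
        print(category, row)
```

In this made-up sample, "yes" is 20 percent of all cases but 40 percent of valid cases, which is why Valid Percent is usually the column to read.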
By scrolling down the window or using the outline in
the left-hand frame, you can check the results for the other variables. For
now let’s move along to more complex analyses.
Cross-Tabulations
The frequency distributions we’ve just undertaken are
called univariate analyses (analyzing one variable at a time). Now we’ll
turn to bivariate analyses (two variables at a time).
Let’s stay with the issue of “abortion for any reason.”
We’ve seen that U.S. residents are divided on the issue. What
do you suppose accounts for this difference? People often guess that women
would be more likely than men to support abortion as a woman’s right. Let’s
see how to determine the accuracy of that guess.
Return to the Descriptive Statistics menu, but choose
Crosstabs this time. This brings you a somewhat different dialog box, as indicated
in Figure 23.
Figure 23. Crosstabs Dialog Box
We are now going to set up a percentage table involving
two variables: abany and sex. The table will have both columns and rows.
While there are many ways to construct such a table, we’re going to assign
the categories of sex (male and female) to the columns. Then we’ll look at
the opinions on abany within each of those categories. In the logic and language
of SPSS, that makes abany the “row variable” and sex the “column variable.”
To assign categories, select variable labels from the list and drag them
to the appropriate windows on the right. Figure 24 shows this step.
Figure 24. Selecting Variables for the Crosstab
To find a variable label in the list, you can either
scroll through the list or click any label in the list and then type the variable
you want. It may take a little experimentation to discover how quickly you
must type to have it work.
Thus far, we’ve told SPSS to organize the table like
this:
              Men     Women
Approve
Disapprove
To complete our request, we have to tell SPSS how to
percentage the data. In this case, we’ll ask for the percentage of men who
approve of abortion and the percentage who disapprove, with the two percentages
totaling 100 percent. Then we’ll ask for the corresponding percentages of
women. In other words, ask SPSS to “percentage down” the columns. The Crosstabs
dialog box provides a means for us to indicate that preference. Click the Cells
button in the dialog box (see Figure 25).
Figure 25. Specifying the Percentaging Method
When the dialog box opens, the Observed box will already
be checked. Leave it that way. In the section on Percentages, click the Column
box. That instructs SPSS to percentage down the columns. Click Continue to
complete this dialog, and then click OK to launch the request for a crosstab.
Once SPSS has completed the table, we’ll be returned to the Output window,
as in Figure 26.
Figure 26. Crosstab of abany and sex
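“Percentaging down the columns” means dividing each cell count by its column total. Here is a minimal Python sketch of that idea, assuming each respondent is represented as a (row value, column value) pair. The function name `column_percentages` and the nine sample respondents are hypothetical and are not GSS results.

```python
from collections import Counter


def column_percentages(pairs):
    """Percentage down the columns for a list of (row, column) pairs."""
    cell_counts = Counter(pairs)
    column_totals = Counter(column for _, column in pairs)
    return {
        (row, column): 100 * n / column_totals[column]
        for (row, column), n in cell_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical respondents: (answer on abany, sex).
    sample = (
        [("yes", "male")] * 2 + [("no", "male")] * 3
        + [("yes", "female")] * 1 + [("no", "female")] * 3
    )
    for cell, pct in sorted(column_percentages(sample).items()):
        print(cell, pct)
```

Within each column the percentages add up to 100, so the two columns can be compared directly even when they contain different numbers of people.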
Let’s see what the table tells us. We wanted to find
out if men and women differed in their attitudes about whether a woman should
be able to choose an abortion just because she wanted one. The table suggests
that there’s no appreciable difference. The same proportion of men (39.9%)
and women (39.8%) say a woman should have the right to an abortion for any
reason.
Let’s try another variable that could affect people’s
attitudes toward abortion: political orientation. In the GSS, polviews represents
a standard item that asks respondents to characterize their political views
as something between “Extremely Liberal” and “Extremely Conservative.” Figure
27 shows the impact of this variable on attitudes toward abortion.
Figure 27. Crosstab of abany and polviews
Because there are so many categories for political views,
you may have to use the scroll bar at the bottom of the window to move back
and forth across the table. Notice that we’ve scrolled all the way to the
right in Figure 27.
The impact of political views on abortion attitudes is pretty clear. Overall,
liberals support abortion more than do conservatives. The only exception
to the pattern is that people who are “Extremely Liberal” are less supportive
than those who are “Liberal.” This result appears a lot, perhaps because
of the different ways people interpret the two political terms.
Recoding Variables
It’s often useful to recode variables with many categories,
reducing the number to something more manageable. In the present case, we
might want to combine the categories in polviews to make three: “Liberal,”
“Moderate,” and “Conservative.”
We can combine categories by hand from the kind of table
presented in Figure 27. For example, we can easily calculate that 447 of the
respondents in the table considered themselves liberals (62 + 203 + 182).
Of those, 247 supported a woman’s right to an abortion for any reason (42
+ 110 + 95). Dividing these two numbers tells us that 55 percent of the liberals
supported abortion. A similar calculation tells us that 152 of the 553 conservatives—27
percent—were supportive. The 42-percent support among moderates fits neatly
between the liberals and conservatives.
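The hand calculation above can be checked with a few lines of Python. The counts are the ones quoted in the text; the variable names and the helper `percent` are only illustrative.

```python
# Counts quoted in the text above (read from the table in Figure 27).
liberal_total = 62 + 203 + 182      # respondents in the three liberal categories
liberal_support = 42 + 110 + 95     # liberals who said "yes"
conservative_total = 553
conservative_support = 152


def percent(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


print(liberal_total, liberal_support)                      # 447 247
print(percent(liberal_support, liberal_total))             # 55
print(percent(conservative_support, conservative_total))   # 27
```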
Combining categories like this makes it easier to use
the variable in further analyses. However, we should have SPSS create a new,
recoded variable so that we don’t have to undertake the job by hand each time.
To do this, we must first return to the Data window. If you’re in the Output
window, you can simply click the Data window icon in the task bar at the
bottom of your screen or select the SPSS Data Editor item from the Window
menu (see Figure 28 below).
Figure 28. Switching to Data View
Once you’ve returned to the Data window, click the Transform menu
and move your pointer to the Recode option. When you do that, you’ll be presented
with another choice, as Figure 29 shows.
Figure 29. Requesting a Recode
SPSS offers two options for recoding: Either it will
modify the data contained under the existing variable label (Same Variables)
or it will create a new variable for the modified results (Different Variables).
Choose Different Variables, because the first option will destroy the original
data.
Next you’ll see a large dialog box like the one in Figure
30.
Figure 30. The Recode Dialog Box
Initially the right-hand frame will have nothing in it.
To create the situation shown in Figure 31:
1. Select polviews in the variable list and move it to the center
frame by double-clicking it or using the triangular arrow.
2. Type polviewr in the space under Output Variable Name and click Change.
3. Type in a descriptive label to identify what polviewr stands for.
Figure 31. The Completed Recode Dialog Box
To continue the process, click Old and New Values. This
will bring you the dialog box shown in Figure 32.
Figure 32. Specifying How to Recode Categories
To tell SPSS how to create polviewr from polviews, we
identify values of polviews and indicate what values they should get in
polviewr. Let’s start by creating a “Liberals” category that includes everyone
with a “1,” “2,” or “3” on polviews. We’ll give the new category the value
“1.”
In Figure 33, we’ve chosen the “Range” option and indicated
that anyone with a value of “1” through “3” on polviews should be assigned
a “1” on polviewr. Make sure you see where those instructions are entered
in the dialog box.
Figure 33. Creating “Liberals” as a Single Category
When you click the Add button, the transformation instruction
is transferred to the field on the right-hand side of the dialog box, as you
can see in Figure 34.
Figure 34. Renumbering the “Moderate” Category
We’ll use a different option to create a new “Moderate”
category. As you recall, they were scored “4” on polviews. We’ll give them
a “2” in polviewr by entering the old and new values in the Old Value and
New Value fields. When we click Add again, the new instruction is added to
the field. Now take a moment to figure out how you would create a “Conservative”
category, transforming scores of 5, 6, and 7 on polviews into a score of 3
on polviewr. Once you’ve done that, you should have the dialog box shown in
Figure 35. All that remains now is to click Continue, which will return you
to the earlier dialog box, and then click OK.
Figure 35. The Recoding Instructions Completed
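The recoding instructions just entered amount to a small mapping from old values to new ones. The Python sketch below restates that mapping outside SPSS. The function name `recode_polviews` is hypothetical, and returning None for any other code is an assumption meant to mirror a value that receives no new code.

```python
def recode_polviews(value):
    """Collapse the seven-point polviews code into three categories."""
    if value in (1, 2, 3):
        return 1  # Liberal
    if value == 4:
        return 2  # Moderate
    if value in (5, 6, 7):
        return 3  # Conservative
    # Assumption: codes outside 1-7 get no new value (treated as missing).
    return None


if __name__ == "__main__":
    for old in range(0, 9):
        print(old, "->", recode_polviews(old))
```

Because the function returns a new value instead of changing the old one, the original polviews codes stay intact, which is the same reason for choosing Different Variables in the Recode menu.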
Let’s tidy up our new variable. First return to the Data
View window. Next, scroll across the list of variables to the far-right end.
SPSS places each new variable at the end of all the other variables.
Since you just created a new variable called polviewr, SPSS created a new
column located last on your spreadsheet. When you find polviewr, double-click
the variable label at the head of the column. This will open up the Variable
View window and polviewr will be automatically selected at the bottom of
your variable list. See Figure 36 below.
Figure 36. Finding Polviewr in Variable View
Click on the right-hand side of the cell located in the
Decimals column and polviewr row. A little square located in the cell will
appear as shown in Figure 37. You can then reduce to 0 the number of decimals
for each value of polviewr. In other words, you can convert each 1.00
score to simply 1.
Figure 37. Changing Decimals Format for Polviewr in Variable View
Now click on the right-hand-side button in the Values column
and the polviewr row. A Value Labels dialog box will appear as shown
in Figure 38. Then give names to the new category values:
1. Type “1” in the Value field
2. Type “Liberal” in the Value Label field
3. Click the Add button
Repeat the process to assign “Moderate” to the value
of “2” and “Conservative” to “3,” being sure to click Add each time. When
you’re done, the dialog box should look like Figure 38. Click Continue, then
OK.
Figure 38. Assigning Value Labels to polviewr
Then click on the right-hand-side button in the Missing
column and the polviewr row. The dialog box shown in Figure 39 will appear.
Type 9 in the first space available under Discrete Missing Values. You
have just indicated to SPSS that any value 9 in the data set for polviewr
should be considered “missing answer” and removed from statistical computation.
Figure 39. Assigning Missing Values to polviewr
Finally you can change the measurement type for polviewr
as shown in Figure 40. I selected Ordinal for polviewr since the values can
be ranked from low to high level of conservatism.
Figure 40. Assigning Measurement Type to polviewr
Now when you select Analyze/Descriptive Statistics/Frequencies and scroll
through the list of variables, you’ll find a new entry in the list: polviewr.
Choose it to see the frequency distribution generated by our new categories
(see Figure 41).
Figure 41. Frequency Distribution for polviewr
Since we have gone to all this trouble to make our analysis
simpler, let’s see if it worked. Let’s use polviewr to reexamine the relationship
between political orientations and attitudes toward abortion. Use Analyze/Descriptive
Statistics/Crosstabs to create a table with abany and polviewr. Figure
42 illustrates what you should get.
Figure 42. Crosstab of abany and polviewr
Notice how much easier it is to read this table, compared
with the one presented in Figure 27. We see that 55 percent of the liberals,
42 percent of the moderates and 28 percent of the conservatives support a
woman’s right to an abortion for any reason. (It’s good to round off the decimal
points in percentages like these, since they’re based on samples, which only
provide estimates of populations in the first place.)
Multivariate Tables
Bivariate tables are typically only the beginning of
quantitative data analysis. For example, you might want to see if the observed
relationship between politics and abortion holds equally for men and women.
SPSS makes it a simple matter to satisfy your curiosity about such matters.
Return to Analyze/Descriptive Statistics/Crosstabs and
specify a third variable as shown in Figure 43.
Figure 43. Trivariate Table Request
Notice that we’ve simply transferred the third variable,
sex, into the bottom field in the dialog box. Press OK to see the result,
illustrated in Figure 44.
Figure 44. Table of abany by polviewr by sex
In a sense, this new table splits the one shown in Figure
42 into two parts. The top half shows the relationship between polviewr and
abany for men; the bottom half shows the same relationship for women. We can
see immediately that the original relationship is replicated for each of
the gender categories.
In the far-right column, the summary statistics show
you the relationship between gender and support for abortion. Overall (i.e.,
forgetting about political orientations), about 41 percent of both men and
women support a woman’s right to an abortion for any reason—an interesting
similarity. (Notice that I've rounded off the figures, 40.5% and 40.8%, presented
in the table.) It seems that there is no sex effect on abany.
The sex of the respondent does not matter for this GSS question.
Comparing men and women in the other columns of the table
tells us that sex has little impact no matter what a person’s political orientation
is. Women are more supportive among liberals; men are slightly more supportive
among moderates and among conservatives. None of the differences are very
large, however.
SPSS allows you to go beyond trivariate tables, though
they grow increasingly difficult to read and analyze. To experiment with this
possibility, click the Next button near the bottom of the Crosstabs dialog
box to add new Layers of variables to the table.
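A layered table is simply the same two-variable table computed separately within each category of the third variable. The Python sketch below illustrates that idea under stated assumptions: the function name `layered_column_percentages` is hypothetical, each respondent is a (row, column, layer) tuple, and the eight sample respondents are invented rather than drawn from the GSS.

```python
from collections import Counter, defaultdict


def layered_column_percentages(records):
    """Column percentages of row-by-column counts within each layer.

    `records` is a list of (row_value, column_value, layer_value) tuples.
    Returns {layer: {(row, column): percent}}.
    """
    cells = Counter(records)
    # Each column total is counted separately inside each layer.
    column_totals = Counter((column, layer) for _, column, layer in records)
    tables = defaultdict(dict)
    for (row, column, layer), n in cells.items():
        tables[layer][(row, column)] = 100 * n / column_totals[(column, layer)]
    return dict(tables)


if __name__ == "__main__":
    # Hypothetical respondents: (abany, polviewr, sex).
    sample = (
        [("yes", "Liberal", "male")] * 1 + [("no", "Liberal", "male")] * 3
        + [("yes", "Liberal", "female")] * 2 + [("no", "Liberal", "female")] * 2
    )
    for layer, table in layered_column_percentages(sample).items():
        print(layer, table)
```

Adding a further layer would mean adding one more element to each tuple and to the grouping key, which also shows why such tables quickly become hard to read.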
Tests of Statistical Significance
In the previous example, I casually remarked that the
percentage differences were not very large. This was a subjective assessment
of the substantive significance of the differences.
As you know, tests of statistical significance can determine
the likelihood that relationships observed in a sample are merely an artifact
of sampling error rather than a reflection of a real difference in the population
from which the sample was drawn. Let’s take a look at how SPSS offers us the
use of those tests.
Return to the Crosstabs dialog box, via Analyze/Descriptive
Statistics. At the bottom of the box, click a button marked Statistics. Figure
45 illustrates the results.
Figure 45. Choice of Statistics in a Crosstabs Dialog Box **
As you can see, SPSS offers several summary statistics,
three of which you’ll recognize from this textbook: chi square, lambda, and
gamma. Recall that chi square is appropriate to nominal variables such as
abany and sex, so let’s use them to see how we can use SPSS to work with chi
square.
Click the Chi-square box. Then click Continue and enter
abany and sex in the appropriate places in the Crosstabs dialog box. In addition
to the regular percentage table, SPSS now provides an additional table, shown
in Figure 46.
Figure 46. Chi Square for abany and sex
If you’ve had a statistics course, you’ll recognize many
of the tests presented in this table. For our purposes, let’s focus on the
first row of the results, the “Pearson Chi-Square.” The third column tells
us the probability that sampling error alone could have generated a relationship
as strong as the one we’ve observed, if men and women in the whole population
were exactly the same in their attitudes toward abortion. Specifically, it
tells us that the probability is .972, or 97 chances in 100. This probability
level is extremely high, so we cannot rule out sampling error as the source
of the tiny difference in our table. Thus, the chi square test is consistent
with what we had concluded subjectively from our crosstabulations: the sample
gives us no evidence that men and women differ in their support for a woman’s
right to an abortion.
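For readers curious about what lies behind the “Pearson Chi-Square” row, here is a minimal Python sketch of the calculation. It is not SPSS's own code, and the counts in both tables are invented for illustration. The function names are hypothetical, and the probability helper applies only to tables with one degree of freedom, such as a two-by-two table.

```python
import math

def pearson_chi_square(table):
    """Return (chi_square, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Invented counts: rows are "yes"/"no" on an attitude item, columns are two groups.
no_difference = [[200, 250], [300, 375]]   # identical percentages in both groups
big_difference = [[30, 10], [10, 30]]      # very different percentages

chi2_a, df_a = pearson_chi_square(no_difference)
chi2_b, df_b = pearson_chi_square(big_difference)
print(chi2_a, df_a, p_value_df1(chi2_a))
print(chi2_b, df_b, p_value_df1(chi2_b))
```

When the two groups have identical percentages, the statistic is zero and the probability is 1; the larger the gap between observed and expected counts, the larger the statistic and the smaller the probability.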
The relationship between abany and polviewr was much stronger. Let’s see
how chi square evaluates that relationship. Repeat the above procedure,
changing sex to polviewr. Notice that you don’t have to select Chi-square
again, any more than you have to select Columns. SPSS maintains those specifications
until you shut down the program. When you start it up again, you’ll have
to specify such preferences again. And, of course, you can turn them off
any time you no longer desire them.
Figure 47. Chi square for abany and polviewr
See Figure 47 above for the chi square evaluation of
abany and polviewr. Notice that the significance in this case is calculated
at “.000”. SPSS presents only the first three decimal places in this calculation.
Hence, the likelihood of the observed relationship being simply a product
of sampling error isn’t exactly zero (it could happen), but the chances are
not very high that it did. Specifically, the probability is less than .001,
or less than one chance in a thousand, which is a commonly used standard for
statistical significance. Thus, we conclude that the relationship we’ve observed
in this carefully selected sample very likely represents something that exists
in the larger population.
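Because a displayed “.000” can be misread as “impossible,” reports usually state such a result as “less than .001.” The short Python sketch below shows one way to do that; the function name report_p and the example probabilities are assumptions made for illustration.

```python
def report_p(p, places=3):
    """Format a probability for a report, avoiding a misleading '.000'."""
    threshold = 10 ** -places
    if p < threshold:
        # Too small to show at this precision: report it as an upper bound.
        return f"p < {threshold:.{places}f}"
    return f"p = {p:.{places}f}"

print(report_p(0.00004))
print(report_p(0.972))
```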
Correlation and Regression
Thus far we’ve been examining nominal and ordinal data,
which constitute the bulk of social science data. SPSS can also help you work
with interval and ratio data.
For example, you may have heard that highly educated people tend to have
fewer children than do those with less education. Let’s use SPSS to see
if it’s so. In the GSS, these variables are educ and childs. Under Analyze,
select Correlate and, when asked, Bivariate. (SPSS can undertake more complex
correlational analyses, but we’ll keep it simple for this introduction.)
In the Correlations dialog box, select educ and childs and click OK. That
will produce the result shown in Figure 48.
Figure 48. Pearson’s Product Moment Correlation
As you can see, there is a correlation of –.210 (or a
“negative correlation of .210”) between the two variables. The negative correlation
means that as the years of education increase, the number of children decreases.
Of course, this analysis cannot determine the causal
direction, so we could also say that as the number of children increases,
the amount of education completed decreases. Both interpretations make sense
and probably apply in some cases. Some young people have to cut their education
short to accommodate the demands of parenthood, and those who keep going
to school may have to delay parenthood and have fewer children once they
get started.
Whatever the explanation for the relationship, SPSS informs
us that the correlation is significant at the 0.01 level. In other words,
sampling error could account for a correlation like this one less than once
in a hundred times.
Although we entered only two variables in this analysis, SPSS will accept
as many at a time as you want and will create a correlation matrix in which
every variable is correlated with every other variable. Experiment with
this possibility.
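To make the idea of a correlation matrix concrete, here is a small Python sketch that computes Pearson's r for every pair of variables. The numbers are made up rather than taken from the GSS, the function names are hypothetical, and dropping a case only from the pairs where it has a missing value is an assumption of this sketch, not a statement about SPSS's settings.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation, using cases present on both variables."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlate every variable with every other variable."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Made-up values for eight respondents; None marks a missing answer.
data = {
    "educ":   [8, 10, 12, 12, 14, 16, 18, 20],
    "childs": [4, 3, 3, 2, None, 1, 1, 0],
    "age":    [70, 55, 60, 48, 40, 35, 30, 28],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Each variable correlates perfectly with itself, so the diagonal of the matrix is always 1, and the matrix is symmetric around that diagonal.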
Regression analysis builds on the logic of correlation and creates equations
that predict values of one variable based on values of others. Here’s how
we could represent the relationship between childs and educ as a regression
equation. Under Analyze, select Regression, choosing Linear from among the
alternatives offered. This will open the dialog box shown
in Figure 49.
Figure 49. Regression Dialog Box
Let’s use the logic of accounting for the number of children
people have; thus, childs is our dependent variable, and educ is the independent
variable we’ll use to account for differences in numbers of children. Enter
the two variables into the dialog box as shown in Figure 49 above. Click OK
to get the result shown in Figure 50.
Figure 50. Linear Regression Predicting childs with educ
SPSS will present you with three tables of calculations,
but we are only interested here in the third one, Coefficients. In fact, we’re
only interested in the first column of this table, Unstandardized Coefficients.
The first of these, the Constant, represents the value of the dependent variable
(number of children) when the value of the independent variable (years of
education) is zero. Statisticians sometimes refer to this as the y-intercept
or the point where the line crosses the y-axis, when the regression line is
plotted on a graph.
The B value associated with the independent variable
(–.121) indicates how much the dependent variable changes with each added
unit of the independent variable. In our example, it tells us what change in
the number of children we should expect for each added year of education.
Stated as an equation, the regression looks like this:
childs = 3.407 – (.121 x educ)
Suppose a person has 10 years of education. We would
predict she or he has
3.407 – 1.21 = 2.197 children
For college graduates with 16 years of education we’d
predict they have
3.407 – 1.936 = 1.471 children
Clearly, these estimates represent statistical averages,
because no one can have 2.197 or 1.471 children. Still, if you were to bet
on the number of children people had and knew only their education, this equation
would be your best guide for betting. If you could make a lot of bets on
this basis, you’d be a winner overall.
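The arithmetic above is easy to reproduce. This short Python sketch plugs years of education into the regression equation, using the constant (3.407) and B value (–.121) reported in the guide; the function name predict_childs is simply a label chosen for this example.

```python
def predict_childs(educ, constant=3.407, b_educ=-0.121):
    """Predicted number of children for a given number of years of education."""
    return constant + b_educ * educ

for years in (0, 10, 16):
    print(years, "years of education:", round(predict_childs(years), 3), "children")
```

With zero years of education the prediction is just the constant, which is what it means to call the constant the y-intercept.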
To explore regression further, try adding another independent
variable. SPSS will provide you with a new y-intercept and coefficients for
each of the independent variables. Be sure to interpret positive and negative
signs correctly.
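For a sense of what adding a second independent variable involves, here is a minimal least-squares sketch in Python for exactly two predictors. It is an illustration, not SPSS's procedure. The data are constructed so that the outcome follows a known equation exactly, and the variable names x1, x2, and y are placeholders.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares intercept and two slopes for y predicted from x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of the deviations from the means.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero if the two predictors are perfectly correlated
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    intercept = my - b1 * m1 - b2 * m2
    return intercept, b1, b2

# Constructed data: y is built from the equation y = 1 - 0.25*x1 + 0.05*x2.
x1 = [8, 12, 16, 12, 20]
x2 = [30, 40, 50, 60, 35]
y = [1 - 0.25 * a + 0.05 * b for a, b in zip(x1, x2)]

intercept, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(intercept, 3), round(b1, 3), round(b2, 3))
```

The fit recovers one negative and one positive coefficient, a reminder that each sign has to be read separately for each independent variable.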
Creating Indexes
The textbook discussed the creation of composite measures,
such as indexes and scales. This section looks briefly at how to use SPSS
to create a simple index.
Without reviewing the logic of index construction, let’s
create an index of sexual permissiveness including the following GSS variables:
premarsx: sex before marriage
xmarsex: sex with person other than spouse
homosex: homosexual relations
In each of these items, respondents were asked whether
the action was
1. Always wrong
2. Almost always wrong
3. Sometimes wrong
4. Not wrong at all
Given the format of these three items, we can create
a composite index quite simply. Although the values 1–4 used to represent
the answers to these questions are merely labels (just as we used “1” for
male and “2” for female), we can in this case take advantage of their numerical
quality. In each of these items, the higher the numerical code, the higher
the level of sexual permissiveness. If we add the values respondents received
on the three items, the possible totals range from 3 to 12, with 12 representing
the highest and 3 the lowest degree of sexual permissiveness.
We can now generate the index by using the Transform/Compute
menu option, as illustrated in Figure 51. Enter the information by typing
or by selecting the variable names from the list and clicking the plus sign
in the keypad provided in the dialog box (see Figure 51).
Figure 51. Adding the values of premarsx, xmarsex, and homosex
When you’re through, click OK in the dialog box. SPSS
will create a new variable, sexperm, in your data set and will assign the
appropriate values to each of the respondents. In the Data window, scroll
to the far right and find the new variable in the last column. Scroll up and
down to see the values assigned to respondents. Those with no values in the
new column were missing data on at least one of the three items used to construct it.
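The logic of this additive index can be sketched in a few lines of Python. The function name sexperm_score is an assumption made for this example, and None is used here to stand in for a missing answer.

```python
def sexperm_score(premarsx, xmarsex, homosex):
    """Sum the three items (each coded 1-4); return None if any answer is missing."""
    items = (premarsx, xmarsex, homosex)
    if any(v not in (1, 2, 3, 4) for v in items):
        return None  # no index score without valid answers to all three items
    return sum(items)

print(sexperm_score(1, 1, 1))      # least permissive possible score
print(sexperm_score(4, 4, 4))      # most permissive possible score
print(sexperm_score(4, None, 2))   # one item missing, so no score
```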
For a more comprehensive view of the new variable, run the frequency distribution
for sexperm (Figure 52).
Figure 52. The Frequency Distribution for sexperm
Having created a composite measure such as this one,
it’s always good to validate the scores if possible. That is, if the index
scores truly distinguish levels of sexual permissiveness, then those scores
should predict the answers people gave to other questions. For example, we
might wonder if attitudes toward abortion are related to sexual permissiveness.
We can find out by running a cross-tabulation of the index and, say, abany.
The result of this validation effort is presented in
Figure 53. Notice that this table uses a somewhat different format than those
we’ve created earlier. Given the large number of categories comprising sexperm,
it’s difficult to fit the table on the computer screen (and in this book).
Thus, I have made sexperm the row variable, made abany the column variable,
and requested that the table be percentaged by row rather than column. Thus
we read this table “down,” whereas we’ve been reading earlier ones “across.”
Figure 53. Validating the sexperm Index
The relationship between these two variables is extremely
strong and consistent. Of those with a score of 3 on the index (representing
the lowest level of sexual permissiveness), only 19 percent support a woman’s
right to an abortion for any reason. This percentage increases steadily as
index scores increase, reaching 67 percent among those with a score of 12
on the index.
Creating an index from variables that do not permit such
a simple addition of code values is a little more involved. To illustrate,
let’s create an index of where respondents stand on the issue of guns. Two
items in the GSS are relevant:
gunlaw: favor or oppose gun permits
1. Favor
2. Oppose
owngun: have gun in home
1. Yes
2. No
It makes sense that those who have a gun and oppose requiring
permits for owning guns are the most pro-gun, while those without a gun of
their own and who favor gun permits would be the most anti-gun. Notice, however,
that the pro-gun position is represented by a “2” on gunlaw and a “1” on owngun.
Thus, we can’t simply add the values. Here’s how to generate a simple index
from these two items.
Let’s create a new variable, progun, for which higher
scores indicate greater support for guns. To start this process, return to
Transform/Compute. Type in the Target Variable and give everyone a starting
score of “0,” as in Figure 54. Click OK to create the variable. Then return
to Transform/Compute and change the “0” to “progun + 1” as indicated in Figure
55.
Figure 54. Initializing progun
Figure 55. Adding a Point to progun
We’re not going to add a point to everyone’s index score,
however. Click the If button near the bottom of the dialog box, so we can
specify the conditions under which we want to add a point. Next, click the
button beside the phrase “include if case satisfies condition.” Then create
the condition shown in Figure 56. By doing this, we’re telling SPSS to give
an additional point to anyone who said they oppose gun permits (i.e., a “2”
on gunlaw).
Figure 56. Adding a Point for Opposing Gun Permits
Click Continue to return to the earlier dialog box. Then
click OK to instruct SPSS to take the action specified. When SPSS tells you
that you’re about to change an existing variable, say “yes.”
Select Transform/Compute again. Notice that the earlier
instruction to add a point is still there. Leave it, but click If in order
to modify the condition. Change it to specify those who said they owned a
gun (“1” on owngun) by indicating “owngun = 1” as the condition. Click Continue,
then OK, then “yes,” as before. Those who had a score of 0 for favoring
gun permits will now get 1 point (for a total of “1”) if they own a gun; they
still have 0 points if they don’t have a gun. Those who scored a point for
opposing gun permits will get another point if they own a gun (a total of
“2”) but will stay at 1 point if they don’t have a gun. The resulting index
is made up of the scores “0,” “1,” and “2.”
There’s only one problem with the index as it stands.
Since everyone started with 0 points, those who didn’t answer one or both
of these questions will end up with a score of zero, thus seeming to oppose
guns. The final step in creating this index involves culling out those with
missing data.
First, let’s create a “missing data” code. We’ll use
“99.” Return to Transform/Compute. In the first dialog box, type “progun
= 99.” Click If to specify the condition: “MISSING(gunlaw)” as shown in Figure
57. To build it, first select the “MISSING(variable)” function, then select
gunlaw and click the arrow. Click Continue and OK, then repeat the
procedure for “MISSING(owngun).”
Figure 57. Missing Data as a Condition
If you examine the response possibilities for owngun,
however, you’ll find that 23 people refused to answer and were coded “3.”
Return to Transform/Compute and assign an index value of 99 for anyone with
“owngun = 3.”
As a final step, we’re going to recode the 99. Select Transform/Recode,
but this time choose Same Variables. Once you reach the dialog box, convert
the 99 to a SYSMIS as illustrated in Figure 58. Enter 99 as the old value,
click ‘system-missing’ then click the Add button.
To complete the action, click Continue and OK. The index
is now complete. You can check it out by running Analyze/Descriptive Statistics/Frequencies.
To reassure yourself further, run a cross-tabulation between the two items
to verify that the correct number of people received each of the scores.
Figure 58. Converting 99 to SYSMIS
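The whole sequence of steps for progun boils down to a short rule, sketched here in Python. The codes follow the guide (2 on gunlaw means oppose permits, 1 on owngun means a gun in the home, 3 on owngun means refused); the function name, the use of None for missing answers, and the handful of sample cases are assumptions made for illustration.

```python
from collections import Counter

def progun_score(gunlaw, owngun):
    """Return 0, 1, or 2 (higher means more pro-gun), or None when data are missing."""
    if gunlaw is None or owngun is None or owngun == 3:
        return None  # missing or refused answers get no index score
    score = 0
    if gunlaw == 2:   # opposes gun permits
        score += 1
    if owngun == 1:   # has a gun in the home
        score += 1
    return score

# Made-up (gunlaw, owngun) answers, not GSS data.
sample = [(1, 2), (1, 1), (2, 2), (2, 1), (None, 1), (2, 3)]
scores = [progun_score(g, o) for g, o in sample]
print(Counter(scores))
```

Handling the missing cases first, rather than last, avoids the temporary problem of nonrespondents sitting at a score of zero.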
Graphics
With the improvement of computer graphics, SPSS now offers
many options for presenting data in nontabular formats. Let’s explore a few
of these, beginning with simple frequency distributions.
Figure 59 presents the distribution of GSS data on religious
affiliation (relig) as a pie chart. You can create this by (1) selecting Pie
under Graphs, (2) choosing Summaries for Groups of Cases, then (3) specifying
relig as the variable to portray. Select “% of cases.” Before
clicking OK, click on the Titles button and type the title of your graph in
the dialog box. Here I typed “Pie Chart of Religious Affiliation.”
Figure 59. Pie Chart of Religious Affiliation
Figure 60 shows the results of this operation.
As you can see, the pie chart is small and displays every religious category.
In this form the chart is not very useful; it requires some simple formatting.
We need to collapse into one larger slice all the slices that are too small
to read. Double-click the graph in your output window
and a graph dialog box will appear. Select Options from the Chart menu.
A Pie Options dialog box appears, as shown in Figure 61. Select “Collapse
(sum) slices less than 5%” (note that you can change this percentage if you
want to include more categories under this new collapsed one). In
this same dialog box, select Percents so that the percentage of each slice
is indicated on the graph. Click OK and close the SPSS Chart Editor
window.
Figure 60. Pie Chart of Religious Affiliation in Output Window
Figure 61. Formatting Pie Chart of Religious Affiliation
Figure 62 shows the result of this formatting procedure.
Only the “Catholic,” “None” and “Protestant” slices remain unchanged.
All other categories are collapsed under the “Other” pie slice. There are
many options to explore and I invite you to experiment on your own in order
to polish statistical visual representations in your papers. When you
are ready to import an SPSS graph into your paper, simply select the graph from
the outline menu on the left side of the output window. A frame around
the graph will indicate that you have selected it. Then choose “Copy
Objects” from the Edit menu. Open your Word document and paste the
graph wherever you want.
Figure 62. Copying Pie Chart of Religious Affiliation
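The collapsing rule used for the formatted pie chart is simple enough to sketch in Python. The counts below are invented for illustration and are not the GSS figures; the function name and its parameters are likewise hypothetical.

```python
def collapse_small_slices(counts, threshold_pct=5.0, other_label="Other"):
    """Merge every category below threshold_pct of the total into one slice."""
    total = sum(counts.values())
    collapsed = {}
    other = 0
    for label, count in counts.items():
        if 100.0 * count / total < threshold_pct:
            other += count           # too small to show on its own
        else:
            collapsed[label] = count
    if other:
        collapsed[other_label] = collapsed.get(other_label, 0) + other
    return collapsed

# Invented counts for illustration only.
relig_counts = {
    "Protestant": 540, "Catholic": 250, "None": 140,
    "Jewish": 30, "Buddhism": 15, "Hinduism": 10, "Moslem/Islam": 15,
}

slices = collapse_small_slices(relig_counts)
total = sum(slices.values())
for label, count in slices.items():
    print(f"{label:10s} {100.0 * count / total:5.1f}%")
```

Raising or lowering threshold_pct plays the same role as changing the 5% figure in the Pie Options dialog box.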
If you’re on a diet that rules out pies, see Figure 63
for a bar graph of relig.
Figure 63. Bar Graph of relig
Ratio variables, such as the number of years of education,
might be presented as line graphs. See Figure 64.
Figure 64. Line Graph of educ
These are just a few of the graphic options available
to you in SPSS. Experiment with them to find the form of presentation most
appropriate to your purposes.
Making Copies of Results for a Paper
Often, you will use SPSS to undertake quantitative analyses
for a term paper, thesis, or other project. Although you can retype the results
of SPSS into your paper, you can also take advantage of some energy-saving
options. Depending on your word-processing system, you may have to experiment
a bit.
As shown earlier, it is very easy to copy and paste from
SPSS outputs to Word documents. Simply make sure you have selected
the objects you want to copy (a frame should appear around them) and use the
Edit/Copy Objects command in SPSS, and then Paste in the Word document of your choice.
Though the easiest strategy is to copy and paste from
SPSS to Word, you can also export your statistical results. To try making
a hard copy of a graph, create the pie chart for polviewr. Click the resulting
graph. As explained above, you’ll see a box appear around it, indicating that
it has been selected by the computer. Then in the File menu select Export.
Figure 65 illustrates the resulting dialog box.
Figure 65. Export Dialog Box
You have several options here. You can export your output
document with charts, without charts, or exclusively the charts of your output
window. For our purposes, export the Chart Only. You can then either
change the name of the export file you are going to create or accept (and
remember) the name and location SPSS has proposed. Again, for present purposes,
choose to export the Selected Objects and choose JPEG File (*.JPG) as the
export format. Once you’ve done all this, click OK.
Run your word-processing system and open the document
that desperately needs this graph. Click where you want the graph and select
Picture and then From File from the Insert menu (this procedure may be different
if you are using a word processor other than MS Word). Browse your computer
until you find your JPEG file. Remember to change the Files of Type
option to All Files to view all documents and not exclusively Word documents.
To try making a hard copy of a table, create a frequency
distribution for gunlaw. The same procedure you used to export graphs is possible
if you choose to export tables. However, you will lose the formatting
of the tables you export. I suggest that you choose the Export Format
HTML file (*.htm) option. This format best preserves the layout
of your SPSS tables. Then open your Word document, select File
from the Insert menu, and browse until you find your output file. You
should be rewarded with something like the table I’ve put in Figure 66.
Figure 66. Text Version of gunlaw Frequency Distribution
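To see why HTML holds a table's layout together, it helps to look at what such a file contains. The Python sketch below builds a tiny frequency table as HTML; it does not reproduce SPSS's actual export, and the answers, the function name, and the column headings are all assumptions made for illustration.

```python
from collections import Counter
from html import escape

def frequency_table_html(values, title):
    """Render a simple frequency distribution as an HTML table (missing values dropped)."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    for label, count in sorted(counts.items()):
        pct = 100.0 * count / len(valid)
        rows.append(
            f"<tr><td>{escape(str(label))}</td><td>{count}</td><td>{pct:.1f}</td></tr>"
        )
    header = "<tr><th>Value</th><th>Frequency</th><th>Valid Percent</th></tr>"
    body = "\n".join(rows)
    return f"<table>\n<caption>{escape(title)}</caption>\n{header}\n{body}\n</table>"

# Made-up answers; None marks a missing response.
answers = ["Favor", "Favor", "Oppose", "Favor", None]
html_table = frequency_table_html(answers, "gunlaw")
print(html_table)
```

Because rows and cells are marked explicitly, a word processor that reads HTML can rebuild the columns instead of guessing at them from spacing.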
In all this, you may also want to take advantage of SPSS’s
multitude of table formats. To explore these, choose Edit/Options in SPSS
and click on the Pivot Tables tab. Once there, click the various options under
TableLook and SPSS will give you a sample layout in the field to the right
of the list. If you find a format that interests you, leave it highlighted
when you close the dialog box and then create a new table. It will be done
in the format you’ve specified.
Shutting Down
As much as you may come to love SPSS, you’ll have to
quit the program eventually. Go to the File menu and select Exit. SPSS will
respond with a question that asks whether you want to save the Output file
you’ve created. If you give it a name and a disk location for saving it,
you’ll be able to open it later on and retrieve any data created in your
analyses. If you’ve just been practicing SPSS, you’ll probably want to say
“No.”
If you’ve changed the data set, by creating a recoded
variable, for example, SPSS will ask if you want to save the changes. Unless
you want to get rid of the changes, say “Yes.” However, you should only alter
the data file if you have permission to do that. If you’re sharing a file
with others in your class, for example, it may not be appropriate for you
to save your changes. Discuss this with your instructor if in doubt.
SPSS will now close. And so will this guide. Have fun.