Categorical Regression Analysis in SPSS: A Complete Guide to Binary & Multinomial Logistic Regression
Module 3, Section 3.1 | Advanced SPSS Tutorial Series | Reading time: ~12 minutes
What you’ll learn: This guide walks you through Categorical Regression Analysis (CATREG) in SPSS — from understanding the theory to running Binary Logistic Regression and Multinomial Logistic Regression step by step, with full interpretation of SPSS output tables.
1. What Is Categorical Regression Analysis?
Categorical regression — also known by its SPSS acronym CATREG — is a technique that quantifies categorical data by assigning numerical values to categories, producing an optimal linear regression equation for the transformed variables.
In standard linear regression, you minimize the sum of squared differences between a response (dependent) variable and a weighted combination of predictors. This works cleanly for continuous, quantitative data. But when your dependent variable is categorical — for example, a binary outcome like “passed/failed” or a nominal outcome with multiple categories — you need a different approach.
Categorical regression assigns optimal numerical values to the categories themselves, preserving the relationship structure while enabling a regression framework. This is powerful because:
- Categorical variables can define groups of cases
- The model estimates separate parameter sets for each group
- Prediction of the response is possible for any combination of predictor values
Within SPSS, categorical regression analysis splits into two major techniques depending on the nature of your dependent variable:
| Technique | When to Use | Dependent Variable Type |
| --- | --- | --- |
| Binary Logistic Regression | Outcome has exactly 2 categories | Dichotomous (e.g., yes/no, pass/fail) |
| Multinomial Logistic Regression | Outcome has 3 or more categories | Nominal (e.g., coffee/tea/water) |
2. Binary Logistic Regression in SPSS
A binomial logistic regression (commonly called simply “logistic regression”) predicts the probability that an observation falls into one of two categories of a dichotomous dependent variable, based on one or more independent variables that can be continuous or categorical.
Common real-world examples include:
- Predicting exam performance (pass/fail) based on revision time, test anxiety, and lecture attendance
- Predicting drug use (yes/no) based on prior criminal convictions, income, age, and gender
- Predicting presence of heart disease (yes/no) based on age, gender, weight, and VO₂max
If your dependent variable is a count, use Poisson regression instead. If it has more than two categories, use multinomial logistic regression (covered in Section 3 below).
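Under the hood, logistic regression turns a weighted sum of predictors into a probability via the logistic (sigmoid) function, p = 1 / (1 + e^−z). A minimal sketch of that calculation for the pass/fail example, using entirely hypothetical coefficient values:

```python
import math

def predict_pass_probability(intercept, coefs, values):
    """Convert a linear combination of predictors into a probability
    using the logistic (sigmoid) function: p = 1 / (1 + e^-z)."""
    z = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept, revision time (hours), test anxiety
p = predict_pass_probability(-4.0, [0.5, -0.3], [10.0, 5.0])
print(round(p, 3))  # predicted probability of passing
```

SPSS performs this calculation for every case once the model coefficients have been estimated; the sketch only illustrates the mechanics.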
2.1 Key Assumptions for Binary Logistic Regression
Before running binary logistic regression in SPSS, your data must satisfy four critical assumptions:
Assumption 1
Dichotomous dependent variable. Your outcome variable must have exactly two categories — for example, gender (male/female), heart disease (yes/no), or personality type (introvert/extrovert). If your dependent variable is continuous, use multiple regression; if ordinal, use ordinal regression.
Assumption 2
One or more independent variables. These can be continuous (e.g., revision time in hours, IQ score, weight in kg) or categorical — either ordinal (e.g., Likert scale items) or nominal (e.g., gender, ethnicity, profession).
Assumption 3
Independence of observations. Each observation must be independent, and the categories of the dependent variable must be mutually exclusive and exhaustive — every case belongs to exactly one category.
Assumption 4
Linear relationship with logit. There must be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable. This is tested within SPSS during the analysis.
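The logit transformation mentioned in Assumption 4 is simply the natural log of the odds, logit(p) = ln(p / (1 − p)). A short sketch showing the transform and its inverse:

```python
import math

def logit(p):
    """Log-odds (logit) transformation of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# The logit is 0 at p = 0.5, and inv_logit reverses the transform
print(logit(0.5))                          # 0.0
print(round(inv_logit(logit(0.8)), 6))     # 0.8
```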
2.2 Step-by-Step Procedure in SPSS
Figure 46: The Binary Logistic Regression dialogue box in SPSS. Transfer your dependent variable to the Dependent box and independent variables to the Covariates box.
- Go to Analyze → Regression → Binary Logistic… from the top menu.
- In the Logistic Regression dialogue box, transfer your dependent variable into the Dependent: box and your independent variables into the Covariates: box using the arrow buttons.
- Keep the Method: option at its default value of Enter. This is the standard regression method in SPSS. (The “Previous” and “Next” buttons are used for sequential/hierarchical logistic regression only.)
- Click the Categorical button. The Logistic Regression: Define Categorical Variables dialogue opens. Transfer any categorical independent variables from the Covariates: box into the Categorical Covariates: box. SPSS does not do this automatically — you must define them manually.
- In the Change Contrast area, change the Reference Category from Last to First (or whichever group you want as your reference), then click Change. Click Continue to return to the main dialogue.
- Click the Options button to open the Logistic Regression: Options dialogue.
Figure 47: The Logistic Regression Options dialogue box. Select Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, and CI for exp(B). Set Display to “At last step.”
- In the Statistics and Plots area, check: Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, and CI for exp(B).
- In the Display area, select At last step.
- Click Continue to return to the Logistic Regression dialogue, then click OK to run the analysis and generate output.
2.3 Interpreting Binary Logistic Regression Output
SPSS generates several output tables. The two most important for understanding your results are:
Model Summary — Variance Explained
Table 21: Model Summary. The Cox & Snell R² and Nagelkerke R² values show how much variation in the dependent variable is explained by the model.
The Model Summary table contains the Cox & Snell R² and Nagelkerke R² values — both measure explained variation (similar to R² in linear regression), though they are typically lower than a linear-regression R² and must be interpreted with caution. These are called pseudo R² values.
Example result: The explained variation in the dependent variable ranges from 24.0% (Cox & Snell R²) to 33.0% (Nagelkerke R²). Always report the Nagelkerke R² — it is the preferred measure because, unlike Cox & Snell R², it can reach a maximum value of 1.
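Both pseudo R² values are computed from the log-likelihoods of the fitted model and the intercept-only (null) model. A sketch of the standard formulas, using hypothetical log-likelihood values (McFadden's R², which SPSS reports for multinomial models, is included for completeness):

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Pseudo R-squared measures from the log-likelihoods of the
    intercept-only (null) model and the fitted model, for n cases.
    These are the standard formulas; the inputs below are hypothetical."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell so its maximum possible value is 1
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    mcfadden = 1.0 - ll_model / ll_null
    return cox_snell, nagelkerke, mcfadden

cs, nk, mf = pseudo_r2(ll_null=-68.0, ll_model=-54.3, n=100)
print(round(cs, 3), round(nk, 3), round(mf, 3))
```

Note that Nagelkerke's value is always at least as large as Cox & Snell's, which is why the two bracket a range of "explained variation" as in the example above.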
Variables in the Equation — Statistical Significance
Table 22: Variables in the Equation. The Wald statistic determines whether each predictor contributes significantly to the model. Exp(B) gives the odds ratio.
The Variables in the Equation table shows each independent variable’s contribution to the model and its statistical significance:
- The Wald column contains the Wald chi-square statistic for each predictor
- The Sig. column shows the p-value — values below .05 indicate a statistically significant contribution
- The Exp(B) column gives the odds ratio for each predictor
Example result: Age (p = .003), gender (p = .021), and VO₂max (p = .039) all contributed significantly to the model. Weight (p = .799) did not. The odds of having heart disease are 7.026 times greater for males compared to females, as shown by the Exp(B) value for gender.
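The Exp(B) column is literally e raised to the coefficient B, which is why it is interpreted as an odds ratio. A sketch of that relationship, using a hypothetical coefficient chosen so the odds ratio matches the 7.026 figure above, and showing how odds convert back to a probability:

```python
import math

# Exp(B) -- the odds ratio -- is e raised to the logistic coefficient B.
# Hypothetical coefficient for gender (female = reference category):
b_gender = 1.9496
odds_ratio = math.exp(b_gender)

# If a female in the reference group has odds of 0.2 of heart disease,
# a comparable male has those odds multiplied by Exp(B):
female_odds = 0.2
male_odds = female_odds * odds_ratio
male_prob = male_odds / (1.0 + male_odds)  # convert odds to probability
print(round(odds_ratio, 3), round(male_prob, 3))
```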
3. Multinomial Logistic Regression in SPSS
Multinomial logistic regression is used to predict a nominal dependent variable given one or more independent variables. It is an extension of binary logistic regression for situations where your outcome has three or more unordered categories.
Real-world examples:
- Predicting preferred drink type (coffee/soft drink/tea/water) based on UK location and age
- Predicting job position (junior/middle/senior management) based on employment duration, qualifications, and gender
Multinomial logistic regression works by comparing each category of the dependent variable to a reference category, producing multiple sets of logistic regression coefficients — one for each non-reference category.
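Concretely, with k outcome categories the model fits k−1 logit equations against the reference category (whose logit is fixed at zero), and the category probabilities come from exponentiating and normalising those logits. A sketch with hypothetical coefficients for the drink-preference example:

```python
import math

def multinomial_probs(x, coef_sets):
    """Probabilities for a k-category outcome from k-1 logit equations,
    each comparing one category against the reference category (whose
    logit is fixed at 0). coef_sets holds one (intercept, slope) pair
    per non-reference category. All numbers here are hypothetical."""
    logits = [0.0] + [b0 + b1 * x for b0, b1 in coef_sets]  # reference first
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]

# Three categories: water (reference), coffee, tea; predictor = age
probs = multinomial_probs(x=40.0, coef_sets=[(-2.0, 0.06), (-1.0, 0.02)])
print([round(p, 3) for p in probs])  # probabilities sum to 1
```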
3.1 Key Assumptions for Multinomial Logistic Regression
Assumption 1
Nominal dependent variable. Examples: ethnicity (Caucasian/African American/Hispanic), transport type (bus/car/tram/train). Ordinal dependent variables can technically be used, but ordinal regression is more appropriate.
Assumption 2
One or more independent variables that can be continuous, ordinal, or nominal (including dichotomous). Important: ordinal independent variables must be treated as either continuous or categorical in SPSS — they cannot be entered as ordinal.
Assumption 3
Independence of observations and mutually exclusive, exhaustive categories in the dependent variable.
Assumption 4
No multicollinearity. Multicollinearity — when two or more independent variables are highly correlated — makes it difficult to determine which variable explains the outcome. Check for it in SPSS, typically by running the linear regression procedure with the same predictors (including any dummy variables) to obtain Tolerance and VIF collinearity diagnostics.
Assumption 5
Linear relationship with logit. Continuous independent variables must have a linear relationship with the logit transformation of the dependent variable.
Assumption 6
No significant outliers, high leverage values, or highly influential points in the data.
3.2 Step-by-Step Procedure in SPSS
When setting up multinomial logistic regression in SPSS, you need to classify variables correctly. SPSS distinguishes between:
- Covariates — continuous independent variables
- Factors — nominal independent variables
- Go to Analyze → Regression → Multinomial Logistic…
- In the dialogue box, transfer: your dependent variable into the Dependent: box, nominal independent variables into the Factor(s): box, and continuous independent variables into the Covariate(s): box.
- Click the Statistics button to open the Multinomial Logistic Regression: Statistics dialogue.
Figures 48 & 49: Select Cell probabilities, Classification table, and Goodness-of-fit in the Statistics dialogue. These options generate the key output tables you need for interpretation.
- Check the Cell probabilities, Classification table, and Goodness-of-fit checkboxes.
- Click Continue, then click OK to generate the results.
3.3 Interpreting Multinomial Logistic Regression Output
Goodness-of-Fit Table
Table 23: Goodness-of-Fit table. A non-significant Pearson result (p > .05) indicates the model fits the data well.
The Goodness-of-Fit table provides two measures to assess how well the model fits the data:
- Pearson chi-square: A statistically significant result (p < .05) would indicate a poor model fit. In this example, p = .341, which is not significant — the model fits the data well.
- Deviance chi-square: An alternative measure that may not always agree with the Pearson statistic.
If both statistics disagree, report both and note the discrepancy. Neither measure alone is definitive — always consult additional tables such as the Model Fitting Information and Likelihood Ratio Tests.
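The Pearson statistic in this table is the familiar chi-square sum over observed versus model-expected cell counts. A toy illustration with made-up counts (SPSS computes this for you from the cell probabilities):

```python
# Pearson chi-square statistic from observed vs. model-expected cell
# counts. The counts below are illustrative only.
observed = [18, 22, 12, 48]
expected = [20.0, 20.0, 15.0, 45.0]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 3))  # small value relative to df suggests good fit
```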
Model Fitting Information
The Model Fitting Information table tests whether all model coefficients are zero — in other words, whether your independent variables as a group significantly improve prediction over the intercept-only baseline model.
Example result: p = .027 — the full model statistically significantly predicts the dependent variable better than the intercept-alone model.
Pseudo R-Square
SPSS reports three pseudo R² measures — Cox & Snell, Nagelkerke, and McFadden. Unlike R² in ordinary least-squares regression, none of these are easily interpretable in isolation, but they give a general sense of variance explained. The Nagelkerke R² is typically preferred for reporting.
Likelihood Ratio Tests
The Likelihood Ratio Tests table is critical for multinomial regression — it is the only table that shows the overall statistical significance of each independent variable (especially important for nominal predictors, where the Parameter Estimates table only shows individual dummy variable coefficients).
Example result: income was not statistically significant (p = .754), but tax_too_high was significant (p = .014). The model produces two sets of logistic coefficients (logits) — one for each non-reference category of the dependent variable.
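Each row of the Likelihood Ratio Tests table compares the full model against a reduced model with that predictor removed: the test statistic is −2 times the difference in log-likelihoods. A sketch with hypothetical log-likelihoods; with two non-reference outcome categories a single predictor contributes two coefficients, so df = 2, for which the chi-square upper-tail probability simplifies to e^(−LR/2):

```python
import math

# Likelihood ratio test for one predictor: refit without it and compare
# log-likelihoods. Log-likelihood values here are hypothetical.
ll_full = -210.4
ll_without_predictor = -214.7
lr_stat = 2.0 * (ll_full - ll_without_predictor)

# With df = 2, the chi-square survival function is exactly exp(-x/2)
p_value = math.exp(-lr_stat / 2.0)
print(round(lr_stat, 2), round(p_value, 4))  # significant if p < .05
```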
4. Binary vs. Multinomial Logistic Regression: Quick Comparison
| Feature | Binary Logistic Regression | Multinomial Logistic Regression |
| --- | --- | --- |
| Dependent variable | Dichotomous (2 categories) | Nominal (3+ categories) |
| SPSS menu path | Analyze → Regression → Binary Logistic | Analyze → Regression → Multinomial Logistic |
| Key output tables | Model Summary, Variables in the Equation | Goodness-of-Fit, Model Fitting Information, Likelihood Ratio Tests, Parameter Estimates |
| Variance explained | Nagelkerke R² (preferred) | Cox & Snell, Nagelkerke, McFadden pseudo R² |
| Significance of predictors | Wald statistic (Sig. column) | Likelihood Ratio Tests table (overall); Parameter Estimates (individual coefficients) |
| Number of logistic coefficients | One set | One set per non-reference category (k−1 logits) |
| Multicollinearity check | Recommended | Required (Assumption 4) |
5. Conclusion & Next Steps
Categorical regression analysis in SPSS is a powerful toolkit for working with real-world data where outcomes are categorical rather than continuous. Whether you’re predicting a binary event like disease presence or a multi-class outcome like political affiliation, SPSS provides structured procedures that guide you from data setup through to interpretable results.
Key takeaways from this module:
- Choose your method based on your dependent variable: two categories → Binary Logistic Regression; three or more categories → Multinomial Logistic Regression.
- Always verify your assumptions before running the analysis — skipping this step can produce invalid results.
- For binary regression, report the Nagelkerke R² for variance explained and use Exp(B) to interpret the odds ratios.
- For multinomial regression, the Likelihood Ratio Tests table is your primary tool for assessing individual predictor significance.
- Categorical independent variables must always be explicitly defined in SPSS — the software does not detect them automatically.
Coming up in Module 3.2: Pilot Testing & Reliability Testing — including Cronbach’s Alpha and when to use it for assessing internal consistency in survey instruments.
This tutorial is part of the Advanced SPSS Tutorial Series (Module 3). Figures and tables are reproduced from the SPSS Tutorial Guide (Tobit Research Consulting Ltd, March 2022). Always ensure your data meets the stated assumptions before applying these techniques in practice.