Categorical Regression Analysis in SPSS: A Complete Guide to Binary & Multinomial Logistic Regression
Module 3, Section 3.1 | Advanced SPSS Tutorial Series | Reading time: ~12 minutes
What you’ll learn: This guide walks you through Categorical Regression Analysis (CATREG) in SPSS — from understanding the theory to running Binary Logistic Regression and Multinomial Logistic Regression step by step, with full interpretation of SPSS output tables.
1. What Is Categorical Regression Analysis?
Categorical regression — also known by its SPSS acronym CATREG — is a technique that quantifies categorical data by assigning numerical values to categories, producing an optimal linear regression equation for the transformed variables.
In standard linear regression, you minimize the sum of squared differences between a response (dependent) variable and a weighted combination of predictors. This works cleanly for continuous, quantitative data. But when your dependent variable is categorical — for example, a binary outcome like “passed/failed” or a nominal outcome with multiple categories — you need a different approach.
Categorical regression assigns optimal numerical values to the categories themselves, preserving the relationship structure while enabling a regression framework. This is powerful because:
- Categorical variables can define groups of cases
- The model estimates separate parameter sets for each group
- Prediction of the response is possible for any combination of predictor values
Within SPSS, categorical regression analysis splits into two major techniques depending on the nature of your dependent variable:
| Technique | When to Use | Dependent Variable Type |
| --- | --- | --- |
| Binary Logistic Regression | Outcome has exactly 2 categories | Dichotomous (e.g., yes/no, pass/fail) |
| Multinomial Logistic Regression | Outcome has 3 or more categories | Nominal (e.g., coffee/tea/water) |
2. Binary Logistic Regression in SPSS
A binomial logistic regression (commonly called simply “logistic regression”) predicts the probability that an observation falls into one of two categories of a dichotomous dependent variable, based on one or more independent variables that can be continuous or categorical.
Common real-world examples include:
- Predicting exam performance (pass/fail) based on revision time, test anxiety, and lecture attendance
- Predicting drug use (yes/no) based on prior criminal convictions, income, age, and gender
- Predicting presence of heart disease (yes/no) based on age, gender, weight, and VO₂max
If your dependent variable is a count, use Poisson regression instead. If it has more than two categories, use multinomial logistic regression (covered in Section 3 below).
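Under the hood, logistic regression turns a weighted sum of predictors into a probability via the logistic (sigmoid) function, p = 1 / (1 + e^−z). A minimal sketch of that calculation for the pass/fail example, using entirely hypothetical coefficient values:

```python
import math

def predict_pass_probability(intercept, coefs, values):
    """Convert a linear combination of predictors into a probability
    using the logistic (sigmoid) function: p = 1 / (1 + e^-z)."""
    z = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept, revision time (hours), test anxiety
p = predict_pass_probability(-4.0, [0.5, -0.3], [10.0, 5.0])
print(round(p, 3))  # predicted probability of passing
```

SPSS performs this calculation for every case once the model coefficients have been estimated; the sketch only illustrates the mechanics.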
2.1 Key Assumptions for Binary Logistic Regression
Before running binary logistic regression in SPSS, your data must satisfy four critical assumptions:
Assumption 1
Dichotomous dependent variable. Your outcome variable must have exactly two categories — for example, gender (male/female), heart disease (yes/no), or personality type (introvert/extrovert). If your dependent variable is continuous, use multiple regression; if ordinal, use ordinal regression.
Assumption 2
One or more independent variables. These can be continuous (e.g., revision time in hours, IQ score, weight in kg) or categorical — either ordinal (e.g., Likert scale items) or nominal (e.g., gender, ethnicity, profession).
Assumption 3
Independence of observations. Each observation must be independent, and the categories of the dependent variable must be mutually exclusive and exhaustive — every case belongs to exactly one category.
Assumption 4
Linear relationship with logit. There must be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable. This is tested within SPSS during the analysis.
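The logit transformation mentioned in Assumption 4 is simply the natural log of the odds, logit(p) = ln(p / (1 − p)). A short sketch showing the transform and its inverse:

```python
import math

def logit(p):
    """Log-odds (logit) transformation of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# The logit is 0 at p = 0.5, and inv_logit reverses the transform
print(logit(0.5))                          # 0.0
print(round(inv_logit(logit(0.8)), 6))     # 0.8
```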
2.2 Step-by-Step Procedure in SPSS
Figure 46: The Binary Logistic Regression dialogue box in SPSS. Transfer your dependent variable to the Dependent box and independent variables to the Covariates box.
- Go to Analyze → Regression → Binary Logistic… from the top menu.
- In the Logistic Regression dialogue box, transfer your dependent variable into the Dependent: box and your independent variables into the Covariates: box using the arrow buttons.
- Keep the Method: option at its default value of Enter. This is the standard regression method in SPSS. (The “Previous” and “Next” buttons are used for sequential/hierarchical logistic regression only.)
- Click the Categorical button. The Logistic Regression: Define Categorical Variables dialogue opens. Transfer any categorical independent variables from the Covariates: box into the Categorical Covariates: box. SPSS does not do this automatically — you must define them manually.
- In the Change Contrast area, change the Reference Category from Last to First (or whichever group you want as your reference), then click Change. Click Continue to return to the main dialogue.
- Click the Options button to open the Logistic Regression: Options dialogue.
Figure 47: The Logistic Regression Options dialogue box. Select Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, and CI for exp(B). Set Display to “At last step.”
- In the Statistics and Plots area, check: Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, and CI for exp(B).
- In the Display area, select At last step.
- Click Continue to return to the Logistic Regression dialogue, then click OK to run the analysis and generate output.
2.3 Interpreting Binary Logistic Regression Output
SPSS generates several output tables. The two most important for understanding your results are:
Model Summary — Variance Explained
Table 21: Model Summary. The Cox & Snell R² and Nagelkerke R² values show how much variation in the dependent variable is explained by the model.
The Model Summary table contains the Cox & Snell R² and Nagelkerke R² values — both measure explained variation (similar to R² in linear regression), though they are typically lower than a linear-regression R² and must be interpreted with caution. These are called pseudo R² values.
Example result: The explained variation in the dependent variable ranges from 24.0% (Cox & Snell R²) to 33.0% (Nagelkerke R²). Always report the Nagelkerke R² — it is the preferred measure because, unlike Cox & Snell R², it can reach a maximum value of 1.
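Both pseudo R² values are computed from the log-likelihoods of the fitted model and the intercept-only (null) model. A sketch of the standard formulas, using hypothetical log-likelihood values (McFadden's R², which SPSS reports for multinomial models, is included for completeness):

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Pseudo R-squared measures from the log-likelihoods of the
    intercept-only (null) model and the fitted model, for n cases.
    These are the standard formulas; the inputs below are hypothetical."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell so its maximum possible value is 1
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    mcfadden = 1.0 - ll_model / ll_null
    return cox_snell, nagelkerke, mcfadden

cs, nk, mf = pseudo_r2(ll_null=-68.0, ll_model=-54.3, n=100)
print(round(cs, 3), round(nk, 3), round(mf, 3))
```

Note that Nagelkerke's value is always at least as large as Cox & Snell's, which is why the two bracket a range of "explained variation" as in the example above.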
Variables in the Equation — Statistical Significance
Table 22: Variables in the Equation. The Wald statistic determines whether each predictor contributes significantly to the model. Exp(B) gives the odds ratio.
The Variables in the Equation table shows each independent variable’s contribution to the model and its statistical significance:
- The Wald column contains the Wald chi-square statistic for each predictor
- The Sig. column shows the p-value — values below .05 indicate a statistically significant contribution
- The Exp(B) column gives the odds ratio for each predictor
Example result: Age (p = .003), gender (p = .021), and VO₂max (p = .039) all contributed significantly to the model. Weight (p = .799) did not. The odds of having heart disease are 7.026 times greater for males compared to females, as shown by the Exp(B) value for gender.
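The Exp(B) column is literally e raised to the coefficient B, which is why it is interpreted as an odds ratio. A sketch of that relationship, using a hypothetical coefficient chosen so the odds ratio matches the 7.026 figure above, and showing how odds convert back to a probability:

```python
import math

# Exp(B) -- the odds ratio -- is e raised to the logistic coefficient B.
# Hypothetical coefficient for gender (female = reference category):
b_gender = 1.9496
odds_ratio = math.exp(b_gender)

# If a female in the reference group has odds of 0.2 of heart disease,
# a comparable male has those odds multiplied by Exp(B):
female_odds = 0.2
male_odds = female_odds * odds_ratio
male_prob = male_odds / (1.0 + male_odds)  # convert odds to probability
print(round(odds_ratio, 3), round(male_prob, 3))
```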
3. Multinomial Logistic Regression in SPSS
Multinomial logistic regression is used to predict a nominal dependent variable given one or more independent variables. It is an extension of binary logistic regression for situations where your outcome has three or more unordered categories.
Real-world examples:
- Predicting preferred drink type (coffee/soft drink/tea/water) based on UK location and age
- Predicting job position (junior/middle/senior management) based on employment duration, qualifications, and gender
Multinomial logistic regression works by comparing each category of the dependent variable to a reference category, producing multiple sets of logistic regression coefficients — one for each non-reference category.
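Concretely, with k outcome categories the model fits k−1 logit equations against the reference category (whose logit is fixed at zero), and the category probabilities come from exponentiating and normalising those logits. A sketch with hypothetical coefficients for the drink-preference example:

```python
import math

def multinomial_probs(x, coef_sets):
    """Probabilities for a k-category outcome from k-1 logit equations,
    each comparing one category against the reference category (whose
    logit is fixed at 0). coef_sets holds one (intercept, slope) pair
    per non-reference category. All numbers here are hypothetical."""
    logits = [0.0] + [b0 + b1 * x for b0, b1 in coef_sets]  # reference first
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]

# Three categories: water (reference), coffee, tea; predictor = age
probs = multinomial_probs(x=40.0, coef_sets=[(-2.0, 0.06), (-1.0, 0.02)])
print([round(p, 3) for p in probs])  # probabilities sum to 1
```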
3.1 Key Assumptions for Multinomial Logistic Regression
Assumption 1
Nominal dependent variable. Examples: ethnicity (Caucasian/African American/Hispanic), transport type (bus/car/tram/train). Ordinal dependent variables can technically be used, but ordinal regression is more appropriate.
Assumption 2
One or more independent variables that can be continuous, ordinal, or nominal (including dichotomous). Important: ordinal independent variables must be treated as either continuous or categorical in SPSS — they cannot be entered as ordinal.
Assumption 3
Independence of observations and mutually exclusive, exhaustive categories in the dependent variable.
Assumption 4
No multicollinearity. Multicollinearity — when two or more independent variables are highly correlated — makes it difficult to determine which variable explains the outcome. Check for it in SPSS, typically by running the linear regression procedure with the same predictors (including any dummy variables) to obtain Tolerance and VIF collinearity diagnostics.
Assumption 5
Linear relationship with logit. Continuous independent variables must have a linear relationship with the logit transformation of the dependent variable.
Assumption 6
No significant outliers, high leverage values, or highly influential points in the data.
3.2 Step-by-Step Procedure in SPSS
When setting up multinomial logistic regression in SPSS, you need to classify variables correctly. SPSS distinguishes between:
- Covariates — continuous independent variables
- Factors — nominal independent variables
- Go to Analyze → Regression → Multinomial Logistic…
- In the dialogue box, transfer: your dependent variable into the Dependent: box, nominal independent variables into the Factor(s): box, and continuous independent variables into the Covariate(s): box.
- Click the Statistics button to open the Multinomial Logistic Regression: Statistics dialogue.
Figures 48 & 49: Select Cell probabilities, Classification table, and Goodness-of-fit in the Statistics dialogue. These options generate the key output tables you need for interpretation.
- Check the Cell probabilities, Classification table, and Goodness-of-fit checkboxes.
- Click Continue, then click OK to generate the results.
3.3 Interpreting Multinomial Logistic Regression Output
Goodness-of-Fit Table
Table 23: Goodness-of-Fit table. A non-significant Pearson result (p > .05) indicates the model fits the data well.
The Goodness-of-Fit table provides two measures to assess how well the model fits the data:
- Pearson chi-square: A statistically significant result (p < .05) would indicate a poor model fit. In this example, p = .341, which is not significant — the model fits the data well.
- Deviance chi-square: An alternative measure that may not always agree with the Pearson statistic.
If both statistics disagree, report both and note the discrepancy. Neither measure alone is definitive — always consult additional tables such as the Model Fitting Information and Likelihood Ratio Tests.
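The Pearson statistic in this table is the familiar chi-square sum over observed versus model-expected cell counts. A toy illustration with made-up counts (SPSS computes this for you from the cell probabilities):

```python
# Pearson chi-square statistic from observed vs. model-expected cell
# counts. The counts below are illustrative only.
observed = [18, 22, 12, 48]
expected = [20.0, 20.0, 15.0, 45.0]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 3))  # small value relative to df suggests good fit
```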
Model Fitting Information
The Model Fitting Information table tests whether all model coefficients are zero — in other words, whether your independent variables as a group significantly improve prediction over the intercept-only baseline model.
Example result: p = .027 — the full model statistically significantly predicts the dependent variable better than the intercept-alone model.
Pseudo R-Square
SPSS reports three pseudo R² measures — Cox & Snell, Nagelkerke, and McFadden. Unlike R² in ordinary least-squares regression, none of these are easily interpretable in isolation, but they give a general sense of variance explained. The Nagelkerke R² is typically preferred for reporting.
Likelihood Ratio Tests
The Likelihood Ratio Tests table is critical for multinomial regression — it is the only table that shows the overall statistical significance of each independent variable (especially important for nominal predictors, where the Parameter Estimates table only shows individual dummy variable coefficients).
Example result: income was not statistically significant (p = .754), but tax_too_high was significant (p = .014). The model produces two sets of logistic coefficients (logits) — one for each non-reference category of the dependent variable.
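Each row of the Likelihood Ratio Tests table compares the full model against a reduced model with that predictor removed: the test statistic is −2 times the difference in log-likelihoods. A sketch with hypothetical log-likelihoods; with two non-reference outcome categories a single predictor contributes two coefficients, so df = 2, for which the chi-square upper-tail probability simplifies to e^(−LR/2):

```python
import math

# Likelihood ratio test for one predictor: refit without it and compare
# log-likelihoods. Log-likelihood values here are hypothetical.
ll_full = -210.4
ll_without_predictor = -214.7
lr_stat = 2.0 * (ll_full - ll_without_predictor)

# With df = 2, the chi-square survival function is exactly exp(-x/2)
p_value = math.exp(-lr_stat / 2.0)
print(round(lr_stat, 2), round(p_value, 4))  # significant if p < .05
```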
4. Binary vs. Multinomial Logistic Regression: Quick Comparison
| Feature | Binary Logistic Regression | Multinomial Logistic Regression |
| --- | --- | --- |
| Dependent variable | Dichotomous (2 categories) | Nominal (3+ categories) |
| SPSS menu path | Analyze → Regression → Binary Logistic | Analyze → Regression → Multinomial Logistic |
| Key output tables | Model Summary, Variables in the Equation | Goodness-of-Fit, Model Fitting Information, Likelihood Ratio Tests, Parameter Estimates |
| Variance explained | Nagelkerke R² (preferred) | Cox & Snell, Nagelkerke, McFadden pseudo R² |
| Significance of predictors | Wald statistic (Sig. column) | Likelihood Ratio Tests table (overall); Parameter Estimates (individual coefficients) |
| Number of logistic coefficients | One set | One set per non-reference category (k−1 logits) |
| Multicollinearity check | Recommended | Required (Assumption 4) |
5. Conclusion & Next Steps
Categorical regression analysis in SPSS is a powerful toolkit for working with real-world data where outcomes are categorical rather than continuous. Whether you’re predicting a binary event like disease presence or a multi-class outcome like political affiliation, SPSS provides structured procedures that guide you from data setup through to interpretable results.
Key takeaways from this module:
- Choose your method based on your dependent variable: two categories → Binary Logistic Regression; three or more categories → Multinomial Logistic Regression.
- Always verify your assumptions before running the analysis — skipping this step can produce invalid results.
- For binary regression, report the Nagelkerke R² for variance explained and use Exp(B) to interpret the odds ratios.
- For multinomial regression, the Likelihood Ratio Tests table is your primary tool for assessing individual predictor significance.
- Categorical independent variables must always be explicitly defined in SPSS — the software does not detect them automatically.
Coming up in Module 3.2: Pilot Testing & Reliability Testing — including Cronbach’s Alpha and when to use it for assessing internal consistency in survey instruments.
This tutorial is part of the Advanced SPSS Tutorial Series (Module 3). Figures and tables are reproduced from the SPSS Tutorial Guide (Tobit Research Consulting Ltd, March 2022). Always ensure your data meets the stated assumptions before applying these techniques in practice.