
Panel Data Analysis in Stata: A Complete Guide to Fixed and Random Effects for Postgraduate Researchers

Tobit Research Consulting | Stata & Quantitative Methods Series | Reading time: ~18 minutes

What you will learn: What panel data is and why it matters for research, how to set up panel data in Stata using xtset, when to use fixed effects versus random effects, how to run both models using xtreg, how to perform the Hausman test to choose between them, and how to run key diagnostic tests including the Breusch-Pagan LM test, heteroskedasticity test, serial correlation test, and cross-sectional dependence test — all with practical Stata commands.

Panel data analysis is one of the most powerful tools available to postgraduate researchers in economics, finance, public policy, and the social sciences. Whether you are analysing financial performance across Kenyan commercial banks over a decade, examining how government policies affect firm-level productivity across multiple countries, or tracking household welfare outcomes over several survey rounds, panel data methods allow you to go far beyond what ordinary regression can achieve.

Yet many Masters and PhD students who collect panel data end up analysing it with simple OLS regression — losing the very advantages that make panel data worth collecting. The two most important panel data models available in Stata are the fixed effects model and the random effects model. Knowing when and how to use each one is a core skill for any quantitative researcher at the postgraduate level.

This guide, developed by Tobit Research Consulting, walks through panel data analysis in Stata from first principles. It draws from the foundational Princeton University notes by Oscar Torres-Reyna and explains the key concepts, Stata commands, model output, and diagnostic tests in clear, practical language.

1. What Is Panel Data and Why Is It Valuable?
2. Setting Up Panel Data in Stata: xtset
3. Exploring Your Panel Data
4. The Fixed Effects Model: Theory and Stata Command
5. The Random Effects Model: Theory and Stata Command
6. Fixed or Random? Running the Hausman Test
7. Diagnostic Tests for Panel Data in Stata
8. Summary of Stata Commands for Fixed and Random Effects Models
9. How Tobit Research Consulting Can Help with Your Panel Data Analysis

1. What Is Panel Data and Why Is It Valuable?

Panel data — also referred to as longitudinal data or cross-sectional time-series data — is a dataset in which the same entities are observed across multiple time periods. Those entities might be countries, firms, individuals, counties, schools, or households. The defining feature is the combination of a cross-sectional dimension (multiple entities) and a time-series dimension (multiple periods per entity).

A typical panel dataset might look like this:

Entity (i)	Year (t)	Y	X1	X2
Country 1	2000	6.0	7.8	5.8
Country 1	2001	4.6	0.6	7.9
Country 1	2002	9.4	2.1	5.4
Country 2	2000	9.1	1.3	6.7
Country 2	2001	8.3	0.9	6.6
Country 2	2002	0.6	9.8	0.4

Panel data offers several important advantages over purely cross-sectional or purely time-series data:

Advantage 1

Control for unobserved heterogeneity

Panel data allows you to control for variables that are difficult or impossible to observe or measure directly — such as cultural factors, management quality, or firm-specific business practices — as long as those variables are stable over time. This is a fundamental advantage that simple cross-sectional regression cannot offer.

Advantage 2

Account for time-invariant and entity-invariant factors

Panel data lets you control for factors that vary across time but not across entities (such as national policies, global shocks, or regulatory changes), as well as factors that vary across entities but not over time.

Advantage 3

Support multilevel analysis

Panel data supports the inclusion of variables at different levels of analysis — students nested within schools, firms nested within industries, counties nested within countries — making it well-suited for hierarchical or multilevel modelling.

Important: Panel data does have limitations. Data collection can be challenging, especially for micro panels. Non-response and attrition can introduce bias. Macro panels with many countries over long periods may suffer from cross-sectional dependence. These issues should be acknowledged and tested in your methodology chapter.

2. Setting Up Panel Data in Stata: xtset

Before running any panel data model in Stata, you must declare your dataset as a panel dataset using the xtset command. This tells Stata which variable identifies the entities (the panel variable) and which variable identifies the time periods (the time variable).

    * Declare the panel structure

    xtset country year

After running this command, Stata will confirm the panel and time variables and report whether the panel is balanced. A strongly balanced panel means every entity has data for every time period. An unbalanced panel has some missing time periods for some entities. Both can be analysed using xtreg, but with an unbalanced panel you should check whether the gaps are plausibly random, since attrition that is related to the outcome can bias the estimates.
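Two built-in commands help verify the panel structure once xtset has run. The sketch below is illustrative only: it assumes the Grunfeld investment panel that ships with Stata (loaded with webuse, which needs an internet connection) in place of your own data, so the variable names company, invest, mvalue, and kstock are stand-ins.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* Pattern of observations per entity: reveals gaps, late entry, and attrition
xtdescribe

* Overall, between-entity, and within-entity variation of each variable
xtsum invest mvalue kstock
```

A variable with almost no within-entity variation in xtsum will be poorly identified by a fixed effects model, which is worth knowing before estimation.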

Common error: If your entity identifier (e.g., country name) is stored as a string variable rather than a numeric variable, xtset stops with an error such as varlist: country: string variable not allowed (the exact wording differs between Stata versions). To resolve this, convert the variable to numeric first using encode country, gen(country1), then re-run xtset country1 year.
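The fix can be tried on a toy dataset. This is a minimal sketch that types in four rows from the example table above with the entity name stored as a string; the variable name country1 follows the text.

```stata
* Toy data: entity identifier deliberately stored as a string
clear
input str9 country year y
"Country 1" 2000 6.0
"Country 1" 2001 4.6
"Country 2" 2000 9.1
"Country 2" 2001 8.3
end

* xtset country year would fail here because country is a string

* encode creates a numeric variable with the names attached as value labels
encode country, gen(country1)
xtset country1 year
```

Because encode keeps the original names as value labels, graphs and tables still show "Country 1" and "Country 2" rather than bare codes.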

3. Exploring Your Panel Data

Before fitting any model, it is good practice to visually explore the structure of your panel data. Stata provides the xtline command to plot the outcome variable across time for each panel entity.

    * Plot each entity’s time trend separately

    xtline y

    * Plot all entities overlaid on one graph

    xtline y, overlay

These plots reveal heterogeneity across entities — variation in the levels and trends of the outcome variable between panels. Visualising this heterogeneity helps you understand whether entity-specific intercepts (fixed effects) are likely to matter, and whether the behaviour of the outcome variable differs meaningfully across panels over time.

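You can also plot the mean of the outcome across entities and across years to see patterns in the data before modelling:

    * Generate entity means and plot

    bysort country: egen y_mean = mean(y)

    twoway scatter y country || connected y_mean country

    * Generate year means and plot

    bysort year: egen y_mean_year = mean(y)

    twoway scatter y year || connected y_mean_year year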

4. The Fixed Effects Model: Theory and Stata Command

What is the Fixed Effects Model?

The fixed effects (FE) model is used when you want to analyse the impact of variables that change over time, while controlling for all stable, time-invariant characteristics of each entity. The core idea is that each entity (country, firm, individual) has its own unique, unobserved characteristics that may be correlated with the independent variables in your model. The FE model removes these entity-specific effects, so the estimated coefficient on your predictor reflects the within-entity relationship between X and Y, net of everything about the entity that stays constant over time.

The mathematical model for fixed effects is:

Y_it = β₁X_it + α_i + u_it

Where Y_it is the outcome for entity i at time t, X_it is the predictor variable, α_i is the entity-specific intercept (the fixed effect), and u_it is the idiosyncratic error term.
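How the model removes α_i can be written out explicitly. Averaging the equation over time for each entity and subtracting that average from the original equation gives the within (demeaned) form. The bar notation for entity means is introduced here for illustration.

```latex
\[
\begin{aligned}
Y_{it} &= \beta_1 X_{it} + \alpha_i + u_{it} \\
\bar{Y}_i &= \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i
  \qquad \text{(average over } t \text{ for entity } i\text{)} \\
Y_{it} - \bar{Y}_i &= \beta_1 \left( X_{it} - \bar{X}_i \right) + \left( u_{it} - \bar{u}_i \right)
\end{aligned}
\]
```

Since α_i does not vary over time, it equals its own mean and drops out of the last line. The same subtraction wipes out any regressor that is constant within an entity, which is why FE cannot estimate the effect of time-invariant variables.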

Key insight: The fixed effects model controls for all time-invariant differences between entities, including unobserved ones like culture, geography, or management style, so the estimated coefficients cannot be biased by these omitted characteristics. Omitted variables that change over time can still bias the estimates. FE also cannot estimate the effect of any variable that does not change over time within entities, because such variables are absorbed by the entity effect.

When Should You Use Fixed Effects?

Use fixed effects when:

You believe that unobserved entity characteristics are correlated with your independent variables.
Your research question focuses on within-entity variation over time.
You are not interested in estimating the effect of time-invariant variables.
You want to be confident your results are not driven by unobserved entity-level confounders.

Running Fixed Effects in Stata: xtreg with fe

The primary command for fixed effects in Stata is xtreg with the fe option:

    * Basic fixed effects model

    xtreg y x1, fe

    * Fixed effects with multiple predictors

    xtreg y x1 x2 x3, fe

    * Fixed effects with robust standard errors

    xtreg y x1 x2 x3, fe robust

Key elements of the xtreg, fe output to understand:

Output Element	What It Means	Decision Rule
Prob > F	Tests the null hypothesis that all slope coefficients in the model are jointly zero.	If < 0.05, the predictors are jointly statistically significant.
R-sq: within	Variance in Y explained by X within entities over time.	Use this R² when reporting fixed effects results.
R-sq: between	Variance in Y explained by X between entities.	Useful for context but not the main metric in FE.
rho	The fraction of total variance attributable to differences across panels (intraclass correlation).	Higher rho suggests panel effects are important.
corr(u_i, Xb)	Estimated correlation between the entity effects and the fitted values from the predictors.	FE allows this to be non-zero; RE assumes it is zero.
P>|t|	Two-tailed p-value for each coefficient.	If < 0.05, the variable significantly influences Y.
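What xtreg, fe does mechanically can be checked by hand. The sketch below assumes Stata's shipped Grunfeld panel as stand-in data; the prefixes m_ and w_ are hypothetical names chosen for this illustration. It subtracts each entity's mean from every variable, runs plain OLS on the demeaned data, and confirms that the slope matches the xtreg, fe estimate.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* Fixed effects (within) estimator
quietly xtreg invest mvalue kstock, fe
scalar b_fe = _b[mvalue]

* Manual within transformation: subtract each company's own mean
foreach v of varlist invest mvalue kstock {
    bysort company: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}

* OLS on the demeaned variables (no constant: the means are already removed)
regress w_invest w_mvalue w_kstock, noconstant

* The slope agrees with xtreg, fe up to numerical precision
assert reldif(_b[w_mvalue], b_fe) < 1e-6
```

Only the coefficients are expected to agree. The standard errors from the manual regression are too small because regress does not count the entity means as estimated parameters, so use xtreg, fe for inference.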

Alternative FE Commands: areg and LSDV

Fixed effects can also be estimated using the least squares dummy variable (LSDV) approach, which adds a binary dummy variable for each entity. This produces identical coefficient estimates to xtreg, fe:

    * LSDV approach (Stata 11+)

    regress y x1 i.country

    * Using areg (absorbs entity dummies)

    areg y x1, absorb(country)

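The areg command is particularly useful when you have many entities, since it absorbs all the dummy variables without displaying them in the output. Note that the R² reported by regress or areg counts the variation explained by the entity dummies, so it is usually much higher than the within R² from xtreg. State clearly which R² you report; the within R² is the one that describes how well the predictors explain changes over time inside entities.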
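The equivalence of the three approaches is easy to see side by side. This is a sketch under the assumption that the Grunfeld example panel stands in for your data; the stored-estimate names are arbitrary.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

quietly xtreg invest mvalue kstock, fe
estimates store fe_xtreg

quietly regress invest mvalue kstock i.company
estimates store fe_lsdv

quietly areg invest mvalue kstock, absorb(company)
estimates store fe_areg

* Slopes side by side; the entity dummies are left out of the display
estimates table fe_xtreg fe_lsdv fe_areg, keep(mvalue kstock) b(%9.4f) se(%9.4f) stats(N r2)
```

The slope rows should be identical across columns, while the r2 row (where reported) differs between xtreg and the dummy-variable approaches for the reason given above.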

Adding Time Fixed Effects

You can extend the entity FE model to also include time fixed effects, controlling for shocks that affect all entities simultaneously in a given year:

    * Entity and time fixed effects

    xtreg y x1 i.year, fe

    * Test whether time dummies are jointly significant

    testparm i.year

After running the model with year dummies, use testparm i.year to test whether the time dummies are jointly equal to zero. If the resulting Prob > F is greater than 0.05, you fail to reject the null that the time effects are jointly zero, meaning time fixed effects are not needed in your model.
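In equation form, the two-way model adds a period-specific intercept to the entity fixed effects model. The symbol λ_t is notation introduced here for illustration; T denotes the number of periods.

```latex
\[
Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it},
\qquad
H_0 : \lambda_2 = \lambda_3 = \dots = \lambda_T = 0
\]
```

The i.year terms estimate the λ_t relative to an omitted base year, and testparm i.year is the joint F test of the null hypothesis shown on the right.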

5. The Random Effects Model: Theory and Stata Command

What is the Random Effects Model?

The random effects (RE) model takes a different approach to entity-level heterogeneity. Instead of treating entity-specific effects as fixed constants to be estimated, the RE model treats them as random draws from a distribution — specifically, it assumes that the variation across entities is random and uncorrelated with the independent variables in the model.

The random effects model is specified as:

Y_it = βX_it + α + u_i + ε_it

Here, u_i is the between-entity error (capturing persistent differences across entities, so it carries no time subscript) and ε_it is the within-entity error (capturing time-varying idiosyncratic shocks). Stata reports their standard deviations as sigma_u and sigma_e in the xtreg output. The crucial assumption is that u_i is uncorrelated with the regressors X_it.
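It can help to see how the RE estimator uses the two error components. For a balanced panel with T periods, RE is equivalent to OLS on partially demeaned data. This is the standard textbook formulation rather than something taken from the original notes; θ and the composite error v_it (the sum of the between-entity and within-entity errors) are notation introduced here.

```latex
\[
Y_{it} - \theta \bar{Y}_i
  = \beta \left( X_{it} - \theta \bar{X}_i \right)
  + \alpha (1 - \theta)
  + \left( v_{it} - \theta \bar{v}_i \right),
\qquad
\theta = 1 - \sqrt{ \frac{ \sigma_\varepsilon^2 }{ T \sigma_u^2 + \sigma_\varepsilon^2 } }
\]
```

When the between-entity variance σ_u² is zero, θ = 0 and RE collapses to pooled OLS. As T σ_u² grows large relative to σ_ε², θ approaches 1 and RE approaches the fixed effects (fully demeaned) estimator.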

Use Fixed Effects when…

Entity effects are correlated with predictors
You only care about within-entity variation
You do not need to estimate time-invariant variables
Your entities are the full population of interest

Use Random Effects when…

Entity effects are uncorrelated with predictors
You need to estimate time-invariant variables (e.g. gender, sector)
You want to generalise inferences beyond the sample
Your entities are a random sample from a larger population

Running Random Effects in Stata: xtreg with re

    * Basic random effects model

    xtreg y x1, re

    * Random effects with robust standard errors

    xtreg y x1 x2 x3, re robust

Interpretation note: In the random effects model, the coefficient on each predictor represents the average effect of X on Y when X changes both across time (within entities) and between entities by one unit. This combined within-and-between interpretation makes the random effects coefficients more complex to interpret than fixed effects coefficients.
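The variance components behind the RE model are available after estimation. The sketch below, which assumes the Grunfeld example panel as stand-in data, asks xtreg to display theta and rebuilds rho from the stored sigma_u and sigma_e values.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* The theta option reports the weight used in the partial demeaning
xtreg invest mvalue kstock, re theta

* rho is the share of total error variance due to the between-entity component
scalar rho_manual = e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2)
display "rho from variance components = " rho_manual
display "rho reported by xtreg        = " e(rho)
assert reldif(rho_manual, e(rho)) < 1e-6
```

A theta close to 1 indicates that the RE estimates will sit close to the FE estimates, while a theta near 0 means RE is behaving much like pooled OLS.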

6. Fixed or Random? Running the Hausman Test

The most important practical decision in panel data analysis is whether to use fixed effects or random effects. The standard statistical test for this decision is the Hausman test. The null hypothesis of the Hausman test is that the random effects estimator is consistent — in other words, that the entity-level errors are uncorrelated with the regressors. If this null is rejected, fixed effects is the preferred model.

To run the Hausman test in Stata:

    * Step 1: Run fixed effects and store estimates

    xtreg y x1, fe

    estimates store fixed

    * Step 2: Run random effects and store estimates

    xtreg y x1, re

    estimates store random

    * Step 3: Run the Hausman test

    hausman fixed random

Hausman Test Result	Decision	Interpretation
Prob > chi2 < 0.05	Reject H₀ → Use Fixed Effects	There is a systematic difference between the FE and RE coefficients. The entity-level errors are correlated with the regressors, violating the RE assumption.
Prob > chi2 ≥ 0.05	Fail to reject H₀ → Random Effects preferred	No significant difference between FE and RE coefficients. RE is more efficient and its assumptions are not violated.
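The statistic behind the chi2 value compares the two coefficient vectors, weighted by the difference in their variance matrices. This is the standard form of the test; the b and V symbols are notation introduced here, and k is the number of coefficients being compared.

```latex
\[
H = \left( b_{FE} - b_{RE} \right)'
    \left[ V_{FE} - V_{RE} \right]^{-1}
    \left( b_{FE} - b_{RE} \right)
\;\sim\; \chi^2_k \quad \text{under } H_0
\]
```

A large H means the FE and RE coefficients differ by more than sampling error would explain, which is evidence against the RE assumption.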

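Important: The Hausman test can sometimes fail in finite samples: the difference between the FE and RE variance matrices may not be positive definite, which produces a negative or undefined chi2 statistic, particularly in small samples or with many predictors. Note also that hausman cannot be run on models estimated with robust or clustered standard errors, so run the comparison on the default-variance models as shown above. If the statistic is unusable, options include the sigmamore option of hausman (which bases both variance matrices on the same error-variance estimate), a robust version of the test, or a choice based on theoretical reasoning about whether entity effects are likely to be correlated with your predictors.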
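A complete run of the test, including the fallback option, might look like the sketch below. It assumes the Grunfeld example panel as stand-in data and reads the results that hausman leaves behind in r().

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* Both models use default (non-robust) standard errors, as hausman requires
quietly xtreg invest mvalue kstock, fe
estimates store fixed

quietly xtreg invest mvalue kstock, re
estimates store random

* sigmamore bases both variance matrices on one error-variance estimate,
* which helps avoid a non-positive-definite difference
hausman fixed random, sigmamore

* The stored results can be used for reporting
display "chi2 = " r(chi2) "   df = " r(df) "   p = " r(p)
```

Reporting the chi2 value, degrees of freedom, and p-value together makes the model choice transparent to an examiner.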

7. Diagnostic Tests for Panel Data in Stata

After estimating your panel data model, several diagnostic tests should be run to check for violations of key assumptions. These tests are part of a complete and rigorous panel data analysis and are expected in a postgraduate dissertation or journal article.

Test 1

Breusch-Pagan LM Test: Testing for Random Effects vs. OLS

The LM test checks whether variance across entities is significantly different from zero. If the null hypothesis (no panel effect) is not rejected, a simple OLS regression is sufficient and random effects adds no value.

      xtreg y x1, re

      xttest0

If Prob > chi2 < 0.05 (labelled Prob > chibar2 in recent Stata versions), there are significant differences across entities and a panel model (random or fixed effects) is preferred over simple OLS.

Test 2

Modified Wald Test: Testing for Heteroskedasticity

The modified Wald test checks whether the variance of the error term is constant across entities in the fixed effects model. Heteroskedasticity inflates or deflates standard errors and can produce misleading inference.

      * Install if not already installed

      ssc install xttest3

      xtreg y x1, fe

      xttest3

If Prob > chi2 < 0.05, heteroskedasticity is present. Add the robust option to your xtreg command to obtain heteroskedasticity-consistent standard errors.

Test 3

Wooldridge Test: Testing for Serial Correlation

Serial correlation (autocorrelation) in panel data causes standard errors to be underestimated, leading to inflated t-statistics and false significance. This test is most relevant for macro panels with long time series (more than 20 years).

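      * Install if not already installed (xtserial is a Stata Journal package,

      * st0039, so locate it with search rather than ssc install)

      search xtserial

      xtserial y x1

The null hypothesis is no first-order autocorrelation. If Prob > F < 0.05, serial correlation is present. In that case, use cluster-robust standard errors with xtreg, fe vce(cluster entity), where entity is your panel variable, or model the AR(1) error directly with xtregar.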
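The two corrections can be compared directly. This sketch assumes the Grunfeld example panel as stand-in data; both commands are part of official Stata.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* Option 1: keep the FE coefficients, make the standard errors robust to
* arbitrary serial correlation (and heteroskedasticity) within each entity
xtreg invest mvalue kstock, fe vce(cluster company)

* Option 2: fixed effects model with an explicit AR(1) disturbance
xtregar invest mvalue kstock, fe
```

Clustered standard errors rely on having a reasonably large number of entities. With only ten companies, as in this example, they should be read with caution.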

Test 4

Breusch-Pagan LM Test of Independence: Cross-Sectional Dependence

Cross-sectional dependence occurs when residuals across entities are correlated — for instance, when countries are economically linked and a shock in one country also affects others. This is primarily a concern in macro panels with long time series.

      * Install if not already installed

      ssc install xttest2

      xtreg y x1, fe

      xttest2

The null hypothesis is that residuals are uncorrelated across entities; if Pr < 0.05, cross-sectional dependence is present. In that case, use Driscoll-Kraay standard errors via the xtscc command (install with ssc install xtscc), which produces standard errors robust to heteroskedasticity, autocorrelation, and cross-sectional dependence simultaneously.
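A fixed effects regression with Driscoll-Kraay standard errors might be run as in the sketch below. It assumes the user-written xtscc package has been installed and uses the Grunfeld example panel as stand-in data; the lag length of 2 is an arbitrary illustration, not a recommendation.

```stata
* Assumption: xtscc has been installed beforehand (ssc install xtscc)
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* Fixed effects with Driscoll-Kraay standard errors, default lag selection
xtscc invest mvalue kstock, fe

* Same model with the maximum autocorrelation lag set by hand
xtscc invest mvalue kstock, fe lag(2)
```

The coefficients match those from xtreg, fe; only the standard errors change.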

Test 5

Pesaran CD Test: Alternative Cross-Sectional Dependence Test

      * Install if not already installed

      ssc install xtcsd

      xtreg y x1, fe

      xtcsd, pesaran abs

The null hypothesis is that residuals are not correlated across entities. If Pr > 0.05, no cross-sectional dependence is detected.

Test 6

Unit Root Tests: Testing for Stationarity

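In macro panels with long time series, non-stationarity (unit roots) can cause spurious regression results. Stata 11 and later include the xtunitroot command, which supports several panel unit root tests including Levin-Lin-Chu, Im-Pesaran-Shin, and the Hadri LM test. For Levin-Lin-Chu and Im-Pesaran-Shin the null hypothesis is that the panels contain a unit root, so a p-value below 0.05 is evidence of stationarity. The Hadri LM test reverses this and takes stationarity as the null.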

      * Check help for available test options

      help xtunitroot

      * Example: Levin-Lin-Chu test

      xtunitroot llc y
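Running more than one test is a useful cross-check, because the tests differ in their assumptions and in the direction of the null hypothesis. The sketch below assumes the Grunfeld example panel as stand-in data (it is strongly balanced, which the Levin-Lin-Chu and Hadri tests require); the single lag is an arbitrary illustration.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear
xtset company year

* Levin-Lin-Chu: H0 = panels contain a unit root
xtunitroot llc invest, lags(1)

* Im-Pesaran-Shin: H0 = all panels contain a unit root
xtunitroot ips invest, lags(1)

* Hadri LM: H0 = all panels are stationary (note the reversed null)
xtunitroot hadri invest
```

If the unit-root tests fail to reject while the Hadri test rejects, the evidence points consistently towards non-stationarity in levels.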

8. Summary of Stata Commands for Fixed and Random Effects Models

The following table provides a complete quick reference for the major panel data model commands in Stata:

Model Type	Command	Stata Syntax
Entity Fixed Effects	xtreg	`xtreg y x1 x2 x3, fe`
Entity Fixed Effects	areg	`areg y x1 x2 x3, absorb(entity)`
Entity Fixed Effects (LSDV)	regress	`regress y x1 x2 x3 i.entity`
Entity + Time Fixed Effects	xtreg	`xtreg y x1 x2 x3 i.year, fe`
Entity + Time Fixed Effects	areg	`areg y x1 x2 x3 i.year, absorb(entity)`
Random Effects	xtreg	`xtreg y x1 x2 x3, re robust`
Hausman Test	hausman	`hausman fixed random`
Time Effects Test	testparm	`testparm i.year`
Breusch-Pagan LM	xttest0	`xttest0` (after xtreg, re)
Heteroskedasticity	xttest3	`xttest3` (after xtreg, fe)
Serial Correlation	xtserial	`xtserial y x1`
Cross-sectional Dependence	xttest2 / xtcsd	`xttest2` or `xtcsd, pesaran abs`

Workflow reminder: The standard panel data analysis workflow in Stata is: declare the panel with xtset → explore visually with xtline → run OLS for comparison → run FE with xtreg, fe → run RE with xtreg, re → run Hausman test → run diagnostics → report robust estimates if needed.
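The workflow can be laid out as a single do-file. This is a sketch under stated assumptions rather than a template to copy unchanged: it uses the Grunfeld example panel as stand-in data, and the diagnostics that depend on user-written commands are left commented out until those commands are installed.

```stata
* Assumption: the Grunfeld example panel stands in for your own dataset
webuse grunfeld, clear

* 1. Declare the panel
xtset company year

* 2. Explore visually
xtline invest, overlay

* 3. Pooled OLS for comparison
regress invest mvalue kstock
estimates store ols

* 4. Fixed effects
xtreg invest mvalue kstock, fe
estimates store fixed

* 5. Random effects, followed directly by the Breusch-Pagan LM test
xtreg invest mvalue kstock, re
estimates store random
xttest0

* 6. Hausman test (default-variance models only)
hausman fixed random

* 7. Diagnostics that need user-written commands (uncomment once installed)
* quietly xtreg invest mvalue kstock, fe
* xttest3
* xtserial invest mvalue kstock
* xtcsd, pesaran abs

* 8. Report robust estimates alongside the earlier models
xtreg invest mvalue kstock, fe vce(cluster company)
estimates store fe_robust
estimates table ols fixed random fe_robust, b(%9.4f) se(%9.4f) stats(N)
```

Keeping every step in one do-file means the whole analysis can be rerun and audited, which supervisors and examiners often ask for.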

9. How Tobit Research Consulting Can Help with Your Panel Data Analysis

Panel data analysis in Stata requires a strong command of both the econometric theory behind fixed and random effects models and the practical skills to implement, interpret, and diagnose them correctly. Many postgraduate students encounter challenges at each step — from structuring a balanced panel dataset to choosing between FE and RE, interpreting the Hausman test, correcting for heteroskedasticity, and writing up the methodology chapter in a way that satisfies examiners.

Expert Stata Panel Data Analysis Support — Nairobi, Kenya

At Tobit Research Consulting, we provide end-to-end quantitative research support for Masters and PhD students across Kenya and East Africa. Our Stata and data analysis services include:

Panel data structuring and xtset setup in Stata
Fixed effects and random effects model estimation and interpretation
Hausman test, Breusch-Pagan LM test, and all panel diagnostics
Heteroskedasticity and serial correlation correction
Cross-sectional dependence testing and Driscoll-Kraay robust errors
SPSS, STATA, EViews, R, and NVivo analysis support
Chapter Three (Methodology) writing and alignment
Chapter Four (Findings) interpretation and presentation
Full dissertation and thesis support from proposal to defence
Journal article data analysis and results write-up

We support students in producing rigorous, well-documented quantitative analyses that stand up to supervision, examination, and peer review.


📍 Bruce House, 4th Floor, Nairobi CBD, Kenya | Tel: +254 728 430 728

This guide is part of Tobit Research Consulting’s Stata and Quantitative Methods Series. The panel data commands and test procedures described here apply to Stata 11 and later versions. For older versions of Stata, some commands may require the xi: prefix or user-written add-ons. Always consult your institutional guidelines on the preferred econometric approach for your specific research design. Core conceptual references include Torres-Reyna (Princeton University, 2007), Baltagi’s Econometric Analysis of Panel Data, and Stock & Watson’s Introduction to Econometrics.