Panel Data Analysis in Stata: A Complete Guide to Fixed and Random Effects for Postgraduate Researchers
Tobit Research Consulting | Stata & Quantitative Methods Series | Reading time: ~18 minutes
What you will learn: What panel data is and why it matters for research, how to set up panel data in Stata using xtset, when to use fixed effects versus random effects, how to run both models using xtreg, how to perform the Hausman test to choose between them, and how to run key diagnostic tests including the Breusch-Pagan LM test, heteroskedasticity test, serial correlation test, and cross-sectional dependence test — all with practical Stata commands.
Panel data analysis is one of the most powerful tools available to postgraduate researchers in economics, finance, public policy, and the social sciences. Whether you are analysing financial performance across Kenyan commercial banks over a decade, examining how government policies affect firm-level productivity across multiple countries, or tracking household welfare outcomes over several survey rounds, panel data methods allow you to go far beyond what ordinary regression can achieve.
Yet many Masters and PhD students who collect panel data end up analysing it with simple OLS regression — losing the very advantages that make panel data worth collecting. The two most important panel data models available in Stata are the fixed effects model and the random effects model. Knowing when and how to use each one is a core skill for any quantitative researcher at the postgraduate level.
This guide, developed by Tobit Research Consulting, walks through panel data analysis in Stata from first principles. It draws from the foundational Princeton University notes by Oscar Torres-Reyna and explains the key concepts, Stata commands, model output, and diagnostic tests in clear, practical language.
1. What Is Panel Data and Why Is It Valuable?
Panel data — also referred to as longitudinal data or cross-sectional time-series data — is a dataset in which the same entities are observed across multiple time periods. Those entities might be countries, firms, individuals, counties, schools, or households. The defining feature is the combination of a cross-sectional dimension (multiple entities) and a time-series dimension (multiple periods per entity).
A typical panel dataset might look like this:
| Entity (i) | Year (t) | Y | X1 | X2 |
|---|---|---|---|---|
| Country 1 | 2000 | 6.0 | 7.8 | 5.8 |
| Country 1 | 2001 | 4.6 | 0.6 | 7.9 |
| Country 1 | 2002 | 9.4 | 2.1 | 5.4 |
| Country 2 | 2000 | 9.1 | 1.3 | 6.7 |
| Country 2 | 2001 | 8.3 | 0.9 | 6.6 |
| Country 2 | 2002 | 0.6 | 9.8 | 0.4 |
Panel data offers several important advantages over purely cross-sectional or purely time-series data:
Advantage 1
Control for unobserved heterogeneity
Panel data allows you to control for variables that are difficult or impossible to observe or measure directly — such as cultural factors, management quality, or firm-specific business practices — as long as those variables are stable over time. This is a fundamental advantage that simple cross-sectional regression cannot offer.
Advantage 2
Account for time-invariant and entity-invariant factors
Panel data lets you control for factors that vary across time but not across entities (such as national policies, global shocks, or regulatory changes), as well as factors that vary across entities but not over time.
Advantage 3
Support multilevel analysis
Panel data supports the inclusion of variables at different levels of analysis — students nested within schools, firms nested within industries, counties nested within countries — making it well-suited for hierarchical or multilevel modelling.
Important: Panel data does have limitations. Data collection can be challenging, especially for micro panels. Non-response and attrition can introduce bias. Macro panels with many countries over long periods may suffer from cross-sectional dependence. These issues should be acknowledged and tested in your methodology chapter.
2. Setting Up Panel Data in Stata: xtset
Before running any panel data model in Stata, you must declare your dataset as a panel dataset using the xtset command. This tells Stata which variable identifies the entities (the panel variable) and which variable identifies the time periods (the time variable).
xtset country year
After running this command, Stata will confirm the panel and time variables and report whether the panel is balanced. A strongly balanced panel means every entity has data for every time period. An unbalanced panel has some missing time periods for some entities. Both can be analysed using xtreg, though unbalanced panels require more attention to potential bias.
Common error: If your entity identifier (e.g., country name) is stored as a string variable rather than a numeric variable, Stata will return the error varlist: country: string variable not allowed. To resolve this, convert the variable to numeric first using: encode country, gen(country1), then re-run xtset country1 year.
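The string-identifier fix can be sketched end to end as follows. This is a minimal illustration, assuming Stata's Grunfeld example dataset (loaded with webuse, which needs internet access); the variables firm_name and firm_id are hypothetical names created only to mimic the problem.

```stata
* Sketch: declare a panel whose entity identifier starts out as a string.
* Assumption: Stata's Grunfeld example data (10 firms, 20 years each).
webuse grunfeld, clear

* Build a string identifier purely to reproduce the situation described above.
generate str12 firm_name = "Firm " + string(company, "%02.0f")

* encode maps each distinct string to an integer code with value labels.
encode firm_name, generate(firm_id)

* Panel variable first, then the time variable.
xtset firm_id year

* Participation pattern: which firms are observed in which years.
xtdescribe
```

Because encode keeps the original text as value labels, output and graphs still show the readable names while xtset works with the numeric codes.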
3. Exploring Your Panel Data
Before fitting any model, it is good practice to visually explore the structure of your panel data. Stata provides the xtline command to plot the outcome variable across time for each panel entity.
xtline y
xtline y, overlay
These plots reveal heterogeneity across entities — variation in the levels and trends of the outcome variable between panels. Visualising this heterogeneity helps you understand whether entity-specific intercepts (fixed effects) are likely to matter, and whether the behaviour of the outcome variable differs meaningfully across panels over time.
You can also plot the mean of the outcome for each entity alongside the raw observations to see patterns in the data before modelling (replacing country with year in both commands gives the means by year):
bysort country: egen y_mean = mean(y)
twoway scatter y country || connected y_mean country
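A complementary check, sketched here under the assumption that the Grunfeld example data stand in for your own panel, is to split each variable's variation into its between-entity and within-entity parts with xtsum, and to plot the year-by-year mean. The variable invest_year_mean is a name chosen for this illustration.

```stata
* Sketch: numeric and visual exploration of a panel before modelling.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* Overall, between-entity and within-entity variation of each variable.
xtsum invest mvalue kstock

* Year-by-year mean of the outcome, overlaid on the raw observations.
bysort year: egen invest_year_mean = mean(invest)
twoway (scatter invest year) (connected invest_year_mean year, sort)
```

A variable with almost no within-entity variation in the xtsum output will be poorly identified by a fixed effects model, which is useful to know before estimation.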
4. The Fixed Effects Model: Theory and Stata Command
What is the Fixed Effects Model?
The fixed effects (FE) model is used when you want to analyse the impact of variables that change over time, while controlling for all stable, time-invariant characteristics of each entity. The core idea is that each entity (country, firm, individual) has its own unique, unobserved characteristics that may be correlated with the independent variables in your model. The FE model removes these entity-specific effects so that the estimated coefficient on your predictor reflects its within-entity effect, free of bias from any time-invariant confounder.
The mathematical model for fixed effects is:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
Where $Y_{it}$ is the outcome for entity $i$ at time $t$, $X_{it}$ is the predictor variable, $\alpha_i$ is the entity-specific intercept (the fixed effect), and $u_{it}$ is the idiosyncratic error term.
Key insight: The fixed effects model controls for all time-invariant differences between entities — including unobserved ones like culture, geography, or management style — so the estimated coefficients cannot be biased by these omitted characteristics. However, this also means FE cannot estimate the effect of any variable that does not change over time within entities.
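The reason time-invariant characteristics drop out can be shown with the standard within transformation. The derivation below is a sketch using the symbols of the fixed effects equation above; the bar notation for entity means is introduced here and is not part of the original guide.

```latex
\begin{aligned}
\bar{Y}_i &= \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i,
  \qquad \bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it} \\
Y_{it} - \bar{Y}_i &= \beta_1\,(X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)
\end{aligned}
```

Subtracting the entity mean removes $\alpha_i$ because it is constant within an entity. Any regressor that is constant within an entity is removed in the same way, which is why its effect cannot be estimated.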
When Should You Use Fixed Effects?
Use fixed effects when:
- You believe that unobserved entity characteristics are correlated with your independent variables.
- Your research question focuses on within-entity variation over time.
- You are not interested in estimating the effect of time-invariant variables.
- You want to be confident your results are not driven by unobserved entity-level confounders.
Running Fixed Effects in Stata: xtreg with fe
The primary command for fixed effects in Stata is xtreg with the fe option:
xtreg y x1, fe
xtreg y x1 x2 x3, fe
xtreg y x1 x2 x3, fe robust
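One way to see what xtreg, fe is doing is to demean the data by hand and run OLS on the result. The sketch below assumes the Grunfeld example data; the _bar and _dm suffixes are names chosen for this illustration.

```stata
* Sketch: reproduce the FE slope by demeaning each variable within entity.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

quietly xtreg invest mvalue kstock, fe
scalar b_fe = _b[mvalue]

* Subtract each firm's own mean from every variable.
foreach v of varlist invest mvalue kstock {
    bysort company: egen double `v'_bar = mean(`v')
    generate double `v'_dm = `v' - `v'_bar
}

* OLS on the demeaned data, without a constant.
quietly regress invest_dm mvalue_dm kstock_dm, noconstant
display "xtreg, fe slope:    " scalar(b_fe)
display "demeaned OLS slope: " _b[mvalue_dm]
```

The two slopes should agree. The standard errors from the hand-demeaned regression should not be reported, because regress does not account for the degrees of freedom used up by the entity means.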
Key elements of the xtreg, fe output to understand:
| Output Element | What It Means | Decision Rule |
|---|---|---|
| Prob > F | Tests whether all coefficients in the model are jointly different from zero. | If < 0.05, the model is statistically significant. |
| R-sq: within | Variance in Y explained by X within entities over time. | The usual R² to report for fixed effects results; label it as the within R². |
| R-sq: between | Variance in Y explained by X between entities. | Useful for context but not the main metric in FE. |
| rho | The fraction of total variance attributable to differences across panels (intraclass correlation). | Higher rho suggests panel effects are important. |
| corr(u_i, Xb) | Correlation between entity-level error and predictors. | FE allows this to be non-zero; RE assumes it is zero. |
| P>\|t\| | Two-tailed p-value for each coefficient. | If < 0.05, the variable significantly influences Y. |
Alternative FE Commands: areg and LSDV
Fixed effects can also be estimated using the least squares dummy variable (LSDV) approach, which adds a binary dummy variable for each entity. This produces identical coefficient estimates to xtreg, fe:
regress y x1 i.country
areg y x1, absorb(country)
The areg command is particularly useful when you have many entities, since it absorbs all the dummy variables without displaying them in the output. The R² reported by regress or areg counts the variation explained by the entity dummies, so it is typically much higher than the within R² from xtreg, fe. State clearly which one you report.
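The equivalence of the three estimators, and the difference in their R² values, can be checked directly. This is a sketch assuming the Grunfeld example data; the scalar names are chosen for this illustration.

```stata
* Sketch: the same FE slope from xtreg, areg and LSDV, with differing R-squared.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

quietly xtreg invest mvalue kstock, fe
scalar b_xt = _b[mvalue]
display "xtreg, fe : b = " scalar(b_xt) "   within R2 = " e(r2_w)

quietly areg invest mvalue kstock, absorb(company)
scalar b_areg = _b[mvalue]
display "areg      : b = " scalar(b_areg) "   R2 = " e(r2)

quietly regress invest mvalue kstock i.company
scalar b_lsdv = _b[mvalue]
display "LSDV      : b = " scalar(b_lsdv) "   R2 = " e(r2)
```

The slope is the same in all three lines. Only the R² changes, because areg and regress credit the entity dummies with explanatory power while the within R² does not.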
Adding Time Fixed Effects
You can extend the entity FE model to also include time fixed effects, controlling for shocks that affect all entities simultaneously in a given year:
xtreg y x1 i.year, fe
testparm i.year
After running the model with year dummies, use testparm i.year to test whether the time dummies are jointly equal to zero. If the resulting Prob > F is greater than 0.05, you fail to reject the null that the time effects are jointly zero, so there is no statistical evidence that time fixed effects are needed in your model.
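A worked version of the time effects test, assuming the Grunfeld example data in place of your own panel, might look like this:

```stata
* Sketch: two-way fixed effects and a joint test of the year dummies.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* Entity fixed effects plus one dummy per year (the first year is the base).
xtreg invest mvalue kstock i.year, fe

* H0: all year coefficients are jointly zero.
testparm i.year
display "Joint test of year dummies: F = " r(F) ", p = " r(p)
```

With 20 years in the data, the test has 19 numerator degrees of freedom, one for each year dummy other than the base year.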
5. The Random Effects Model: Theory and Stata Command
What is the Random Effects Model?
The random effects (RE) model takes a different approach to entity-level heterogeneity. Instead of treating entity-specific effects as fixed constants to be estimated, the RE model treats them as random draws from a distribution — specifically, it assumes that the variation across entities is random and uncorrelated with the independent variables in the model.
The random effects model is specified as:
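$Y_{it} = \beta X_{it} + \alpha + u_i + \varepsilon_{it}$
Here, $u_i$ is the between-entity error (capturing persistent differences across entities, so it carries no time subscript) and $\varepsilon_{it}$ is the within-entity error (capturing time-varying idiosyncratic shocks). The crucial assumption is that $u_i$ is uncorrelated with the regressors $X_{it}$. Note that $u$ plays a different role here than in the fixed effects equation, where $u_{it}$ denoted the idiosyncratic error.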
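The two error components also explain the rho statistic that Stata reports. The sketch below writes the entity-level component as $u_i$ (constant within an entity) and assumes that $u_i$ and $\varepsilon_{it}$ are uncorrelated with each other and that $\varepsilon_{it}$ is not serially correlated; the composite error $v_{it}$ is notation introduced here.

```latex
v_{it} = u_i + \varepsilon_{it}, \qquad
\operatorname{Var}(v_{it}) = \sigma_u^2 + \sigma_\varepsilon^2, \qquad
\operatorname{Corr}(v_{it}, v_{is}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
\quad (t \neq s)
```

Under these assumptions, the last ratio is the share of the composite error variance that comes from persistent entity differences, which is the quantity labelled rho in the xtreg output.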
Use Fixed Effects when…
- Entity effects are correlated with predictors
- You only care about within-entity variation
- You do not need to estimate time-invariant variables
- Your entities are the full population of interest
Use Random Effects when…
- Entity effects are uncorrelated with predictors
- You need to estimate time-invariant variables (e.g. gender, sector)
- You want to generalise inferences beyond the sample
- Your entities are a random sample from a larger population
Running Random Effects in Stata: xtreg with re
xtreg y x1, re
xtreg y x1 x2 x3, re robust
Interpretation note: In the random effects model, the coefficient on each predictor represents the average effect of X on Y when X changes both across time (within entities) and between entities by one unit. This combined within-and-between interpretation makes the random effects coefficients more complex to interpret than fixed effects coefficients.
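The practical difference between the two estimators shows up most clearly with a time-invariant regressor. In the sketch below, which assumes the Grunfeld example data, group_a is a hypothetical indicator created only for illustration and has no substantive meaning.

```stata
* Sketch: a time-invariant regressor under FE and under RE.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* Hypothetical time-invariant indicator: equals 1 for the first five firms.
generate byte group_a = company <= 5

* FE: group_a is collinear with the firm effects and is omitted.
xtreg invest mvalue kstock group_a, fe

* RE: group_a receives a coefficient, valid only under the RE assumption.
xtreg invest mvalue kstock group_a, re
```

The RE coefficient on such a variable is only trustworthy if the entity effects are uncorrelated with the regressors, which is the assumption the Hausman test examines.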
6. Fixed or Random? Running the Hausman Test
The most important practical decision in panel data analysis is whether to use fixed effects or random effects. The standard statistical test for this decision is the Hausman test. Its null hypothesis is that the entity-level errors are uncorrelated with the regressors, in which case both estimators are consistent and random effects is the more efficient of the two. If this null is rejected, random effects is inconsistent and fixed effects is the preferred model.
To run the Hausman test in Stata:
xtreg y x1, fe
estimates store fixed
xtreg y x1, re
estimates store random
hausman fixed random
| Hausman Test Result | Decision | Interpretation |
|---|---|---|
| Prob > chi2 < 0.05 | Reject H₀ → Use Fixed Effects | There is a systematic difference between the FE and RE coefficients. The entity-level errors are correlated with the regressors, violating the RE assumption. |
| Prob > chi2 ≥ 0.05 | Fail to reject H₀ → Random Effects preferred | No significant difference between FE and RE coefficients. RE is more efficient and its assumptions are not violated. |
Important: In small samples or with many predictors, the estimated difference between the FE and RE variance matrices may fail to be positive definite, and the Hausman statistic can then come out negative or be unreliable. In such cases, alternative approaches include using robust versions of the test or making the choice based on theoretical reasoning about whether entity effects are likely to be correlated with your predictors.
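A worked Hausman test is sketched below, assuming the Grunfeld example data. It uses hausman's sigmamore option, which bases both covariance matrices on the same estimate of the disturbance variance and is one commonly used response to the problem just described. Both models are fitted without the robust option, since the standard Hausman test relies on conventional standard errors.

```stata
* Sketch: Hausman test with a common disturbance-variance estimate.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* Fit both models with conventional (non-robust) standard errors.
quietly xtreg invest mvalue kstock, fe
estimates store fixed
quietly xtreg invest mvalue kstock, re
estimates store random

* The consistent estimator (FE) goes first, the efficient one (RE) second.
hausman fixed random, sigmamore
display "chi2 = " r(chi2) ", df = " r(df) ", p = " r(p)
```

The order of the stored estimates matters: reversing them changes what the test compares.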
7. Diagnostic Tests for Panel Data in Stata
After estimating your panel data model, several diagnostic tests should be run to check for violations of key assumptions. These tests are part of a complete and rigorous panel data analysis and are expected in a postgraduate dissertation or journal article.
Test 1
Breusch-Pagan LM Test: Testing for Random Effects vs. OLS
The LM test checks whether variance across entities is significantly different from zero. If the null hypothesis (no panel effect) is not rejected, a simple OLS regression is sufficient and random effects adds no value.
xtreg y x1, re
xttest0
If the reported p-value (Prob > chi2, shown as Prob > chibar2 in recent Stata versions) is below 0.05, there are significant differences across entities and random effects (or fixed effects) is preferred over simple OLS.
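As a minimal sketch, assuming the Grunfeld example data, the LM test and its stored results can be obtained like this:

```stata
* Sketch: Breusch-Pagan LM test for random effects versus pooled OLS.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* xttest0 must follow a random effects fit.
quietly xtreg invest mvalue kstock, re

* H0: the variance of the entity-level component is zero.
xttest0
display "LM statistic = " r(lm) ", p = " r(p)
```

In this dataset the firms differ strongly in their investment levels, so the null of no panel effect is rejected.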
Test 2
Modified Wald Test: Testing for Heteroskedasticity
The modified Wald test checks whether the variance of the error term is constant across entities in the fixed effects model. Heteroskedasticity inflates or deflates standard errors and can produce misleading inference.
ssc install xttest3
xtreg y x1, fe
xttest3
The null hypothesis is homoskedasticity (constant error variance across entities). If Prob > chi2 is below 0.05, heteroskedasticity is present. Add the robust option to your xtreg command to obtain heteroskedasticity-consistent standard errors.
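The test and the correction can be run back to back. The sketch below assumes the Grunfeld example data and installs the user-written xttest3 from SSC only if it is not already present.

```stata
* Sketch: modified Wald test, then FE with robust standard errors.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
capture which xttest3
if _rc {
    ssc install xttest3
}

webuse grunfeld, clear
xtset company year

* xttest3 must follow a fixed effects fit.
quietly xtreg invest mvalue kstock, fe
xttest3

* Same coefficients as before, with robust standard errors.
xtreg invest mvalue kstock, fe vce(robust)
```

The robust option leaves the coefficients unchanged. Only the standard errors, t-statistics and confidence intervals differ from the conventional fit.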
Test 3
Wooldridge Test: Testing for Serial Correlation
Serial correlation (autocorrelation) in panel data causes standard errors to be underestimated, leading to inflated t-statistics and false significance. This test is most relevant for macro panels with long time series (more than 20 years).
ssc install xtserial
xtserial y x1
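The null hypothesis is no first-order autocorrelation. If Prob > F < 0.05, serial correlation is present. In that case, use xtreg, fe vce(cluster entity) for cluster-robust standard errors, or xtregar to model the AR(1) disturbance directly. If ssc install xtserial reports that the package cannot be found, run search xtserial and install it from the link shown.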
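The two corrections use built-in commands only. The following sketch assumes the Grunfeld example data, with company playing the role of the entity identifier.

```stata
* Sketch: two common responses to serial correlation in a fixed effects model.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* Option 1: keep the FE point estimates, cluster standard errors by entity.
xtreg invest mvalue kstock, fe vce(cluster company)

* Option 2: model the disturbance as AR(1) within each entity.
xtregar invest mvalue kstock, fe
```

Clustering changes only the standard errors, whereas xtregar changes the estimator itself, so its coefficients will generally differ from those of xtreg, fe. Cluster-robust standard errors can be unreliable when the number of entities is small, as it is in this ten-firm example.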
Test 4
Breusch-Pagan LM Test of Independence: Cross-Sectional Dependence
Cross-sectional dependence occurs when residuals across entities are correlated — for instance, when countries are economically linked and a shock in one country also affects others. This is primarily a concern in macro panels with long time series.
ssc install xttest2
xtreg y x1, fe
xttest2
The null hypothesis is that residuals are uncorrelated across entities; a p-value below 0.05 indicates cross-sectional dependence. If it is detected, use Driscoll-Kraay standard errors via the xtscc command (install with ssc install xtscc), which produces standard errors robust to heteroskedasticity, autocorrelation, and cross-sectional dependence simultaneously.
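A minimal sketch of the Driscoll-Kraay correction, assuming the Grunfeld example data and installing the user-written xtscc only if needed:

```stata
* Sketch: fixed effects with Driscoll-Kraay standard errors.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
capture which xtscc
if _rc {
    ssc install xtscc
}

webuse grunfeld, clear
xtset company year

* The fe option requests the within estimator; the lag length is left at its default.
xtscc invest mvalue kstock, fe
```

The point estimates match those of xtreg, fe, and only the standard errors are computed differently.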
Test 5
Pesaran CD Test: Alternative Cross-Sectional Dependence Test
ssc install xtcsd
xtreg y x1, fe
xtcsd, pesaran abs
The null hypothesis is that residuals are not correlated across entities. If Pr > 0.05, no cross-sectional dependence is detected.
Test 6
Unit Root Tests: Testing for Stationarity
In macro panels with long time series, non-stationarity (unit roots) can cause spurious regression results. Stata 11 and later include the xtunitroot command, which supports several panel unit root tests including Levin-Lin-Chu, Im-Pesaran-Shin, and the Hadri LM test.
help xtunitroot
xtunitroot llc y
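The three tests named above have different null hypotheses, which is easy to overlook when reading the output. The sketch below assumes the Grunfeld example data; the lag length of 1 is an arbitrary choice for illustration and should be selected to suit your own data.

```stata
* Sketch: three panel unit root tests on the same variable.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear
xtset company year

* Levin-Lin-Chu (needs a strongly balanced panel). H0: panels contain unit roots.
xtunitroot llc invest, lags(1)

* Im-Pesaran-Shin. H0: all panels contain unit roots.
xtunitroot ips invest, lags(1)

* Hadri LM. H0: all panels are stationary.
xtunitroot hadri invest
```

A small p-value therefore supports stationarity in the first two tests but is evidence against it in the Hadri test.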
8. Summary of Stata Commands for Fixed and Random Effects Models
The following table provides a complete quick reference for the major panel data model commands in Stata:
| Model Type | Command | Stata Syntax |
|---|---|---|
| Entity Fixed Effects | xtreg | xtreg y x1 x2 x3, fe |
| Entity Fixed Effects | areg | areg y x1 x2 x3, absorb(entity) |
| Entity Fixed Effects (LSDV) | regress | regress y x1 x2 x3 i.entity |
| Entity + Time Fixed Effects | xtreg | xtreg y x1 x2 x3 i.year, fe |
| Entity + Time Fixed Effects | areg | areg y x1 x2 x3 i.year, absorb(entity) |
| Random Effects | xtreg | xtreg y x1 x2 x3, re robust |
| Hausman Test | hausman | hausman fixed random |
| Time Effects Test | testparm | testparm i.year |
| Breusch-Pagan LM | xttest0 | xttest0 (after xtreg, re) |
| Heteroskedasticity | xttest3 | xttest3 (after xtreg, fe) |
| Serial Correlation | xtserial | xtserial y x1 |
| Cross-sectional Dependence | xttest2 / xtcsd | xttest2 or xtcsd, pesaran abs |
Workflow reminder: The standard panel data analysis workflow in Stata is: declare the panel with xtset → explore visually with xtline → run OLS for comparison → run FE with xtreg, fe → run RE with xtreg, re → run Hausman test → run diagnostics → report robust estimates if needed.
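The workflow can be condensed into a short do-file. This is a sketch assuming the Grunfeld example data in place of your own panel; the stored-estimate names pooled, fixed and random are labels chosen for this illustration.

```stata
* Sketch: the core workflow from declaration to a side-by-side comparison.
* Assumption: Stata's Grunfeld example data (webuse needs internet access).
webuse grunfeld, clear

xtset company year
xtline invest, overlay

* Pooled OLS for comparison.
regress invest mvalue kstock
estimates store pooled

* Fixed effects.
xtreg invest mvalue kstock, fe
estimates store fixed

* Random effects, followed directly by the LM test against pooled OLS.
xtreg invest mvalue kstock, re
estimates store random
xttest0

* FE versus RE.
hausman fixed random

* Coefficients and standard errors of all three models in one table.
estimates table pooled fixed random, b(%9.4f) se stats(N)
```

Diagnostics and any robust re-estimation would follow from here, applied to whichever model the tests favour.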
9. How Tobit Research Consulting Can Help with Your Panel Data Analysis
Panel data analysis in Stata requires a strong command of both the econometric theory behind fixed and random effects models and the practical skills to implement, interpret, and diagnose them correctly. Many postgraduate students encounter challenges at each step — from structuring a balanced panel dataset to choosing between FE and RE, interpreting the Hausman test, correcting for heteroskedasticity, and writing up the methodology chapter in a way that satisfies examiners.
Expert Stata Panel Data Analysis Support — Nairobi, Kenya
At Tobit Research Consulting, we provide end-to-end quantitative research support for Masters and PhD students across Kenya and East Africa. Our Stata and data analysis services include:
- Panel data structuring and xtset setup in Stata
- Fixed effects and random effects model estimation and interpretation
- Hausman test, Breusch-Pagan LM test, and all panel diagnostics
- Heteroskedasticity and serial correlation correction
- Cross-sectional dependence testing and Driscoll-Kraay robust errors
- SPSS, Stata, EViews, R, and NVivo analysis support
- Chapter Three (Methodology) writing and alignment
- Chapter Four (Findings) interpretation and presentation
- Full dissertation and thesis support from proposal to defence
- Journal article data analysis and results write-up
We support students in producing rigorous, well-documented quantitative analyses that stand up to supervision, examination, and peer review.
📍 Bruce House, 4th Floor, Nairobi CBD, Kenya | Tel: +254 728 430 728
This guide is part of Tobit Research Consulting’s Stata and Quantitative Methods Series. The panel data commands and test procedures described here apply to Stata 11 and later versions. For older versions of Stata, some commands may require the xi: prefix or user-written add-ons. Always consult your institutional guidelines on the preferred econometric approach for your specific research design. Core conceptual references include Torres-Reyna (Princeton University, 2007), Baltagi’s Econometric Analysis of Panel Data, and Stock & Watson’s Introduction to Econometrics.